Where can I find datasets to classify?
1. Hugging Face Datasets
As of writing, Hugging Face Datasets hosts over 600 datasets in 80+ languages, all of which can be browsed by tag in its dataset viewer.
With such a huge amount of data, a lot can be found here.
Kaggle hosts over 60,000 datasets.
4. The Pile
The Pile consists of roughly 825 GiB of English text drawn from a wide variety of domains.
An API for querying Reddit and other social media data.
6. Stack Overflow
Search and fetch Stack Overflow posts.
Find data from the US government.
8. Google Dataset Search
Search for datasets with Google.
A great source for multilingual content ranging from subtitles to law.
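
To illustrate the Stack Overflow entry above, here is a minimal sketch of how a search query could be built against the public Stack Exchange API. It assumes version 2.3 of the API and its `/search` endpoint; the helper name `build_search_url` and its parameters are hypothetical, chosen only for this example. The function only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Base endpoint of the public Stack Exchange API (assumed version: 2.3).
API_BASE = "https://api.stackexchange.com/2.3"


def build_search_url(title_query, tagged=None, page_size=20):
    """Build a /search URL for Stack Overflow questions.

    Hypothetical helper for illustration; it is not part of any library.
    """
    params = {
        "order": "desc",
        "sort": "activity",
        "intitle": title_query,   # text that must appear in the question title
        "site": "stackoverflow",  # restrict the search to Stack Overflow
        "pagesize": page_size,
    }
    if tagged:
        # The API expects multiple tags as a semicolon-delimited list.
        params["tagged"] = ";".join(tagged)
    return f"{API_BASE}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("text classification", tagged=["python", "nlp"]))
```

The resulting URL can be fetched with any HTTP client. The question titles and tags in the JSON response could then serve as texts and labels for a classifier, subject to the API's rate limits and terms of use.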