Labelf.ai Blog

9 Great Resources for Finding Data

January 11, 2021
Data

Where can I find datasets to classify?

1. Hugging Face Datasets

As of writing, Hugging Face Datasets hosts over 600 datasets in 80+ languages, all of which can be browsed by tag in the dataset viewer.

https://huggingface.co/datasets/viewer/
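As a rough sketch of how one of these datasets might be pulled into Python for a classification task: the `datasets` library provides `load_dataset`, and the example below assumes the `imdb` dataset with `text` and `label` columns. The dataset choice and the helper name `label_counts` are illustrative, not something prescribed by Hugging Face. The download itself is left as a comment so the helper runs without the library or network access.

```python
from collections import Counter


def label_counts(rows, label_key="label"):
    """Count how often each label occurs in an iterable of dict-like rows."""
    return Counter(row[label_key] for row in rows)


# With `pip install datasets` and network access, a real split could be
# inspected like this (assumption: rows look like {"text": ..., "label": ...}):
#
#   from datasets import load_dataset
#   train = load_dataset("imdb", split="train")
#   print(label_counts(train))

# Small made-up rows standing in for a downloaded split.
example_rows = [
    {"text": "great movie", "label": 1},
    {"text": "terrible plot", "label": 0},
    {"text": "loved it", "label": 1},
]
counts = label_counts(example_rows)
```

Checking the label distribution first is a cheap way to spot a heavily imbalanced dataset before training a classifier on it.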

2. Data.world

Data.world hosts a huge number of datasets, so a lot can be found here.

https://data.world/

3. Kaggle

Kaggle hosts over 60,000 datasets.

https://www.kaggle.com/datasets

4. The Pile

The Pile consists of roughly 825 GiB of English text drawn from a wide variety of domains.

https://pile.eleuther.ai/
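A corpus of this size is usually streamed rather than loaded whole. The sketch below assumes the data has already been decompressed into JSON Lines, one JSON object per line with the document under a `text` key. That layout, the `meta` field, and the helper name `iter_texts` are assumptions for illustration, so check the actual files before relying on them.

```python
import io
import json


def iter_texts(lines, text_key="text"):
    """Yield the text field from each non-empty JSON Lines record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        yield json.loads(line)[text_key]


# Tiny in-memory stand-in for a decompressed shard (made-up records).
sample = io.StringIO(
    '{"text": "first document", "meta": {"source": "example"}}\n'
    "\n"
    '{"text": "second document", "meta": {"source": "example"}}\n'
)
texts = list(iter_texts(sample))
```

Because `iter_texts` is a generator, the same function can be pointed at an open file handle and will read one record at a time instead of holding the whole shard in memory.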

5. Pushshift

An API for querying Reddit and other social media data.

https://pushshift.io/
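A query against an API like this typically comes down to building a URL with search parameters. The sketch below only constructs the URL and sends nothing. The endpoint path and the parameter names `q`, `subreddit`, and `size` are assumptions based on how the Reddit search endpoint has commonly been used, and access rules may have changed, so verify them against the current Pushshift documentation.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm the current path and access requirements.
BASE_URL = "https://api.pushshift.io/reddit/search/submission/"


def build_search_url(query, subreddit=None, size=25):
    """Build a search URL for submissions matching `query` (hypothetical helper)."""
    params = {"q": query, "size": size}
    if subreddit:
        params["subreddit"] = subreddit
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("machine learning", subreddit="datascience", size=10)
```

The resulting URL could then be fetched with any HTTP client, and the returned posts labeled for classification.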

6. Stack Overflow

Search and fetch Stack Overflow posts by writing SQL queries in the Stack Exchange Data Explorer.

https://data.stackexchange.com/stackoverflow/query/new

7. Data.gov

Find data published by the US government.

https://catalog.data.gov/dataset

8. Google Dataset Search

Search for datasets with Google.

https://datasetsearch.research.google.com/

9. OPUS

A great source for multilingual content ranging from subtitles to law.

http://opus.nlpl.eu/

Viktor Alm

I'm Viktor
