Labelf Blog

January 11, 2021

9 Great Resources for Finding Data

Where can I find datasets to classify?

1. Hugging Face Datasets

As of this writing, Hugging Face Datasets contains over 600 datasets in more than 80 languages, all of which can be browsed by tag in the dataset viewer.

https://huggingface.co/datasets/viewer/
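Tag-based browsing can also be scripted. The sketch below assumes the Hub exposes a public `https://huggingface.co/api/datasets` endpoint that accepts `search`, `filter` and `limit` query parameters and returns a JSON list whose items carry `id` and `tags` fields. The helper names are hypothetical, and the tag strings are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint of the Hugging Face Hub API.
HUB_API = "https://huggingface.co/api/datasets"


def build_search_url(search=None, tag=None, limit=20):
    """Build a dataset search URL; `tag` is sent as the (assumed) `filter` parameter."""
    params = {"limit": limit}
    if search:
        params["search"] = search
    if tag:
        params["filter"] = tag  # e.g. "language:sv" (assumed tag format)
    return HUB_API + "?" + urllib.parse.urlencode(params)


def fetch_datasets(url):
    """Download the JSON list returned by the endpoint (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def ids_with_tags(datasets, required_tags):
    """Keep ids of datasets whose `tags` list contains every tag in `required_tags`."""
    required = set(required_tags)
    return [d["id"] for d in datasets if required.issubset(d.get("tags", []))]
```

The server-side `filter` narrows the search to one tag, and `ids_with_tags` then applies any further tags locally. For example, `ids_with_tags(fetch_datasets(build_search_url(tag="language:sv")), ["task_categories:text-classification"])` would keep only Swedish classification datasets, provided the Hub's tags follow that (assumed) naming.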

2. Data.world

Data.world hosts a huge amount of data, so a lot can be found here.

https://data.world/

3. Kaggle

Kaggle hosts over 60,000 datasets.

https://www.kaggle.com/datasets

4. The Pile

The Pile consists of 840 GB of English text drawn from a wide variety of domains.

https://pile.eleuther.ai/
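Because the Pile mixes many domains, it helps to know which subset each document comes from before sampling it for classification. A minimal sketch, assuming the files have already been decompressed (the Pile is distributed zstandard-compressed, which needs a third-party package not shown here) and that each JSON Lines record has the shape `{"text": ..., "meta": {"pile_set_name": ...}}`. The function names are hypothetical.

```python
import json
from collections import Counter


def iter_pile_records(lines):
    """Yield (text, subset) pairs from JSON Lines input, skipping blank lines.

    Assumes each record looks like {"text": ..., "meta": {"pile_set_name": ...}};
    records without that field are reported as "unknown".
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        subset = record.get("meta", {}).get("pile_set_name", "unknown")
        yield record.get("text", ""), subset


def count_subsets(lines):
    """Count documents per Pile subset, e.g. to pick a domain to sample from."""
    return Counter(subset for _, subset in iter_pile_records(lines))
```

Any iterable of lines works, so an open text file can be passed directly, e.g. `count_subsets(open(path, encoding="utf-8"))`, where `path` points at a hypothetical decompressed shard.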

5. Pushshift

An API for querying Reddit and other social media data.

https://pushshift.io/
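Querying such an API usually comes down to building a URL and reading JSON back. A minimal sketch, assuming a `https://api.pushshift.io/reddit/search/submission/` endpoint with `subreddit`, `q` and `size` parameters and a response shaped like `{"data": [{"title": ...}, ...]}`, as the API was commonly used around the time of writing. Access to the API may have changed since this post was written, so treat the endpoint and parameters as assumptions; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint and parameters; verify against the current Pushshift documentation.
PUSHSHIFT_SUBMISSIONS = "https://api.pushshift.io/reddit/search/submission/"


def build_submission_query(subreddit, query=None, size=100):
    """Build a search URL for Reddit submissions in one subreddit."""
    params = {"subreddit": subreddit, "size": size}
    if query:
        params["q"] = query
    return PUSHSHIFT_SUBMISSIONS + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Download and decode a JSON response (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def extract_titles(payload):
    """Pull non-empty submission titles out of a {"data": [...]} response."""
    return [item["title"] for item in payload.get("data", []) if item.get("title")]
```

Titles collected per subreddit this way can serve as a quick, weakly labeled text classification set, with the subreddit name as the label.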

6. Stack Overflow

Search and fetch Stack Overflow posts by writing SQL queries in the Stack Exchange Data Explorer.

https://data.stackexchange.com/stackoverflow/query/new

7. Data.gov

Find data published by the US government.

https://catalog.data.gov/dataset
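The catalog can also be searched programmatically. A minimal sketch, assuming catalog.data.gov exposes the standard CKAN action API at `/api/3/action/package_search` (with `q` and `rows` parameters) and returns the usual CKAN envelope `{"success": ..., "result": {"count": ..., "results": [...]}}`, where each dataset lists `resources` with `format` and `url` fields. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the catalog exposes the standard CKAN action API at this path.
CKAN_SEARCH = "https://catalog.data.gov/api/3/action/package_search"


def build_ckan_search(query, rows=10):
    """Build a package_search URL returning at most `rows` datasets."""
    return CKAN_SEARCH + "?" + urllib.parse.urlencode({"q": query, "rows": rows})


def fetch_json(url):
    """Download and decode a JSON response (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def csv_resources(payload):
    """Return (dataset title, resource URL) pairs for every CSV resource found."""
    if not payload.get("success"):
        return []
    pairs = []
    for package in payload["result"]["results"]:
        for resource in package.get("resources", []):
            # `format` may be missing or null, so fall back to an empty string.
            if (resource.get("format") or "").upper() == "CSV":
                pairs.append((package.get("title", ""), resource.get("url", "")))
    return pairs
```

Filtering on CSV keeps only resources that load directly into a table, which is usually the quickest route to a labeled dataset.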

8. Google Dataset Search

Search for datasets with Google.

https://datasetsearch.research.google.com/

9. OPUS

A great source of multilingual text, with parallel corpora ranging from movie subtitles to legal texts.

http://opus.nlpl.eu/
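Many OPUS corpora can be downloaded in a Moses-style plain-text layout: two files, one per language, with one sentence per line, where line n of one file is the translation of line n of the other. A minimal sketch for pairing such files, assuming that layout; the function name is hypothetical.

```python
from itertools import zip_longest


def read_parallel(source_lines, target_lines, skip_empty=True):
    """Pair up line-aligned source/target sentences from two parallel files.

    Raises ValueError if one side has more lines than the other, since the
    alignment would then be unreliable.
    """
    pairs = []
    for number, (src, tgt) in enumerate(zip_longest(source_lines, target_lines), start=1):
        if src is None or tgt is None:
            raise ValueError(f"files have different lengths (mismatch at line {number})")
        src, tgt = src.rstrip("\n"), tgt.rstrip("\n")
        # Drop pairs where either side is blank; they carry no usable alignment.
        if skip_empty and (not src.strip() or not tgt.strip()):
            continue
        pairs.append((src, tgt))
    return pairs
```

Open files can be passed directly, e.g. `read_parallel(open(en_path, encoding="utf-8"), open(sv_path, encoding="utf-8"))`, where `en_path` and `sv_path` are hypothetical paths to the two halves of a downloaded corpus.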

Viktor Alm

CEO @ Labelf AI
