Datasets
A data trust repository that helps people discover datasets and makes them easily accessible for everyone to use, or to get paid for contributing higher-quality, trustworthy data.
Dataset Type
Image
Video
Audio
Language
Geographic
Dataset | Description | Type | Year |
---|---|---|---|
Common Voice by Mozilla | An open source, multi-language dataset of voices that anyone can use to train speech-enabled applications. | audio | 2017 |
YouTube 8M by Google, LLC | A large-scale labeled video dataset that consists of millions of YouTube video IDs, with high-quality machine-generated annotations from a diverse vocabulary of 3,800+ visual entities. It comes with precomputed audio-visual features from billions of frames and audio segments, designed to fit on a single hard disk. | video | 2017 |
Open Images by Google, LLC | Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes. | image | 2020 |
Fashion-MNIST by Zalando | A dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. | image | 2017 |
MNIST by Yann LeCun | The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. | image | 1998 |
MS-COCO by COCO Consortium | A large-scale object detection, segmentation, and captioning dataset. | image | 2020 |
Project CodeNet by IBM Research | A large-scale dataset with approximately 14 million code samples. The code samples are written in over 50 programming languages. | text | 2021 |
The Oxford-IIIT Pet Dataset by Oxford University | A 37-category pet dataset with roughly 200 images for each class. The images have large variations in scale, pose, and lighting. All images have an associated ground truth annotation of breed, head ROI, and pixel-level trimap segmentation. | image | 2012 |
Visual Genome by Stanford University | A knowledge base and ongoing effort to connect structured image concepts to language. | image | 2015 |