A data trust repository that helps people discover datasets, makes them easily accessible for everyone to use, and lets contributors get paid for providing higher-quality, trustworthy data.

Dataset Type

Humaan Datasets

Common Voice

by Mozilla

An open-source, multi-language dataset of voices that anyone can use to train speech-enabled applications.

audio · 2017

YouTube 8M

by Google, LLC

A large-scale labeled video dataset that consists of millions of YouTube video IDs, with high-quality machine-generated annotations from a diverse vocabulary of 3,800+ visual entities. It comes with precomputed audio-visual features from billions of frames and audio segments, designed to fit on a single hard disk.

video · 2017

Open Images

by Google, LLC

Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.

image · 2020


by Zalando

A dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples.

image · 2017


by Yann LeCun

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples.

image · 1998


by COCO Consortium

A large-scale object detection, segmentation, and captioning dataset.

image · 2020

Project CodeNet

by IBM Research

A large-scale dataset of approximately 14 million code samples, written in over 50 programming languages.

text · 2021

The Oxford-IIIT Pet Dataset

by Oxford University

A 37-category pet dataset with roughly 200 images for each class. The images have large variations in scale, pose, and lighting. All images have an associated ground-truth annotation of breed, head ROI, and pixel-level trimap segmentation.

image · 2012

Visual Genome

by Stanford University

A knowledge base and an ongoing effort to connect structured image concepts to language.

image · 2015