A data trust repository that helps people discover data, makes it easily accessible for everyone to use, and lets contributors get paid for providing higher-quality, trustworthy data.


Common Voice

by Mozilla

An open source, multi-language dataset of voices that anyone can use to train speech-enabled applications.

Audio, 2017

Open Images

by Google, LLC

Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.

Image, 2020


Places

by MIT Computer Science and Artificial Intelligence Laboratory

Places is a scene recognition dataset that contains more than 2.5 million images covering more than 205 scene categories, with more than 5,000 images per category.

Image, 2017


by Defense Innovation Unit Experimental (DIUx)

One of the largest publicly available datasets of overhead imagery. It contains images from complex scenes around the world, annotated using bounding boxes.

Image, 2018


by Google, LLC

A collection of large-scale, high-quality datasets of URL links to up to 650,000 video clips that cover 400, 600, or 700 human action classes, depending on the dataset version.

Image, 2017

Amazon reviews

by Jure Leskovec

Amazon reviews consists of reviews from Amazon. The data span a period of 18 years and include ~35 million reviews up to March 2013. Each review includes product and user information, a rating, and a plaintext review.

Language, 2013