Open Datasets

List of Open Datasets

We have collected and indexed a list of open datasets for machine learning and data science.

Each entry gives the dataset name and publisher, a description, and the dataset type, dataset size, license type, and publication year ("-" means not listed).

Soccer Video and Player Position

By Simula Research Laboratory

A dataset of elite soccer player movements and corresponding videos captured at Alfheim Stadium, the home arena of Tromsø IL (Norway).
Dataset type: Video | Dataset size: 93.14 GiB | License type: CC BY | Publication year: 2014

Visual Genome

By Stanford University

A dataset, a knowledge base, and an ongoing effort to connect structured image concepts to language.
Dataset type: Image | Dataset size: - | License type: CC BY | Publication year: 2017

Oxford RobotCar

By University of Oxford

The Oxford RobotCar Dataset contains over 100 repetitions of a consistent route through Oxford, UK, captured across a period of more than a year.
Dataset type: Video | Dataset size: - | License type: CC BY | Publication year: 2016

Berkeley DeepDrive

By University of California, Berkeley

100,000 HD video sequences totaling over 1,100 hours of driving, covering many different times of day, weather conditions, and driving scenarios.
Dataset type: Video | Dataset size: - | License type: CC BY | Publication year: 2018

Common Voice

By Mozilla Foundation

Common Voice is an open-source, multi-language speech dataset built in part by online volunteer contributors.
Dataset type: Audio | Dataset size: 46.6 GiB | License type: CC0 | Publication year: -

Caltech Pedestrian

By California Institute of Technology

The Caltech Pedestrian Dataset consists of approximately 10 hours of 640x480, 30 Hz video taken from a vehicle driving through regular traffic in an urban environment.
Dataset type: Video | Dataset size: 9.64 GiB | License type: CC BY | Publication year: 2009

Large Movie Review Dataset

By Stanford University

A dataset for binary sentiment classification containing substantially more data than previous benchmark datasets.
Dataset type: Language | Dataset size: 76.5 MiB | License type: CC BY | Publication year: 2011

Open Images

By Google LLC

Open Images is a dataset of approximately 9 million images annotated with image-level labels and bounding boxes spanning thousands of classes.
Dataset type: Image | Dataset size: - | License type: CC BY | Publication year: 2020

Stanford Dogs Dataset

By Stanford University

The Stanford Dogs dataset contains images of 120 breeds of dogs from around the world.
Dataset type: Image | Dataset size: 721.9 MiB | License type: CC BY | Publication year: 2009