

It is, so far, the largest publicly and freely available multimedia collection of metadata representing about 99.2 million photos and 0.8 million videos, all of which were uploaded to Flickr between 20. The YFCC100M dataset was created to meet these needs and overcome many of the issues affecting existing multimedia datasets. In recent years there was a growing demand for a dataset that was not specifically biased or targeted towards certain topics, sufficiently large, truly multimodal, and freely usable without licensing issues. YFCC100M: The Largest Multimodal Public Multimedia Datasetĭatasets are unarguably one of the most important components of multimedia research. In particular, we provide a practical guide on how to set up a feasible and cost effective research and development environment locally or in the cloud that can access the data without having to download it first. In this article, we introduce useful information and tools to boost the usability and accessibility of the YFCC100M, including the supplemental material provided by the Multimedia Commons (MMCOMMONS) community. However, its sheer volume, one of the traits that make the dataset unique and valuable, can pose a barrier to those who do not have access to powerful computing resources. The Yahoo-Flickr Creative Commons 100 Million (YFCC100M), the largest freely usable multimedia dataset to have been released so far, is widely used by students, researchers and engineers on topics in multimedia that range from computer vision to machine learning. Affiliation: International Computer Science Institute / TU Delft
