Toloka Visual Question Answering Dataset

Our dataset consists of images paired with textual questions. One entry (instance) in the dataset is a question-image pair labeled with the ground truth coordinates of a bounding box containing the visual answer to the given question. The images were obtained from a CC BY-licensed subset of the Microsoft Common Objects in Context dataset, MS COCO. All data labeling was performed on the Toloka crowdsourcing platform, https://toloka.ai/.

The dataset has 45,199 instances split among three subsets: train (38,990 instances), public test (1,705 instances), and private test (4,504 instances). The entire train dataset will be available to everyone from the start of the challenge. The public test dataset will be available from the evaluation phase of the competition, but without any ground truth labels. The private test dataset will not be available until the challenge ends.

The datasets will be provided as files in the comma-separated values (CSV) format containing the following columns.

Column    Type     Description
image     string   URL of an image on a public content delivery network
width     integer  image width
height    integer  image height
left      integer  bounding box coordinate: left
top       integer  bounding box coordinate: top
right     integer  bounding box coordinate: right
bottom    integer  bounding box coordinate: bottom
question  string   question in English

This upload also contains a ZIP file with the images from MS COCO.
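As an illustration of the CSV layout described above, the following minimal Python sketch reads entries with the standard csv module and checks that each bounding box lies inside its image. It rests on several assumptions that the description does not state: that the header row uses exactly the column names listed, that coordinates are in pixels with the origin at the top-left corner, and that right and bottom may equal width and height. The sample row, including its URL and question, is hypothetical and is not taken from the dataset.

```python
import csv
import io

# Columns described as integers in the dataset documentation.
INT_COLUMNS = ("width", "height", "left", "top", "right", "bottom")


def parse_row(row):
    """Return a copy of one CSV row with the integer columns converted to int."""
    parsed = dict(row)
    for name in INT_COLUMNS:
        parsed[name] = int(row[name])
    return parsed


def box_is_valid(entry):
    """Check that the box is non-empty and inside the image.

    Assumption: pixel coordinates with the origin at the top-left corner.
    """
    return (
        0 <= entry["left"] < entry["right"] <= entry["width"]
        and 0 <= entry["top"] < entry["bottom"] <= entry["height"]
    )


def read_entries(handle):
    """Parse every row of an open CSV file (or file-like object)."""
    return [parse_row(row) for row in csv.DictReader(handle)]


# Hypothetical sample row, for illustration only (not real dataset content).
SAMPLE_CSV = (
    "image,width,height,left,top,right,bottom,question\n"
    "https://example.com/0.jpg,640,480,10,20,110,220,What do we use to sit?\n"
)

entries = read_entries(io.StringIO(SAMPLE_CSV))
valid_entries = [entry for entry in entries if box_is_valid(entry)]
```

To use this on a real file, the same read_entries function can be given a handle opened with open(path, newline="", encoding="utf-8"). For the public test file, which is described as having no ground truth labels, the box columns may be absent or empty, in which case parse_row would need to skip them.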

Keywords

Toloka, bounding box, question answering, English, commonsense reasoning, competition, computer vision, WSDM Cup
