DECIMER Image classifier dataset

References

Brinkhaus, H.O., Rajan, K., Zielesny, A. et al. RanDepict: Random chemical structure depiction generator. J Cheminform 14, 31 (2022). https://doi.org/10.1186/s13321-022-00609-4
Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibrián-Uhalte E, Davies M, Dedman N, Karlsson A, Magariños MP, Overington JP, Papadatos G, Smit I, Leach AR. (2017) 'The ChEMBL database in 2017.' Nucleic Acids Res., 45(D1) D945-D954
Lin, Tsung-Yi et al. (2014). Microsoft COCO: Common Objects in Context. https://arxiv.org/abs/1405.0312
B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning Deep Features for Scene Recognition using Places Database. Advances in Neural Information Processing Systems 27 (NIPS), 2014.
Krishna, Ranjay et al. Visual Genome. Connecting Language and Vision Using Crowdsourced Dense Image Annotations. http://visualgenome.org/static/paper/Visual_Genome.pdf
https://storage.googleapis.com/openimages/web/index.html
T. Nasir, M. K. Malik and K. Shahzad, "MMU-OCR-21: Towards End-to-End Urdu Text Recognition Using Deep Learning," in IEEE Access, doi: 10.1109/ACCESS.2021.3110787
https://www.kaggle.com/datasets/vaibhao/handwritten-characters
https://www.kaggle.com/datasets/praveengovi/coronahack-chest-xraydataset
https://www.kaggle.com/datasets/amyjang/pandatilesagg?select=all_images
https://www.kaggle.com/datasets/nilay1987/bacterial-colony
https://www.kaggle.com/datasets/pabasar/ceylon-epigraphy-periods
https://www.kaggle.com/datasets/yuanhaowang486/chinese-calligraphy-styles-by-calligraphers
https://www.kaggle.com/datasets/sunedition/graphs-dataset
https://www.kaggle.com/datasets/kopfgeldjaeger/function-graphs-polynomial
https://www.kaggle.com/datasets/vishnunkumar/sketches
https://www.kaggle.com/datasets/almightyj/person-face-sketches
https://www.kaggle.com/datasets/olgabelitskaya/art-pictogram
https://www.kaggle.com/datasets/tatianasnwrt/russian-handwritten-letters
https://www.kaggle.com/datasets/olgabelitskaya/handwritten-russian-letters
https://www.kaggle.com/datasets/arashnic/misinfo-graph
https://www.kaggle.com/datasets/roycezjq/graphemeimgs224x224

Image dataset divided into train (10905114 images), validation (2115528 images) and test (544946 images) folders, each containing a balanced number of images for two classes (chemical structures and non-chemical structures). The chemical structure images were generated with RanDepict from randomly picked compounds from the ChEMBL30 database and the COCONUT database. The non-chemical images were either generated with Python or retrieved from several public datasets: COCO dataset, MIT Places-205 dataset, Visual Genome dataset, Google Open labeled Images, MMU-OCR-21 (Kaggle), HandWritten_Character (Kaggle), CoronaHack Chest X-Ray Dataset (Kaggle), PANDAS Augmented Images (Kaggle), Bacterial_Colony (Kaggle), Ceylon Epigraphy Periods (Kaggle), Chinese Calligraphy Styles by Calligraphers (Kaggle), Graphs Dataset (Kaggle), Function_Graphs Polynomial (Kaggle), sketches (Kaggle), Person Face Sketches (Kaggle), Art Pictograms (Kaggle), Russian handwritten letters (Kaggle), Handwritten Russian Letters (Kaggle), Covid-19 Misinformation Tweets Labeled Dataset (Kaggle) and grapheme-imgs-224x224 (Kaggle). This data was used to build a CNN classification model by fine-tuning EfficientNetB0 as the base model. The model is available on GitHub.
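To make the split sizes easier to interpret, the short Python sketch below computes each split's share of the whole dataset and the implied number of images per class. The image counts come from the description above; the dictionary and variable names are illustrative, and the per-class figures assume that "balanced" means an exact 50/50 division within each split, which the description does not state explicitly.

```python
# Image counts per split, taken from the dataset description.
SPLITS = {
    "train": 10905114,
    "validation": 2115528,
    "test": 544946,
}

total_images = sum(SPLITS.values())

# Share of the whole dataset held by each split.
fractions = {name: count / total_images for name, count in SPLITS.items()}

# Assumption: "balanced" is read as an exact half/half division between
# chemical and non-chemical images inside every split.
per_class = {name: count // 2 for name, count in SPLITS.items()}

for name, count in SPLITS.items():
    print(
        f"{name:<10} {count:>10} images "
        f"({fractions[name]:.1%} of total, ~{per_class[name]} per class)"
    )
print(f"{'total':<10} {total_images:>10} images")
```

Under these assumptions the split is roughly 80% train, 16% validation and 4% test.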

Related Organizations

University of Chemistry and Technology
Czech Republic

Keywords

chemical structures, classification model

Impact by BIP!

	selected citations	1
	popularity	Average
	influence	Average
	impulse	Average

Usage by UsageCounts

views	73
downloads	17