
PAP-QMNIST dataset

Name: PAP-QMNIST dataset
Keywords: 3. Good health

Type: Research data (Dataset)
Published: 26 Aug 2022
Publisher: Zenodo

Authors: Nadezhda Koriakina; Joakim Lindblad; Natasa Sladoje

doi: 10.5281/zenodo.7020311, 10.5281/zenodo.7020312, 10.5281/zenodo.10038335


Abstract

PAP-QMNIST is a synthetic dataset that mimics several properties of an existing oral cancer (OC) dataset (described in the study [1]), such as cell image size, color distribution, arbitrary rotation of cells, amount of blur and noise, number of patients, and number of images per patient. A main advantage of PAP-QMNIST is that it offers access to reliable ground truth annotation at the instance (cell) level in combination with being visually interpretable for non-experts.

We base PAP-QMNIST on the QMNIST dataset [2], in which the object (digit) is located in the central part of the image, similarly to the (detected and cut-out) nuclei in our OC data. We rescale the original QMNIST images to the size of the OC images using bilinear interpolation, add color, and augment the dataset with images transformed by transformations expected in OC data, so as to replicate the number of patients and the number of images per patient in the OC dataset. The details are in [1], and the code to create such PAP-QMNIST data is in Create_PAP_QMNISTbags_datasets2.ipynb.

The uploaded PAP-QMNIST datasets were generated and analyzed during the study [1]: PAP5perc_key_inst.zip, PAP10perc_key_inst.zip, PAP20perc_key_inst.zip, and PAP30perc_key_inst.zip are the versions of PAP-QMNIST with 5, 10, 20, and 30% of key instances, respectively. File names of key-instance images (images of the digit '4') start with '4'.

To introduce variations that could be observed in real data, we design experiments with PAP-QMNIST in which the percentage of key instances varies in positive bags and is sampled from a beta distribution (commonly considered a suitable model for the random behavior of percentages and proportions) with a mean of 17.5% (the middle of the range [5-30%]) and standard deviations of 5% and 10%. These datasets are referred to as PAP_beta_mean17.5_std5.zip and PAP_beta_mean17.5_std10.zip, respectively.
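The beta-distributed key-instance percentages can be illustrated with a minimal sketch. It assumes the shape parameters are obtained by matching the stated mean and standard deviation (method of moments); the abstract does not say how the parameters were chosen, and the function names, bag count, and seed below are hypothetical.

```python
import random


def beta_params_from_mean_std(mean, std):
    """Return (alpha, beta) of a beta distribution with the given mean and std.

    Assumption: method-of-moments matching; not confirmed by the dataset record.
    """
    var = std ** 2
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if var >= mean * (1.0 - mean):
        raise ValueError("std is too large for a beta distribution with this mean")
    # For Beta(a, b): mean = a / (a + b), var = mean * (1 - mean) / (a + b + 1)
    nu = mean * (1.0 - mean) / var - 1.0  # nu = a + b
    return mean * nu, (1.0 - mean) * nu


def sample_key_instance_fractions(mean, std, n_bags, seed=0):
    """Draw one key-instance fraction per positive bag (hypothetical helper)."""
    alpha, beta = beta_params_from_mean_std(mean, std)
    rng = random.Random(seed)
    return [rng.betavariate(alpha, beta) for _ in range(n_bags)]


if __name__ == "__main__":
    for std in (0.05, 0.10):
        a, b = beta_params_from_mean_std(0.175, std)
        fractions = sample_key_instance_fractions(0.175, std, n_bags=5)
        print(f"std={std}: alpha={a:.3f}, beta={b:.3f}")
        print([round(f, 3) for f in fractions])
```

Under this assumption, a mean of 0.175 with a standard deviation of 0.05 corresponds to shape parameters of roughly 9.93 and 46.82, and a standard deviation of 0.10 to roughly 2.35 and 11.09.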
The code to create such PAP-QMNIST data is in Create_PAP_QMNISTbags_datasets_with_beta_distributed.ipynb.

[1] Koriakina, N., Sladoje, N., Bašić, V., & Lindblad, J. (2022). Oral cancer detection and interpretation: Deep multiple instance learning versus conventional deep single instance learning. arXiv preprint arXiv:2202.01783.
[2] Yadav, C., & Bottou, L. (2019). Cold case: The lost MNIST digits. Advances in Neural Information Processing Systems, 32.
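Because key-instance images are identified by file names starting with '4', instance-level ground truth can be recovered from the names alone. The following is a minimal sketch; the example file names, the helper names, and the rule that a bag is positive when it contains at least one key instance are assumptions for illustration, not details taken from the dataset record.

```python
from pathlib import PurePath


def is_key_instance(filename):
    """True if the image file name starts with '4' (the key-instance digit)."""
    return PurePath(filename).name.startswith("4")


def bag_summary(filenames):
    """Summarize one bag given the file names of its images.

    Assumption: a bag counts as positive if it holds at least one key instance.
    """
    labels = [is_key_instance(name) for name in filenames]
    n_key = sum(labels)
    return {
        "n_instances": len(labels),
        "n_key": n_key,
        "key_fraction": n_key / len(labels) if labels else 0.0,
        "bag_label": int(n_key > 0),
    }


if __name__ == "__main__":
    # Hypothetical file names; the real naming scheme beyond the leading
    # digit is not specified in the abstract.
    example_bag = ["bag0/4_001.png", "bag0/7_002.png", "bag0/1_003.png", "bag0/4_004.png"]
    print(bag_summary(example_bag))
```

Only the leading character of the file name is inspected, so any directory prefix in the path is ignored.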

Related Organizations

Uppsala University
Sweden

Impact by BIP!

citations: 0. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.