PathoNet

PathoNet is a general-purpose dataset for digital pathology. It consists of 4,462,156 JPG images divided into 12 classes (tissues). The images were extracted from the TCGA (The Cancer Genome Atlas) data portal. No annotations were made; only the tissue type was taken from the slide metadata.

For each tissue, 400,000 images of 256x256 pixels were randomly selected and downloaded from 400 whole-slide images (WSI). An automated cleaning process then removed blurred images and images with excessive white content.

The dataset is already divided into Train, Validation and Test partitions. The split was made at the case level: all images from a given case are in the same partition, so images from one case never appear in different partitions.

The final number of images for each class and partition is:

Tissue                               Train      Validation   Test
Bladder                              308,677    38,927       39,166
Brain                                313,890    39,665       39,613
Breast                               303,949    37,499       38,602
Bronchus and lung                    308,848    37,730       39,160
Colon                                243,330    30,220       32,135
Corpus uteri                         312,743    39,549       39,184
Kidney                               311,005    37,950       39,184
Liver and intrahepatic bile ducts    314,707    38,689       39,799
Prostate gland                       296,181    36,568       36,376
Skin                                 307,308    37,411       38,487
Stomach                              295,002    37,559       36,112
Thyroid gland                        258,415    33,667       33,849

For convenience, the training data has been uploaded by class.
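
The case-level partitioning described above can be illustrated with a short sketch. This is not the authors' code: the function name `split_by_case`, the `(tile_path, case_id)` input format, the file names and the 80/10/10 default fractions are all assumptions made for illustration. The idea is simply to shuffle cases rather than individual images, so that every image from one case lands in a single partition.

```python
import random
from collections import defaultdict


def split_by_case(tiles, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole cases to train/validation/test partitions.

    `tiles` is an iterable of (tile_path, case_id) pairs (hypothetical format).
    `fractions` applies to the number of cases, not the number of images,
    so per-partition image counts only approximate these proportions.
    """
    # Group image paths by the case they came from.
    by_case = defaultdict(list)
    for path, case_id in tiles:
        by_case[case_id].append(path)

    # Shuffle the cases deterministically, then cut the list into three parts.
    case_ids = sorted(by_case)
    random.Random(seed).shuffle(case_ids)
    n_train = int(len(case_ids) * fractions[0])
    n_val = int(len(case_ids) * fractions[1])
    groups = {
        "train": case_ids[:n_train],
        "validation": case_ids[n_train:n_train + n_val],
        "test": case_ids[n_train + n_val:],
    }

    # Expand each group of cases back into its image paths.
    return {
        name: [path for case_id in ids for path in by_case[case_id]]
        for name, ids in groups.items()
    }


# Toy example: 20 made-up cases with 5 tiles each.
tiles = [
    (f"case{c:02d}_tile{t}.jpg", f"case{c:02d}")
    for c in range(20)
    for t in range(5)
]
parts = split_by_case(tiles)
```

Because the split is made over cases, a case with many surviving images shifts the per-partition image counts, which is one plausible reason the counts in a dataset like this are not exact multiples of each other.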

Keywords

wsi, cancer, deep learning, digital pathology

Related to Research communities

Knowmad Institut

Cancer Research