TCGA-UT Dataset Documentation

Quick Links

- Dataset on Hugging Face: For users interested in benchmarking foundation models or feature extractors, please visit TCGA-UT on Hugging Face
- Original Paper: Universal encoding of pan-cancer histology by deep texture representations

Dataset Overview

The TCGA-UT dataset is a large-scale collection of histopathological image patches from human cancer tissues. It contains 1,608,060 image patches extracted from hematoxylin & eosin (H&E) stained histological samples across 32 different types of solid cancers.

Key Features

- Size: Over 1.6 million image patches
- Resolution: All patches are standardized to 256 x 256 pixels
- Source: Derived from The Cancer Genome Atlas (TCGA) dataset
- Quality: Curated by trained pathologists
- Coverage: 32 different cancer types
- Patient Base: 7,175 patients from 8,736 diagnostic slides

Data Collection Process

- Image Source: Whole Slide Images (WSI) were downloaded from the GDC legacy database between December 2016 and June 2017
- Expert Annotation: Two trained pathologists selected at least three representative tumor regions per slide
- Quality Control: 926 slides were removed due to various quality issues (poor staining, low resolution, focus problems, etc.)
- Patch Extraction: 10 patches were randomly cropped at 6 different magnification levels from each annotated region

File Structure

Files are organized using the following format:

    [cancer_type]/[resolution]/[TCGA Barcode]/[region]-[number]-[pixel resolution].jpg

Resolution Key

- 0: 0.5 μm/pixel
- 1: 0.6 μm/pixel
- 2: 0.7 μm/pixel
- 3: 0.8 μm/pixel
- 4: 0.9 μm/pixel
- 5: 1.0 μm/pixel

License

- Non-Commercial Use: CC-BY-NC-SA 4.0
- Commercial Use: Please contact ishum-prm@m.u-tokyo.ac.jp for licensing

Citation

If you use this dataset in your research, please cite:

    Komura, D., et al. (2022). Universal encoding of pan-cancer histology by deep texture representations. Cell Reports 38, 110424. https://doi.org/10.1016/j.celrep.2022.110424

For Model Benchmarking

If you're interested in using this dataset for benchmarking foundation models or feature extractors, we recommend accessing the dataset through the Hugging Face Hub at dakomura/tcga-ut. The Hugging Face version provides:

- Predefined train/validation/test splits (both internal and external facility-based splits)
- Ready-to-use benchmarking framework for foundation models
- WebDataset format support for efficient data loading
- Example implementations for state-of-the-art model evaluation
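
Parsing File Paths (illustrative sketch)

The path layout and resolution key described under File Structure lend themselves to a small parser. The following is a minimal sketch, not part of the dataset's own tooling: the names `PatchInfo` and `parse_patch_path` are hypothetical, the sample path in the usage lines is a made-up placeholder, and the code assumes that the region, number, and pixel-resolution fields contain no hyphens. Because the documentation does not specify the types of those three fields, they are kept as strings.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Resolution key from the documentation: directory index -> micrometres per pixel.
RESOLUTION_UM_PER_PIXEL = {0: 0.5, 1: 0.6, 2: 0.7, 3: 0.8, 4: 0.9, 5: 1.0}


@dataclass(frozen=True)
class PatchInfo:
    """Fields recovered from one patch path (hypothetical helper type)."""
    cancer_type: str
    resolution_index: int
    um_per_pixel: float
    barcode: str
    region: str            # kept as text; field type is not documented
    number: str            # kept as text; field type is not documented
    pixel_resolution: str  # kept as text; field type is not documented


def parse_patch_path(path: str) -> PatchInfo:
    """Split a path of the documented form into its components.

    Expected layout:
    [cancer_type]/[resolution]/[TCGA Barcode]/[region]-[number]-[pixel resolution].jpg
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 path components: {path!r}")

    # Only the last four components matter; any leading directories are ignored.
    cancer_type, resolution, barcode, filename = parts[-4:]

    resolution_index = int(resolution)
    if resolution_index not in RESOLUTION_UM_PER_PIXEL:
        raise ValueError(f"unknown resolution index: {resolution_index}")

    # Assumption: none of the three file-name fields contains a hyphen.
    fields = PurePosixPath(filename).stem.split("-")
    if len(fields) != 3:
        raise ValueError(f"expected region-number-pixel_resolution: {filename!r}")
    region, number, pixel_resolution = fields

    return PatchInfo(
        cancer_type=cancer_type,
        resolution_index=resolution_index,
        um_per_pixel=RESOLUTION_UM_PER_PIXEL[resolution_index],
        barcode=barcode,
        region=region,
        number=number,
        pixel_resolution=pixel_resolution,
    )


# Usage with a placeholder path (not a real file from the dataset).
info = parse_patch_path("CANCER_TYPE/2/TCGA-XX-XXXX-01Z-00-DX1/1-4-256.jpg")
print(info.cancer_type, info.um_per_pixel, info.barcode)
```

Splitting only the file name on hyphens matters here because TCGA barcodes themselves contain hyphens; taking the barcode from its own directory component avoids any ambiguity.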
histopathology, cancer