
This is a dataset of histological slides from the GTEx project, balanced for 3 major factors (organ, sex, and age bracket), that may be useful for training models in supervised or self-supervised modes. Four datasets are available:

- gtex_histology_balanced_3_slides_200_tiles.tar.gz: conditioned on the 3 factors, 3 slides were selected per group, and 200 tiles were selected randomly per slide from tissue-segmented areas.
- gtex_histology_balanced_3_slides_2000_tiles.tar.gz: conditioned on the 3 factors, 3 slides were selected per group, and 2000 tiles were selected randomly per slide from tissue-segmented areas.
- gtex_histology_balanced_10_slides_100_tiles.tar.gz: conditioned on the 3 factors, 10 slides were selected per group (when possible), and 100 tiles were selected randomly per slide from tissue-segmented areas. This dataset closely matches "gtex_histology_balanced_3_slides_200_tiles.tar.gz" in total number of tiles.
- gtex_histology_balanced_10_slides_800_tiles.tar.gz: conditioned on the 3 factors, 10 slides were selected per group (when possible), and 800 tiles were selected randomly per slide from tissue-segmented areas. This dataset closely matches "gtex_histology_balanced_3_slides_2000_tiles.tar.gz" in total number of tiles.

Each archive file contains the following:

- slide_annotation.csv: a slide-level annotation of the slides (see below)
- train: a directory with image tiles to be used to train a model
- valid: a directory with image tiles to be used to validate a model

The slide_annotation file contains publicly available information on the slides, plus 3 additional columns:

- "Tissue_simple": the organ of the slide
- "split": whether the slide was assigned to the 'train' or 'valid' split. Slides in the validation split have one tenth as many tiles as training slides.
- "n_tiles": the number of image tiles in the dataset for each slide

Example:

| Tissue Sample ID | Tissue | Subject ID | Sex | Age Bracket | Hardy Scale | Pathology Categories | Pathology Notes | Tissue_simple | split | n_tiles |
|---|---|---|---|---|---|---|---|---|---|---|
| GTEX-1128S-1426 | Esophagus - Mucosa | GTEX-1128S | female | 60-69 | Fast death - natural causes | | 6 pieces, near- total autolysis/mucosa completely sloughed | Esophagus | train | 200 |
| GTEX-113JC-1226 | Stomach | GTEX-113JC | female | 50-59 | Fast death - natural causes | | 6 pieces, well dissected mucosa; some areas are severely autolyzed | Stomach | valid | 20 |
| GTEX-1192W-2526 | Muscle - Skeletal | GTEX-1192W | male | 60-69 | Fast death - natural causes | | 2 pieces, ~10-20% interstitial fat, rep foci delineated | Muscle | train | 200 |
| GTEX-1192X-0426 | Muscle - Skeletal | GTEX-1192X | male | 50-59 | Slow death | | 2 pieces, 5-10% interstitial fat, rep. foci delineated | Muscle | valid | 20 |
| GTEX-11DXX-1326 | Stomach | GTEX-11DXX | female | 60-69 | Ventilator case | gastritis | 6 pieces, mild chronic active gastritis | Stomach | train | 200 |

Inside train and valid are JPEG files whose names consist of dot-separated fields (tissue sample ID, organ, sex, age bracket, and two integers identifying the crop), so that the origin of each crop can be traced and the file name can serve as a direct class label if desired. Examples: "GTEX-ZYT6-1326.Pancreas.male.30-39.47492.16064.jpg", "GTEX-WWYW-2726.Ovary.female.50-59.5024.15008.jpg".
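As a minimal sketch of how the naming convention and the annotation file might be consumed in Python, the helpers below split a tile file name into its fields, build a class label from the three balancing factors, and total the n_tiles column per split. They assume that every file name has exactly six dot-separated fields before the .jpg extension, that the two trailing integers are the crop's position within the slide (the description only says the crop's origin can be traced), and that slide_annotation.csv is comma-separated with the column names shown in the example. parse_tile_name, class_label and tiles_per_split are hypothetical helpers, not part of the dataset.

```python
import csv
import io
from collections import Counter
from pathlib import Path
from typing import Dict, NamedTuple


class TileInfo(NamedTuple):
    sample_id: str    # Tissue Sample ID, e.g. "GTEX-ZYT6-1326"
    organ: str        # organ (Tissue_simple), e.g. "Pancreas"
    sex: str          # "male" or "female"
    age_bracket: str  # e.g. "30-39"
    x: int            # assumption: first integer is the crop's x position
    y: int            # assumption: second integer is the crop's y position


def parse_tile_name(filename: str) -> TileInfo:
    """Split a tile file name (with or without a directory) into its fields."""
    name = Path(filename).name
    if not name.endswith(".jpg"):
        raise ValueError(f"not a .jpg tile: {filename!r}")
    fields = name[: -len(".jpg")].split(".")
    if len(fields) != 6:
        raise ValueError(f"expected 6 dot-separated fields in {filename!r}")
    sample_id, organ, sex, age_bracket, x, y = fields
    return TileInfo(sample_id, organ, sex, age_bracket, int(x), int(y))


def class_label(info: TileInfo) -> str:
    """Combine the three balancing factors (organ, sex, age bracket) into one label."""
    return f"{info.organ}.{info.sex}.{info.age_bracket}"


def tiles_per_split(csv_text: str) -> Dict[str, int]:
    """Sum the n_tiles column of slide_annotation.csv for each split value."""
    totals: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["split"]] += int(row["n_tiles"])
    return dict(totals)
```

For example, assuming an archive has been extracted into the current directory, `Counter(class_label(parse_tile_name(p.name)) for p in Path("train").glob("*.jpg"))` would count training tiles per organ/sex/age group, and `tiles_per_split(Path("slide_annotation.csv").read_text())` would give the total number of tiles in each split. Parsing with the standard csv module keeps the sketch dependency-free; pandas would work equally well for larger analyses.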
The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (commonfund.nih.gov/GTEx). Additional funds were provided by the NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. Donors were enrolled at Biospecimen Source Sites funded by NCI\Leidos Biomedical Research, Inc. subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute (10XS171), and Science Care, Inc. (X10S172). The Laboratory, Data Analysis, and Coordinating Center (LDACC) was funded through a contract (HHSN268201000029C) to The Broad Institute, Inc. Biorepository operations were funded through a Leidos Biomedical Research, Inc. subcontract to Van Andel Research Institute (10ST1035). Additional data repository and project management were provided by Leidos Biomedical Research, Inc. (HHSN261200800001E). The Brain Bank was supported by supplements to University of Miami grant DA006227. Statistical Methods development grants were made to the University of Geneva (MH090941 & MH101814), the University of Chicago (MH090951, MH090937, MH101825, & MH101820), the University of North Carolina - Chapel Hill (MH090936), North Carolina State University (MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University (MH101810), and the University of Pennsylvania (MH101822). The datasets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000NNN.vN.pN. Please insert relevant accession numbers.
