Powered by OpenAIRE graph
ZENODO
Dataset . 2025
License: CC BY NC SA
Data sources: ZENODO

Histo-Miner: NucSeg and TumSeg datasets

Authors: Sancéré, Lucas; Lorenz, Carina; Brägelmann, Johannes; Bozek, Katarzyna; Helbig, Doris

Abstract

I. General

Training datasets used for the Histo-Miner paper.

2 datasets were used to train SCC-Hovernet:
  • UncuratedSCC
  • NucSeg

1 dataset was used to train SCC-Segmenter:
  • TumSeg

II. NucSeg Datasets

The dataset is available here: NucSeg.zip. It consists of annotated H&E patches in which the cell nuclei are segmented and classified. 47,392 nuclei were labeled in total (3,135 granulocytes, 12,263 lymphocytes, 3,271 plasma cells, 11,526 stromal cells, 17,197 tumor cells).

The dataset is composed of 6,816 patches of 560x560 pixels with 70% overlap, stored as NumPy arrays with 5 channels according to the Hovernet data format requirements. The patches come from 24 WSIs of 20 cSCC patients. The images are a mix of 40x and 20x resolutions (see IV. Patient IDs for more information).

The channels of the arrays are [RGB, inst, type] where:
  • 'RGB' is the 3-channel raw image
  • 'inst' is the instance segmentation ground truth: every pixel value ranges from 0 to N, where 0 is background and N is the number of nuclear instances
  • 'type' is the nuclear type ground truth: every pixel value ranges from 0 to K, where 0 is background and K is the number of classes

This format fits the training of Hovernet-like architectures but is not convenient for visualization or for training other models. For this reason, a more conventional format is also available: NucSeg_OriginalFormat.zip. In this version the 'RGB', 'inst', and 'type' data are saved in numpy format in separate folders (RawImages, InstanceMaps, ClassMaps). For instance, the user can apply the functions save2dnpy_2png and save3dnpy_2png from histo_miner.utils.filemanagement to generate PNG files from them. This version contains 1,707 non-overlapping H&E patches of 256x256 pixels.

As described in the paper, the SCC Hovernet model was first pretrained with a Not-Curated dataset, meaning that its segmentation and cell classification contain several errors, which are not quantified. It is not recommended to use this dataset for training, only for pre-training as a first step preceding another training step with another dataset. This Not-Curated dataset is available here: UncuratedSCC.zip. Its file organization follows that of NucSeg.

III. TumSeg Dataset

The dataset is available here: TumSeg.zip. It consists of pairs of raw WSI images and binary segmentation images in which the tumor region was annotated. 144 WSIs of 125 cSCC patients were collected for this dataset. The WSIs are downsampled to a resolution of 1.25x.

IV. Patient IDs

For both datasets, a csv file is available to associate each file with its corresponding (anonymised) patient. For the NucSeg dataset, the resolutions of the WSIs from which the patches were extracted are also listed.

In version 2 of the dataset, the patient IDs of TumSeg were changed to remove misleading names. The image-to-patient correspondence is unchanged; only the names are updated.

V. Funding Notes

Lucas Sancéré and Kasia Bozek were supported by the North Rhine-Westphalia return program (311-8.03.03.02-147635) and hosted by the Center for Molecular Medicine Cologne. Johannes Brägelmann and Carina Lorenz received funding from a Mildred Scheel Nachwuchszentrum Grant 70113307 by the German Cancer Aid (Deutsche Krebshilfe).
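
As an illustration of the [RGB, inst, type] channel layout described in section II, here is a minimal Python sketch that splits one patch into its three parts and counts nuclei per class. It is not code from the Histo-Miner repository. It assumes the 5 channels lie along the last axis of an H x W x 5 array, and the helper names split_patch and count_nuclei_per_type are hypothetical. The patch is synthetic, so nothing is read from NucSeg.zip, and the class ids are arbitrary placeholders because the mapping from id to cell type is not given here.

```python
import numpy as np


def split_patch(patch):
    """Split one H x W x 5 patch into (RGB image, instance map, type map).

    Assumption: channels are ordered [R, G, B, inst, type] on the last axis.
    """
    if patch.ndim != 3 or patch.shape[-1] != 5:
        raise ValueError(f"expected an H x W x 5 array, got shape {patch.shape}")
    rgb = patch[..., :3].astype(np.uint8)    # raw image
    inst = patch[..., 3].astype(np.int32)    # 0 = background, 1..N = nuclei
    types = patch[..., 4].astype(np.int32)   # 0 = background, 1..K = classes
    return rgb, inst, types


def count_nuclei_per_type(inst, types):
    """Count nuclear instances per class id, ignoring background (id 0)."""
    counts = {}
    for inst_id in np.unique(inst):
        if inst_id == 0:
            continue
        # Take the most frequent type label inside the instance mask.
        labels, freq = np.unique(types[inst == inst_id], return_counts=True)
        cls = int(labels[np.argmax(freq)])
        counts[cls] = counts.get(cls, 0) + 1
    return counts


# Synthetic 8 x 8 patch with two nuclei (placeholder class ids 2 and 5).
patch = np.zeros((8, 8, 5), dtype=np.int32)
patch[..., :3] = 200            # uniform light "tissue" colour
patch[1:3, 1:3, 3] = 1          # instance 1
patch[1:3, 1:3, 4] = 2          # ... labelled as class 2
patch[5:7, 4:7, 3] = 2          # instance 2
patch[5:7, 4:7, 4] = 5          # ... labelled as class 5

rgb, inst, types = split_patch(patch)
counts = count_nuclei_per_type(inst, types)
print(rgb.shape, int(inst.max()), counts)
```

With a real file, the array would first be loaded with np.load, and the axis order should be checked against the actual array shape before relying on this layout.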

Keywords

Segmentation, Histology, Annotations, Skin cancer, Object Classification

  • Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
Related to Research communities
Cancer Research