ZENODO
Dataset . 2025
License: CC BY NC SA
Data sources: ZENODO
Data sources: Datacite

BEETLE: A multicentric dataset for training and benchmarking breast cancer segmentation in H&E slides

Authors: Lems, Carlijn M.; Tessier, Leslie; Bokhorst, John-Melle; van Rijthoven, Mart; Aswolinskiy, Witali; Pozzi, Matteo; Klubíčková, Natálie; +14 Authors

Abstract

Repository content

This repository contains three zip files:

'annotations.zip' - annotations for the development set, provided in three formats:

    annotations.zip
    ├── jsons/            # JSON format with tissue compartments annotated as polygons
    ├── label_map.json    # mapping of pixel values to class labels
    ├── masks/            # multiresolution TIFF images with pixel-wise class labels
    └── xmls/             # XML format with tissue compartments annotated as polygons

'images.zip' - images for the development and evaluation sets:

    images.zip
    ├── development/
    │   └── wsis/         # whole-slide images for development
    └── evaluation/
        ├── rois/         # PNG images of ROIs for evaluation
        └── wsis/         # whole-slide images for evaluation

'model.zip' - weights of the final ensemble model used for the technical validation of the dataset.

All data is released at a spacing of ~0.5 µm/pixel. Annotations in TIFF and XML formats are compatible with ASAP 2.1 Nightly. Both XML and JSON files contain the same annotations, but JSON is formatted for compatibility with nnU-Net-for-pathology-v2. The ROI PNG images include surrounding spatial context, allowing models to incorporate neighboring tissue architecture in their predictions, similar to whole-slide inference using a sliding-window approach.

Public datasets

This dataset includes images from two public sources:

  • TCGA-BRCA (The Cancer Genome Atlas Breast Invasive Carcinoma)
  • TIGER training set

Note: WSIs from TIGER (including the TCGA-BRCA subset) must be downloaded separately from AWS Open Data. WSIs from TCGA-BRCA not in TIGER are included here. Four TIGER slides (IDs TCGA-AC-A2QH, TCGA-OL-A97C, TCGA-AR-A5QQ, TCGA-E9-A5FL) were excluded from this dataset.

File ID nomenclature

For images from the public TCGA-BRCA and TIGER datasets, we retained the original anonymized filenames provided by the respective sources. For all other images, we assigned each patient a unique anonymous patient ID, incrementing from 1.
Because a single patient may have multiple WSIs, WSIs and annotations are named according to patient ID and WSI ID using the convention <patient_id>_<wsi_id>.<extension>, for example 'patient1_wsi1.tif'. For the evaluation set, ROIs are additionally indexed by ROI ID, following the convention <patient_id>_<wsi_id>_<roi_id>.<extension>, for example 'patient1_wsi1_roi1.png'.

Dataset overview

Lastly, we include a 'data_overview.csv' file that documents metadata per WSI. The table below lists the metadata contained in each column.

| Column | Contents |
|---|---|
| 'patient_id' | Unique anonymous patient ID, see 'File ID nomenclature' |
| 'wsi_id' | WSI ID, see 'File ID nomenclature' |
| 'name' | Full name of WSI, e.g., 'patient1_wsi1' |
| 'source' | (Clinical center) data source: 'biopticka', 'jb', 'nki', 'rumc', 'scdc', 'sch', 'tcga', or 'uwmedicine' |
| 'specimen_type' | WSI specimen type: 'biopsy' or 'resection' |
| 'scanner' | Scanner used to digitize the image |
| 'wsi_path' | WSI path starting with 'images/' (for non-TIGER images) |
| 'annotation_mask_path' | Path to the TIFF mask file (development set only), starting with 'annotations/' |
| 'annotation_xml_path' | Path to the XML annotation file (development set only), starting with 'annotations/' |
| 'annotation_json_path' | Path to the JSON annotation file (development set only), starting with 'annotations/' |
| 'split' | Dataset split: 'development' or 'evaluation' |
| 'validation_fold' | Validation fold of 5-fold cross-validation (development set only) |
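The naming convention and the overview file lend themselves to a small helper. The following is a minimal sketch, not part of the released repository: the regular expression is inferred from the two example filenames only, the function names are hypothetical, and the sample rows (including the fold values) are made up for illustration. How 'validation_fold' is actually encoded in 'data_overview.csv' is an assumption here.

```python
import csv
import io
import re

# Inferred from the examples 'patient1_wsi1.tif' and 'patient1_wsi1_roi1.png'.
# TCGA-BRCA and TIGER files keep their original names and will not match.
NAME_RE = re.compile(r"^patient(\d+)_wsi(\d+)(?:_roi(\d+))?\.(\w+)$")


def parse_name(filename):
    """Return the IDs encoded in a filename, or None if it uses source naming."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    patient, wsi, roi, ext = match.groups()
    return {
        "patient_id": int(patient),
        "wsi_id": int(wsi),
        "roi_id": int(roi) if roi is not None else None,
        "extension": ext,
    }


def split_by_fold(rows, fold):
    """Split development rows into (train, val) lists of WSI names for one fold."""
    train, val = [], []
    for row in rows:
        if row["split"] != "development":
            continue  # evaluation slides have no fold and no public annotations
        # Assumption: the fold is stored as a plain integer string.
        target = val if row["validation_fold"].strip() == str(fold) else train
        target.append(row["name"])
    return train, val


# Made-up rows using three of the documented columns; the real file has more.
SAMPLE = """name,split,validation_fold
patient1_wsi1,development,0
patient1_wsi2,development,0
patient2_wsi1,development,1
patient3_wsi1,evaluation,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
train, val = split_by_fold(rows, fold=0)
print(parse_name("patient1_wsi1_roi1.png"))
print("train:", train, "val:", val)
```

To use the real file, replace the in-memory sample with `open("data_overview.csv", newline="")` and check the actual fold values before relying on the string comparison.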

The BrEast cancEr hisTopathoLogy sEgmentation (BEETLE) dataset provides a development set and an external evaluation set for multiclass semantic segmentation of H&E-stained breast cancer whole-slide images (WSIs), covering all molecular subtypes and histological grades.

Development set: 587 biopsies and resections collected from three collaborating clinical centers and two public datasets, digitized using seven scanners. Pixel-level annotations are available for four tissue classes: invasive epithelium, non-invasive epithelium, necrosis, and other, with particular focus on morphologies underrepresented in existing datasets, such as ductal carcinoma in situ and dispersed lobular tumor cells.

External evaluation set: 54 biopsies and resections collected from three clinical centers and digitized with three scanners. In addition to the WSIs, 170 densely annotated regions of interest (ROIs) are provided as image tiles. The corresponding pixel-level annotations are not publicly released but are sequestered on the Grand Challenge platform, where submissions are evaluated on a public leaderboard to enable standardized and comparable benchmarking of breast cancer segmentation models.
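Because the four classes are unlikely to be equally frequent, it can help to check class balance in a mask before training. The sketch below is an illustration under stated assumptions: the pixel values in `LABEL_MAP` are hypothetical (the real mapping ships in 'label_map.json'), the function name is invented, and the mask is a tiny nested list standing in for a region read from a TIFF mask.

```python
from collections import Counter

# Hypothetical pixel values; consult label_map.json for the real mapping,
# which may differ and may reserve a value for unannotated pixels.
LABEL_MAP = {
    1: "invasive epithelium",
    2: "non-invasive epithelium",
    3: "necrosis",
    4: "other",
}


def class_fractions(mask, label_map):
    """Fraction of annotated pixels per class; unmapped values are ignored."""
    counts = Counter(value for row in mask for value in row if value in label_map)
    total = sum(counts.values())
    return {
        name: (counts[value] / total if total else 0.0)
        for value, name in label_map.items()
    }


# Toy 2x4 mask; 0 is treated as unannotated here because it is not in LABEL_MAP.
toy_mask = [
    [0, 1, 1, 4],
    [0, 2, 4, 4],
]

fractions = class_fractions(toy_mask, LABEL_MAP)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.2f}")
```

For real masks, the same counting would be applied to arrays read with a whole-slide image library; nested lists are used here only to keep the example dependency-free.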

Keywords

breast cancer, segmentation, histopathology, whole-slide images, deep learning, artificial intelligence, digital pathology, semantic segmentation

Related to Research communities
Cancer Research