CPAISD: Core-Penumbra Acute Ischemic Stroke Dataset

The dataset contains 112 non-contrast cranial CT scans of patients with hyperacute stroke, featuring delineated zones of penumbra and core of the stroke on each slice where present.

The data are anonymized using the Kitware DicomAnonymizer with standard anonymization settings, except that the values of the following fields are preserved:

(0x0010, 0x0040) – Patient's Sex
(0x0010, 0x1010) – Patient's Age
(0x0008, 0x0070) – Manufacturer
(0x0008, 0x1090) – Manufacturer's Model Name

The patient's sex and age are retained for demographic analysis of the samples, and the equipment manufacturer and model are kept for dataset statistics and the potential for domain shift analysis.

The dataset is split into three folds:

Training fold (92 studies, 8,376 slices)
Validation fold (10 studies, 980 slices)
Testing fold (10 studies, 809 slices)

The dataset has the following structure:

metadata.json – dataset metadata
summary.csv – metadata of each study in a CSV format table
Part of the dataset (train, val, and test)
    Study
        Slice
            raw.dcm – original slice file
            image.npz – slice in NumPy array format
            mask.npz – segmentation mask in NumPy array format
            metadata.json – slice metadata in JSON format
        metadata.json – study metadata in JSON format

The metadata.json at the root of the dataset has the following format:

generation_params – dataset generation parameters:
    test_size – proportion of the test part
    val_size – proportion of the validation part
stats – statistical data:
    common – general statistical data:
        train_size_in_studies – number of studies in the training part of the dataset
        train_size_in_images – number of slices in the training part of the dataset
        val_size_in_studies – number of studies in the validation part of the dataset
        val_size_in_images – number of slices in the validation part of the dataset
        test_size_in_studies – number of studies in the test part of the dataset
        test_size_in_images – number of slices in the test part of the dataset
    train – statistical data for the training part of the dataset:
        min – minimum pixel value
        max – maximum pixel value
        mean – average pixel value
        std – standard deviation for all pixel values

The metadata.json at the root of the study has the following format. If a field value is unknown, it is given as 'unknown':

manufacturer – manufacturer of the tomograph
model – model of the tomograph
device – full name of the tomograph (manufacturer + model)
age – patient's age in years
sex – patient's sex: M – male, F – female
dsa – whether cerebral angiography was performed: true if yes, false if no
nihss – NIHSS score
time – time in hours from the onset of the stroke to the conduct of the study; can be either a number or a range
lethality – whether the person died as a result of this stroke: true if yes, false if no

The summary.csv contains the same fields as the `metadata.json` from the root of the study, plus two additional fields:

name – name of the study
part – part of the dataset in which the study is located
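The layout above can be read with NumPy and the standard library. The following is a minimal sketch, not an official loader. It rests on several assumptions: the array key names inside image.npz and mask.npz are not documented here, so the sketch takes whichever array is stored first; the function names load_npz_array, load_slice, and normalize_with_train_stats are hypothetical; and using the train mean and std for normalization is one plausible use of those statistics, not something the description prescribes.

```python
import json
from pathlib import Path

import numpy as np


def load_npz_array(path):
    """Return the first array stored in an .npz archive.

    Assumption: the key name inside the archive is not documented,
    so the first stored array is taken regardless of its name.
    """
    with np.load(path) as archive:
        first_key = archive.files[0]
        return archive[first_key]


def load_slice(slice_dir):
    """Load image, mask, and slice metadata from one slice directory."""
    slice_dir = Path(slice_dir)
    image = load_npz_array(slice_dir / "image.npz")
    mask = load_npz_array(slice_dir / "mask.npz")
    with open(slice_dir / "metadata.json", encoding="utf-8") as f:
        metadata = json.load(f)
    return image, mask, metadata


def normalize_with_train_stats(image, root_metadata):
    """Standardize pixel values with the training-fold mean and std.

    Assumption: the statistics sit at root_metadata["stats"]["train"],
    following the nesting described for the root metadata.json.
    """
    train_stats = root_metadata["stats"]["train"]
    centered = image.astype(np.float32) - train_stats["mean"]
    return centered / train_stats["std"]
```

A call would look like load_slice(dataset_root / "train" / study_name / slice_name), where dataset_root, study_name, and slice_name are placeholders for paths in a local copy of the dataset. The segmentation mask is expected to align with the image slice, but its label encoding for core and penumbra is not stated here and should be checked against the slice data before use.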

Keywords

Stroke, core, penumbra, CT
