
This archive provides all datasets needed to reproduce the single-cell data integration detailed in the paper "Single-cell integration and multi-modal profiling reveals phenotypes and spatial organization of neutrophils in colorectal cancer" (DOI: 10.1016/j.ccell.2025.12.003).

The archive comprises the following files:

crc_atlas_models.tar.xz: Trained scVI (unsupervised) and scANVI (cell-type aware) models for the global CRC atlas and tissue-specific subsets (normal, tumor, metastasis). The models enable projection of external data onto the CRC atlas and expect Ensembl IDs (e.g. ENSG00000105329) as var_names. Trained with scvi-tools v1.4.1.

crc_atlas_models_minified.tar.xz: Lightweight minified models that retain only the weights necessary for downstream inference. Optimized for scArches workflows, including reference mapping and automated cell-type label transfer.

MUI_Innsbruck-adata.h5ad: In-house scRNA-seq dataset from CRC cohort I (n = 12) comprising matched peripheral blood, adjacent normal, and tumor samples generated using the BD Rhapsody platform.

input_datasets.tar.xz: Preprocessed input datasets in .h5ad format required to build the CRC scRNA-seq atlas.

downstream_analyses.tar.xz: Fully executed HTML notebooks and corresponding analysis outputs used to generate the main single-cell atlas figures in the paper.

downstream_analyses_de_analysis.tar.xz: DESeq2-based differential expression analyses on pseudobulked data by cell type for various matched comparisons within the CRC atlas. Includes RDS files, result TSV tables, and short summaries for each comparison.

remove_ambient_rna.tar.xz: A subset of 24 .h5ad datasets with scAR-denoised counts. The original unfiltered count matrices are available in input_datasets.tar.xz.

containers.tar.xz: Singularity .sif images encapsulating all software dependencies required to fully reproduce the workflow.

shears_tutorial.tar.xz: Input datasets in .h5ad format required to execute the shears tutorial. Includes the single-cell CRC reference and combined bulk clinical cohorts to demonstrate both the quantitative deconvolution and the single-cell phenotypic modeling (e.g., mapping clinical outcomes to single cells) introduced in this paper.

The CRC atlas is publicly available for download and interactive exploration through a cell-x-gene instance with standardized metadata, which allows custom analyses of the atlas. For more information, check out the project website and our GitHub repository.
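
Because the trained models expect Ensembl IDs as var_names, it can help to check an external dataset's gene identifiers before attempting a projection. The following is a minimal sketch using only the Python standard library; it is not part of the published workflow. The helper name `normalize_var_names`, the ID pattern (`ENSG` followed by 11 digits), and the choice to strip version suffixes such as `.18` are illustrative assumptions.

```python
import re

# Assumption: human Ensembl gene IDs look like "ENSG" followed by 11 digits,
# optionally followed by a version suffix such as ".18".
ENSEMBL_GENE_ID = re.compile(r"^(ENSG\d{11})(?:\.\d+)?$")


def normalize_var_names(var_names):
    """Return (ids, rejected): unversioned Ensembl IDs and names that did not match.

    Hypothetical helper for illustration only; stripping the version suffix
    is an assumption, not a documented requirement of the models.
    """
    ids, rejected = [], []
    for name in var_names:
        match = ENSEMBL_GENE_ID.match(name.strip())
        if match:
            ids.append(match.group(1))  # keep the ID without its version suffix
        else:
            rejected.append(name)  # e.g. gene symbols that would need remapping
    return ids, rejected


example_names = ["ENSG00000105329", "ENSG00000141510.18", "TP53"]
ids, rejected = normalize_var_names(example_names)
print("Ensembl IDs:", ids)
print("Not Ensembl IDs:", rejected)
```

With an AnnData object, the same check could be applied to `adata.var_names`; any rejected names (for example gene symbols) would need to be mapped to Ensembl IDs before the data are passed to the models.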
