
Interpretable Inflammation Landscape of Circulating Immune cells

This repository contains the processed scRNA-seq datasets and metadata used in the manuscript entitled "Interpretable Inflammation Landscape of Circulating Immune cells".

Abstract

Inflammation is a biological phenomenon beneficial for homeostasis, but unfavorable if dysregulated. Although major progress has been made in characterizing inflammation in specific diseases, a global, holistic understanding is still elusive. This is particularly intriguing, considering its function for human health and the potential for modern medicine if fully deciphered. Here, we leverage advances in single-cell genomics to delineate inflammatory processes of circulating immune cells during infection, immune-mediated inflammatory diseases and cancer. Our single-cell atlas of >6.5 million peripheral blood mononuclear cells from 1047 patients (56% female, 43% male) and 19 diseases allowed us to learn a comprehensive model of inflammation in circulating immune cells. The atlas expanded our current knowledge of the biology of inflammation of immune-mediated diseases (7), acute (1) and chronic (3) inflammatory diseases, infection (4) and solid tumors (4), and laid the foundation to develop a precision medicine framework using unsupervised as well as explainable machine learning. Beyond a disease-centered analysis, we charted altered activity of inflammatory molecules in peripheral blood cells, depicting discriminative inflammation-related genes to further understand mechanisms of inflammation. Finally, we laid the groundwork for learning a classifier for inflammatory diseases, presenting cells in circulation as a powerful resource for patient classification.

Inflammation atlas cohort description

The project includes single-cell RNA-sequencing data generated in-house from samples shared by our collaborators at several research institutions.
Samples were collected with written informed consent from all participants, in compliance with the ethical guidelines for human samples. Specifically, we generated data from patients suffering from Rheumatoid Arthritis (RA), Psoriatic Arthritis (PSA), Crohn's Disease (CD), Ulcerative Colitis (UC), Psoriasis (PS) and Systemic Lupus Erythematosus (SLE), as well as healthy controls, in collaboration with the Vall d’Hebron Research Institute within the DoCTIS consortium (SCGT00). Additionally, we obtained and processed data from healthy controls in collaboration with the Institut Hospital del Mar d'Investigacions Mèdiques (SCGT01); Asthma, Chronic Obstructive Pulmonary Disease (COPD) and healthy control samples in collaboration with the University Medical Center Groningen (SCGT02); Breast Cancer (BRCA) samples in collaboration with the Vall d’Hebron Institute of Oncology (SCGT03); cirrhosis samples in collaboration with the Biomedical Research Institut Sant Pau (SCGT04); samples from patients suffering from Colorectal Cancer (CRC) in collaboration with the Katholieke Universiteit Leuven (SCGT05); and, finally, COVID and healthy control samples, also in collaboration with the Biomedical Research Institut Sant Pau (SCGT06).

We also included publicly available datasets to complete our cohort. Specifically, we considered data from patients suffering from sepsis from Reyes et al. (1) and Jiang et al. (2), Head and Neck Squamous Cell Carcinoma (HNSCC) from Cillo et al. (3), Hepatitis B Virus (HBV) from Zhang et al. (4), Multiple Sclerosis (MS) from Schafflick et al. (5), NasoPharyngeal Cancer (NPC) from Liu et al. (6), Human Immunodeficiency Virus (HIV) from Palshikar et al. (7) and Wang et al. (8), SLE from Perez et al. (9), Savage et al. (10) and Mistry et al. (11), cirrhosis from Ramachandran et al. (12), CD from Martin et al. (13), COVID, influenza and sepsis from the COMBAT study by Ahern et al. (14), COVID from Ren et al. (15), and healthy controls from Terekhova et al.
(16) and 10X Genomics, together with the available healthy samples from all the cited studies.

NOTE: Further details on the datasets and samples included in the inflammation atlas can be found in Supplementary Table 1, Sheets 1-2.

Raw data (FASTQ)

The single-cell RNA-sequencing (scRNA-seq) data generated in-house and the associated count matrices are accessible at the Sequence Read Archive (SRA), NCBI Gene Expression Omnibus (GEO) and European Genome-phenome Archive (EGA) databases. Previously published scRNA-seq data included in this project, either FASTQ files or processed count matrices, were obtained from GEO, BioStudies ArrayExpress, Broad Institute DUOS, Synapse, Genome Sequence Archive (GSA), CellXGene Data Portal and 10X Genomics.

Inflammation atlas cohort split

We divided our dataset into three distinct groups, each serving a different purpose aligned with the paper's objectives and downstream analyses (see Fig. 1b in the manuscript).

Core: We selected a set of studies to generate the Inflammation reference atlas. These samples were randomly split into two subgroups, taking into account multiple covariates such as study ID, chemistry and disease:
  Main (atlas): Samples used to build the reference annotation, to extract biological findings and to train the patient classifier.
  Validation (unseen patients): Samples used for the first level of validation of the patient classifier. These are Core samples never seen by the classifier.
External (unseen studies): We selected a set of studies to evaluate the performance of the patient classifier. These samples provide the second level of validation, using an independent set of samples and studies. External studies include samples profiled with the same chemistries as the Core data and with different ones.

NOTE: Further details on dataset and sample splitting can be found in Supplementary Table 1, Sheet 3.

Additionally, a fourth dataset split was defined for a centralized, multi-disease scenario (SCGT00 dataset).
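The covariate-aware random split of the Core samples into Main and Validation could look like the minimal Python sketch below. It only illustrates the idea under stated assumptions and is not the code used for the manuscript (that code is in the GitHub repository given under Code availability): the function name `stratified_split`, the per-sample record layout, the field names `sampleID`, `studyID`, `chemistry` and `disease`, and the 20% validation fraction are all hypothetical. Samples are grouped into strata by their covariate values, and each stratum is shuffled and split on its own, so every study/chemistry/disease combination contributes to both subsets in roughly the chosen proportion.

```python
import random
from collections import defaultdict


def stratified_split(samples, keys=("studyID", "chemistry", "disease"),
                     validation_fraction=0.2, seed=0):
    """Assign each sample to 'main' or 'validation' within its covariate stratum.

    samples: list of dicts holding a 'sampleID' plus one value per key
    (hypothetical field names). Returns {sampleID: 'main' | 'validation'}.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible

    # Group sample IDs by their combination of covariate values.
    strata = defaultdict(list)
    for sample in samples:
        stratum = tuple(sample[key] for key in keys)
        strata[stratum].append(sample["sampleID"])

    assignment = {}
    # Visit strata and IDs in sorted order so the result depends only on the seed.
    for stratum in sorted(strata):
        ids = sorted(strata[stratum])
        rng.shuffle(ids)
        # Rounded share of this stratum held out; a lone sample at 20% stays in 'main'.
        n_validation = round(len(ids) * validation_fraction)
        for i, sample_id in enumerate(ids):
            assignment[sample_id] = "validation" if i < n_validation else "main"
    return assignment


# Usage with made-up records: 10 RA and 5 healthy samples from one study.
records = [
    {"sampleID": f"S{i:02d}", "studyID": "SCGT00", "chemistry": "chem_A",
     "disease": disease}
    for i, disease in enumerate(["RA"] * 10 + ["healthy"] * 5)
]
split = stratified_split(records)
print(sum(v == "validation" for v in split.values()))  # 2 RA + 1 healthy -> 3
```

Splitting within strata rather than over the whole cohort keeps small disease or chemistry groups from ending up entirely on one side by chance; strata too small for the rounded validation share stay entirely in Main.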
SCGT00_CentralizedDataset: We selected a single study that includes data from 6 diseases plus healthy controls, all generated in the same research center, with a single assay chemistry and by the same technician. Because these samples were pooled in groups of 8 patients, we stratified them by sequencing pool and disease, ensuring that reference and query patients belong to distinct cohorts.
  Main (SCGT00 atlas): Samples used to build the reference annotation, to extract biological findings and to train the patient classifier.
  Validation (external): Samples used to validate the patient classifier.

NOTE: Further details on the SCGT00_CentralizedDataset sample splitting can be found in Supplementary Table 1, Sheet 4. To regenerate the SCGT00_CentralizedDataset object and reproduce the manuscript results, the INFLAMMATION ATLAS data should first be regenerated from the Core samples and then split according to the details provided in Sheet 4.

ZENODO REPOSITORY

Supplementary_Table_1.xlsx: Dataset overview of the human PBMC samples. This file contains general information on the datasets and the clinical information of the samples included in the current study.
Sheet 1: byStudyID. Details on each dataset (studyID): where the data were generated (in-house or public), the 10X Genomics chemistry, the publication and dataset reference (for public data), and whether we remapped the FASTQ files. In all cases, we provide the CellRanger and reference genome versions used. Additionally, for each disease, we provide the number of donors collected before quality control.
Sheet 2: byDisease_splitted. Summary of the number of patients per disease, stratified by subset (Main, unseen patients or unseen studies), sex and binned age category.
Sheet 3: bySampleID_afterQC. Technical and clinical metadata per sample; missing metadata are shown as NA.
Sheet 4: SCGT00_CentralizedDataset.
Details of the samples from a unified, centralized study of the patient cohort, processed in sample pools (patientPool), and their stratification into Reference and Query subsets.
INFLAMMATION_ATLAS_{group}_afterQC.h5ad: Raw count matrices after QC, in h5ad format, for each group [main, validation, external]. Only samples and cells that passed quality control are included. In addition, the "main" and "validation" datasets were filtered for non-expressed genes (80].
BMI: The Body Mass Index (BMI) of the patient at the time of sample collection.
diseaseStatus: The current status or stage of the disease in the patient (e.g., COVID_severe, COVID_mild).
smokingStatus: The smoking habits of the patient [smoker, never-smoker, former-smoker, NA].
scANVI_models.zip: This compressed folder must be unzipped before use and contains scANVI (single-cell ANnotation using Variational Inference) models:
  scANVI_atlas: This folder contains the scANVI model trained on the full dataset, encompassing all identified cell types, including Red Blood Cells and Platelets. This comprehensive model characterizes the entire cellular landscape, capturing the diversity of immune and non-immune cells present in the dataset. It was used to project the external datasets.
  scANVI_downstream: This folder contains a refined scANVI model used specifically for downstream analyses. It excludes Red Blood Cells and Platelets to remove possible confounding factors that could affect such analyses.

Code availability

The code to reproduce the full analysis presented in this article is hosted in the GitHub repository: https://github.com/Single-Cell-Genomics-Group-CNAG-CRG/Inflammation-PBMCs-Atlas
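For the SCGT00_CentralizedDataset, where samples were pooled in groups of 8 patients, the key constraint is that a sequencing pool never contributes patients to both the reference and the query subsets. A minimal Python sketch of such a pool-level split follows. It is not taken from the repository above: the function name `pool_level_split`, the `patientID` and `disease` field names, the `query_fraction` target and the retry strategy are hypothetical, and only `patientPool` appears in Supplementary Table 1, Sheet 4. Whole pools are drawn at random into the query subset until the target share of patients is reached, and a draw is accepted only if every disease is represented on both sides.

```python
import random
from collections import defaultdict


def _diseases(patients, split, subset):
    """Set of diseases present among patients assigned to `subset`."""
    return {p["disease"] for p in patients if split[p["patientID"]] == subset}


def pool_level_split(patients, query_fraction=0.25, seed=0, max_tries=100):
    """Assign whole sequencing pools to 'reference' or 'query'.

    patients: list of dicts with 'patientID', 'patientPool' and 'disease'
    (only 'patientPool' comes from Supplementary Table 1; the rest are assumed).
    Returns {patientID: 'reference' | 'query'}.
    """
    pools = defaultdict(list)
    for patient in patients:
        pools[patient["patientPool"]].append(patient)

    all_diseases = {patient["disease"] for patient in patients}
    target = query_fraction * len(patients)
    rng = random.Random(seed)  # seeded so the split is reproducible

    for _ in range(max_tries):
        order = sorted(pools)
        rng.shuffle(order)

        # Move whole pools into the query subset until the target share is reached.
        query_pools, n_query = set(), 0
        for pool in order:
            if n_query >= target:
                break
            query_pools.add(pool)
            n_query += len(pools[pool])

        split = {
            p["patientID"]: "query" if p["patientPool"] in query_pools else "reference"
            for p in patients
        }

        # Accept the draw only if every disease appears in both subsets.
        if (_diseases(patients, split, "reference") == all_diseases
                and _diseases(patients, split, "query") == all_diseases):
            return split

    raise ValueError("no pool assignment puts every disease in both subsets")


# Usage with a made-up cohort: 4 pools of 8 patients, each pool mixing 4 diseases.
diseases = ["RA", "SLE", "PSA", "healthy"]
cohort = [
    {"patientID": f"P{i:02d}", "patientPool": f"pool{i // 8}", "disease": diseases[i % 4]}
    for i in range(32)
]
assignment = pool_level_split(cohort, query_fraction=0.25, seed=3)
print(sum(v == "query" for v in assignment.values()))  # one pool of 8 -> 8
```

Assigning pools rather than individual patients avoids pool-specific technical effects leaking from the reference into the query; the disease check is a simple rejection step and would need a smarter search if diseases were concentrated in only a few pools.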
