Germinal center-mediated broadening of B cell responses to SARS-CoV-2 booster immunization

These are the processed BCR repertoire and transcriptomics data described in Malladi & Jaiswal et al., Science Immunology, 2025. The raw sequencing data are available on SRA under BioProject PRJNA1185243.

Code

Code, along with Docker containers, for reproducing the NGS data-based figures and analyses in the published paper can be found on GitHub.

Metadata

File: WU382_malladi_et_al_sci_imm_2025_meta.tsv.gz

Notes:
Sample collection time was originally recorded in days in the `timepoint_day` column. Timepoints are coded in days in the BCR data. The manuscript mainly refers to timepoints in weeks, as given in the `timepoint_week` column.
The `seq_type` column indicates the platform from which sequences originated.
`bulk` = bulk BCR sequencing
`tgx` = 10x Genomics single-cell VDJ + 5' gene expression

Abbreviations:
LN = lymph node
PB = plasmablast
GC = germinal center
LNPC = lymph node plasma cell
NS = no sorting

Processed BCR data - heavy chains

File: WU382_malladi_et_al_sci_imm_2025_bcr_heavy.tsv.gz

Analysis was based on heavy chain-based clonal inference.

Notes on columns:
The columns largely follow the AIRR-C Rearrangement format. The main deviation is that CDR3s were used, as opposed to IMGT-defined "junctions". Nonetheless, junction-related columns are included here because some repositories use them. Non-standard columns are noted below.
`cell_id`: Only sequences from single-cell samples have cell IDs. 10x sequences follow the format `[donor]_[sample]@[id]`. `NA` for bulk sequences.
`sequence_id`: Sequence IDs follow the format `[donor]_[sample]@[id]`.
`v_call_genotyped`: V gene annotation reassigned after individualized genotyping by TIgGER.
`germline_[vdj]_call`: Clonal consensus germline calls after the corresponding clonal consensus sequences were reconstructed via `CreateGermlines.py --cloned` from Change-O.
`c_call`: Constant gene annotation extracted from the output of `cellranger vdj` for 10x sequences. `NA` for bulk sequences. Unlike `isotype`, `c_call` is resolved down to isotype subclass.
`collapse_count`: Number of duplicate IMGT-aligned V(D)J sequences that were collapsed by `alakazam::collapseDuplicates`.
`gex_anno`: Cell type identity annotation based on transcriptomic profiles. Mapped from `anno_leiden_0.25` in WU382_malladi_et_al_sci_imm_2025_gex_b_cells.h5ad.
`compartment`: B cell compartment.
`clone_id`: B cell clonal lineage IDs follow the format `[donor]@[id]`.
`s_pos_clone`: `TRUE` if a sequence belonged to a B cell clone that was designated as S-binding by virtue of containing one of the recombinant mAbs that tested positive via ELISA.
`expressed_id`: IDs of expressed mAbs. `NA` for everything else.
`s_binding`: ELISA results for binding of recombinant mAbs to SARS-CoV-2 S. `s_pos` and `s_neg` for positive and negative binding results, respectively. `s_not_expressed` if not selected for expression. `s_na` if expression failed, the mAb was sticky, etc.
`nuc_RS_1_312`: Number of replacement and silent mutations between IMGT-numbered nucleotide positions 1-312 along IGHV sequences, calculated by `shazam::calcObservedMutations`.
`nuc_denom_1_312`: Number of informative nucleotide positions for counting mutations, excluding non-A/T/G/C positions (such as "N", "-", ".").
`nuc_RS_freq_1_312`: Nucleotide-level mutation frequency (= nuc_RS_1_312 / nuc_denom_1_312).

Processed BCR data - light chains

File: WU382_malladi_et_al_sci_imm_2025_bcr_light.tsv.gz

Light chains were not used for heavy chain-based clonal inference or analysis.

Processed transcriptomics data

Files:
WU382_malladi_et_al_sci_imm_2025_gex_all_cells.h5ad
WU382_malladi_et_al_sci_imm_2025_gex_b_cells.h5ad
WU382_malladi_et_al_sci_imm_2025_gex_b_cell_umap.tsv.gz

Notes on the `h5ad` files:
Each file can be imported into Scanpy as an `AnnData` object.
Each `AnnData` object has 3 `.layers`, each representing a version of the count matrix.
`raw_counts`: Imported from `cellranger aggr` output by `scanpy.read_10x_mtx`.
`log_norm`: Log-normalized expression values produced by `scanpy.pp.normalize_total` followed by `scanpy.pp.log1p`.
`scaled`: The `log_norm` layer scaled to unit variance and zero mean by `scanpy.pp.scale`.
The `gene_name` and `biotype` columns in `.var` were extracted from the GENCODE v32 GTF.

Columns in `.obs` (each row corresponds to a cell):
`n_feature`: The `n_genes_by_counts` column produced by `scanpy.pp.calculate_qc_metrics`, renamed. The number of genes expressed. This is before subsetting the genes.
`n_umi`: The `total_counts` column produced by `scanpy.pp.calculate_qc_metrics`, renamed. The total UMI counts in a cell.
`pct_mt`: The `pct_counts_mt` column produced by `scanpy.pp.calculate_qc_metrics`, renamed. The percentage of counts in mitochondrial genes.
`n_hkg`: The number of housekeeping genes for which expression was detected.
`n_gene_expressed`: The total number of genes for which expression was detected. This is after subsetting the genes.
`pre_qc_bcr`: `TRUE` if a cell also had paired BCR data available. Produced by cross-referencing the cellular barcodes in `cell_barcodes.json` output by `cellranger vdj`. At this point the BCR data had not gone through the QC process in the BCR processing pipeline (hence `pre_qc`).
`leiden_[resolution]`: Cluster assignment by `scanpy.tl.leiden`.
`anno_leiden_[resolution]`: Cell type identity annotations based on transcriptomic profiles. This was mapped onto the `gex_anno` column in the processed heavy chain BCR data.

UMAP coordinates can be found in `.obsm["X_umap"]`.
`.X` has been set to `None` in order to reduce file size.

Note on the `tsv.gz` file:
This file was derived from WU382_malladi_et_al_sci_imm_2025_gex_b_cells.h5ad. It contains the UMAP coordinates of the cells used for visualization in conjunction with BCR data.

In addition, the preprocessed count matrix output by `cellranger aggr` is available from GEO under BioProject PRJNA1185243.
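
As a rough illustration of the heavy chain column conventions described above, the sketch below reads a tab-separated table with Python's standard library, splits `clone_id` (`[donor]@[id]`) and `sequence_id` (`[donor]_[sample]@[id]`), and recomputes `nuc_RS_freq_1_312` from `nuc_RS_1_312` and `nuc_denom_1_312`. The helper names (`open_table`, `read_rows`, `split_ids`, `mutation_frequency`) and the two demo rows are hypothetical and are not part of the dataset or its code. The sketch also assumes that each ID contains exactly one `@`, which the description does not state.

```python
import csv
import gzip
import io

NA = "NA"  # missing-value marker used in the tables, per the column notes


def open_table(path):
    """Open a gzip-compressed TSV (e.g. the heavy chain file) as text."""
    return gzip.open(path, "rt", newline="")


def read_rows(handle):
    """Yield one dict per row from an open tab-separated text handle."""
    yield from csv.DictReader(handle, delimiter="\t")


def split_ids(row):
    """Split `clone_id` ([donor]@[id]) and `sequence_id` ([donor]_[sample]@[id]).

    Assumption (not confirmed by the source): each ID contains exactly one "@".
    """
    donor, _, clone_num = row["clone_id"].partition("@")
    prefix, _, seq_num = row["sequence_id"].partition("@")
    # The prefix is "[donor]_[sample]"; strip the donor to recover the sample.
    sample = prefix[len(donor) + 1:] if prefix.startswith(donor + "_") else None
    return {"donor": donor, "sample": sample, "clone": clone_num, "seq": seq_num}


def mutation_frequency(row):
    """Recompute nuc_RS_freq_1_312 = nuc_RS_1_312 / nuc_denom_1_312."""
    if NA in (row["nuc_RS_1_312"], row["nuc_denom_1_312"]):
        return None
    denom = float(row["nuc_denom_1_312"])
    return float(row["nuc_RS_1_312"]) / denom if denom else None


# Made-up two-row table for demonstration only; these are not real records.
DEMO = (
    "sequence_id\tcell_id\tclone_id\tnuc_RS_1_312\tnuc_denom_1_312\n"
    "d1_s1@7\td1_s1@7\td1@3\t6\t300\n"
    "d1_s2@9\tNA\td1@3\t0\t312\n"
)

rows = list(read_rows(io.StringIO(DEMO)))
ids = [split_ids(r) for r in rows]
freqs = [mutation_frequency(r) for r in rows]
# Per the notes, only single-cell (10x) sequences carry a cell ID.
is_single_cell = [r["cell_id"] != NA for r in rows]
```

To try this on the real heavy chain file, one would pass `open_table("WU382_malladi_et_al_sci_imm_2025_bcr_heavy.tsv.gz")` to `read_rows` in place of the in-memory demo table. Deriving the donor from `clone_id` avoids having to guess where the donor label ends inside `sequence_id`.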

Related Organizations

University of Mary
United States

Keywords

B cell repertoire, AIRR-seq, COVID-19, vaccination, Antibody
