
These are the processed BCR repertoire and transcriptomics data described in Malladi & Jaiswal et al., Science Immunology, 2025. The raw sequencing data are available on SRA under BioProject PRJNA1185243.

Code

Code and Docker containers for reproducing the NGS data-based figures and analyses in the published paper can be found on GitHub.

Metadata

File: WU382_malladi_et_al_sci_imm_2025_meta.tsv.gz

Notes:
Sample collection time was originally recorded in days, in the `timepoint_day` column. Timepoints are coded in days in the BCR data. The manuscript mainly refers to timepoints in weeks, as shown in the `timepoint_week` column.
The `seq_type` column indicates the platform from which sequences originated.
`bulk` = bulk BCR sequencing
`tgx` = 10x Genomics single-cell VDJ + 5' gene expression

Abbreviations:
LN = lymph node
PB = plasmablast
GC = germinal center
LNPC = lymph node plasma cell
NS = no sorting

Processed BCR data - heavy chains

File: WU382_malladi_et_al_sci_imm_2025_bcr_heavy.tsv.gz

Analysis was based on heavy chain-based clonal inference.

Notes on columns:
The columns largely follow the AIRR-C Rearrangement format. The main deviation is that CDR3s were used, as opposed to IMGT-defined "junctions". Nonetheless, junction-related columns are included here because some repositories use them. Non-standard columns are noted below.
`cell_id`: Only sequences from single-cell samples have cell IDs. 10x sequences follow the format `[donor]_[sample]@[id]`. `NA` for bulk sequences.
`sequence_id`: Sequence IDs follow the format `[donor]_[sample]@[id]`.
`v_call_genotyped`: V gene annotation reassigned after individualized genotyping by TIgGER.
`germline_[vdj]_call`: Clonal consensus germline calls after corresponding clonal consensus sequences were reconstructed via `CreateGermlines.py --cloned` from Change-O.
`c_call`: Constant gene annotation extracted from the output of `cellranger vdj` for 10x sequences. `NA` for bulk sequences.
Unlike `isotype`, `c_call` is resolved down to the isotype subclass.
`collapse_count`: Number of duplicate IMGT-aligned V(D)J sequences that were collapsed by `alakazam::collapseDuplicates`.
`gex_anno`: Cell type identity annotation based on transcriptomic profiles. Mapped from `anno_leiden_0.25` in WU382_malladi_et_al_sci_imm_2025_gex_b_cells.h5ad.
`compartment`: B cell compartment.
`clone_id`: B cell clonal lineage IDs follow the format `[donor]@[id]`.
`s_pos_clone`: `TRUE` if a sequence belonged to a B cell clone that was designated as S-binding by virtue of containing one of the recombinant mAbs that tested positive via ELISA.
`expressed_id`: IDs of expressed mAbs. `NA` for everything else.
`s_binding`: ELISA results for binding of recombinant mAbs to SARS-CoV-2 S. `s_pos` and `s_neg` for positive and negative binding results, respectively. `s_not_expressed` if not selected for expression. `s_na` if expression failed, was sticky, etc.
`nuc_RS_1_312`: Number of replacement and silent mutations between IMGT-numbered nucleotide positions 1-312 along IGHV sequences, calculated by `shazam::calcObservedMutations`.
`nuc_denom_1_312`: Number of informative nucleotide positions for counting mutations, excluding non-A/T/G/C positions (such as "N", "-", ".").
`nuc_RS_freq_1_312`: Nucleotide-level mutation frequency (= `nuc_RS_1_312` / `nuc_denom_1_312`).

Processed BCR data - light chains

File: WU382_malladi_et_al_sci_imm_2025_bcr_light.tsv.gz

Light chains were not used for heavy chain-based clonal inference or analysis.

Processed transcriptomics data

Files:
WU382_malladi_et_al_sci_imm_2025_gex_all_cells.h5ad
WU382_malladi_et_al_sci_imm_2025_gex_b_cells.h5ad
WU382_malladi_et_al_sci_imm_2025_gex_b_cell_umap.tsv.gz

Notes on the `h5ad` files:
Each file can be imported into Scanpy as an `AnnData` object.
Each `AnnData` object has 3 `.layers`, each representing a version of the count matrix.
`raw_counts`: Imported from `cellranger aggr` output by `scanpy.read_10x_mtx`.
`log_norm`: Log-normalized expression values output by `scanpy.pp.normalize_total` followed by `scanpy.pp.log1p`.
`scaled`: The `log_norm` layer scaled to unit variance and zero mean by `scanpy.pp.scale`.
The `gene_name` and `biotype` columns in `.var` were extracted from the GENCODE v32 GTF.

Columns in `.obs` (each row corresponds to a cell):
`n_feature`: The `n_genes_by_counts` column produced by `scanpy.pp.calculate_qc_metrics`, renamed. The number of genes expressed. This is before subsetting the genes.
`n_umi`: The `total_counts` column produced by `scanpy.pp.calculate_qc_metrics`, renamed. The total UMI counts in a cell.
`pct_mt`: The `pct_counts_mt` column produced by `scanpy.pp.calculate_qc_metrics`, renamed. The percentage of counts in mitochondrial genes.
`n_hkg`: The number of housekeeping genes for which expression was detected.
`n_gene_expressed`: The total number of genes for which expression was detected. This is after subsetting the genes.
`pre_qc_bcr`: `TRUE` if a cell also had paired BCR data available. Produced by cross-referencing the cellular barcodes in `cell_barcodes.json` output by `cellranger vdj`. At this point the BCR data had not gone through the QC process in the BCR processing pipeline (hence `pre_qc`).
`leiden_[resolution]`: Cluster assignment by `scanpy.tl.leiden`.
`anno_leiden_[resolution]`: Cell type identity annotations based on transcriptomic profiles. This was mapped onto the `gex_anno` column in the processed heavy chain BCR data.
UMAP coordinates can be found in `.obsm["X_umap"]`.
`.X` has been set to `None` in order to reduce file size.

Note on the `tsv.gz` file:
This file was derived from WU382_malladi_et_al_sci_imm_2025_gex_b_cells.h5ad. It contains UMAP coordinates of the cells used for visualization in conjunction with BCR data.

In addition, the preprocessed count matrix output by `cellranger aggr` is available from GEO under BioProject PRJNA1185243.
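
As a hedged illustration of the heavy-chain column conventions described in this record, the minimal sketch below parses the `[donor]_[sample]@[id]` identifier format and recomputes `nuc_RS_freq_1_312` from `nuc_RS_1_312` and `nuc_denom_1_312`. The function names (`parse_sequence_id`, `mutation_frequency`, `read_gz_tsv`) and the two example rows are hypothetical and are not taken from the dataset. The sketch assumes that donor labels contain no underscore, that the file is tab-separated, and that missing values are written as `NA`.

```python
import csv
import gzip
import io


def parse_sequence_id(sequence_id):
    """Split a `[donor]_[sample]@[id]` identifier into (donor, sample, id).

    Assumption: the donor label contains no underscore, so the first
    underscore separates donor from sample.
    """
    prefix, sep, local_id = sequence_id.partition("@")
    if not sep:
        raise ValueError(f"no '@' in identifier: {sequence_id!r}")
    donor, sep, sample = prefix.partition("_")
    if not sep:
        raise ValueError(f"no '_' before '@' in identifier: {sequence_id!r}")
    return donor, sample, local_id


def mutation_frequency(row):
    """Recompute nuc_RS_freq_1_312 = nuc_RS_1_312 / nuc_denom_1_312.

    Returns None when either value is missing ("NA") or the denominator is 0.
    """
    count, denom = row["nuc_RS_1_312"], row["nuc_denom_1_312"]
    if "NA" in (count, denom) or float(denom) == 0:
        return None
    return float(count) / float(denom)


def read_rows(handle):
    """Read a tab-separated table from an open text handle into dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def read_gz_tsv(path):
    """Read a gzip-compressed TSV such as the heavy-chain file."""
    with gzip.open(path, "rt", newline="") as handle:
        return read_rows(handle)


# Hypothetical two-row table standing in for the real heavy-chain file.
EXAMPLE_TSV = (
    "sequence_id\tcell_id\tseq_type\tnuc_RS_1_312\tnuc_denom_1_312\n"
    "d01_s03@42\td01_s03@AAAC-1\ttgx\t12\t300\n"
    "d01_s05@7\tNA\tbulk\t0\t288\n"
)

rows = read_rows(io.StringIO(EXAMPLE_TSV))
for row in rows:
    donor, sample, local_id = parse_sequence_id(row["sequence_id"])
    is_single_cell = row["cell_id"] != "NA"  # bulk sequences have no cell ID
    print(donor, sample, local_id, is_single_cell, mutation_frequency(row))
```

To apply the same steps to the actual data, `read_gz_tsv("WU382_malladi_et_al_sci_imm_2025_bcr_heavy.tsv.gz")` could replace the in-memory example, provided the assumptions above hold for the real file.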
B cell repertoire, AIRR-seq, COVID-19, vaccination, Antibody
