
cgDist is an ultra-fast distance calculator for bacterial genomics that computes SNP- and InDel-level distances directly from core genome Multi-Locus Sequence Typing (cgMLST) allelic profiles.

Key Innovation

While traditional cgMLST analysis treats all allelic differences as equivalent units, cgDist achieves nucleotide-level resolution by performing pairwise sequence alignment only on the alleles that differ between two samples. This approach bridges the gap between the computational efficiency of cgMLST and the genetic resolution of SNP-based methods.

Main Features

- Multi-mode distance calculations: SNPs-only, SNPs+InDel-events, SNPs+InDel-bases
- Unified cache architecture: Enables incremental surveillance, where new samples are analyzed without re-aligning the entire dataset
- Allele caller agnostic: Compatible with any cgMLST schema and allele-calling tool (chewBBACA, BLAST, etc.)
- Integrated recombination detection: Identifies potential horizontal gene transfer events
- High performance: 94% reduction in run time, with progressive performance gains as cache hit rates reach 88.3%

Use Cases

cgDist is designed for outbreak investigation and source attribution in bacterial genomics, particularly for foodborne pathogens (Salmonella, Listeria monocytogenes). It provides fine-scale genetic discrimination for epidemiological clustering while maintaining compatibility with existing cgMLST surveillance workflows.

Citation

If you use cgDist in your research, please cite our manuscript:
bioRxiv preprint: https://doi.org/10.1101/2025.10.16.682749

Implementation

Written in Rust for high performance. The source code includes comprehensive documentation, installation instructions, usage examples, and an API reference.
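
To make the three distance modes and the alignment cache more concrete, here is a minimal Rust sketch. It is not taken from the cgDist source: every name in it (`AlleleDiff`, `Mode`, `diff_aligned`, `profile_distance`, `Cache`) is hypothetical. It assumes the pairwise alignment is produced elsewhere and supplied as two equal-length sequences with `-` for gaps, that a run of consecutive gaps on one side counts as a single InDel event, and that loci missing from either profile are skipped. cgDist itself may handle each of these differently.

```rust
use std::collections::HashMap;

/// Difference counts from one pairwise alignment of two alleles (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct AlleleDiff {
    pub snps: usize,
    pub indel_events: usize,
    pub indel_bases: usize,
}

/// The three distance modes named in the feature list.
#[derive(Debug, Clone, Copy)]
pub enum Mode {
    Snps,
    SnpsIndelEvents,
    SnpsIndelBases,
}

impl AlleleDiff {
    /// Collapse the counts into a single distance for the chosen mode.
    pub fn distance(&self, mode: Mode) -> usize {
        match mode {
            Mode::Snps => self.snps,
            Mode::SnpsIndelEvents => self.snps + self.indel_events,
            Mode::SnpsIndelBases => self.snps + self.indel_bases,
        }
    }
}

/// Count SNPs and InDels between two already-aligned sequences (gaps are b'-').
/// Assumption: a run of gaps on the same side is one InDel event.
pub fn diff_aligned(a: &[u8], b: &[u8]) -> AlleleDiff {
    assert_eq!(a.len(), b.len(), "aligned sequences must have equal length");
    let mut d = AlleleDiff::default();
    // Some(true): inside a gap run in `a`; Some(false): inside a gap run in `b`.
    let mut gap_side: Option<bool> = None;
    for (&x, &y) in a.iter().zip(b.iter()) {
        match (x == b'-', y == b'-') {
            (true, true) => {} // gap-only column: carries no information
            (false, false) => {
                gap_side = None;
                if !x.eq_ignore_ascii_case(&y) {
                    d.snps += 1;
                }
            }
            (gap_in_a, _) => {
                d.indel_bases += 1;
                if gap_side != Some(gap_in_a) {
                    d.indel_events += 1; // a new gap run starts here
                }
                gap_side = Some(gap_in_a);
            }
        }
    }
    d
}

/// Locus name -> allele identifier for one sample.
pub type Profile = HashMap<String, u32>;
/// (locus, smaller allele id, larger allele id) -> cached alignment result.
pub type Cache = HashMap<(String, u32, u32), AlleleDiff>;

/// Sum per-locus distances over loci whose alleles differ. `align` is called
/// only on a cache miss, so allele pairs seen before are never re-aligned.
pub fn profile_distance<F>(
    p: &Profile,
    q: &Profile,
    mode: Mode,
    cache: &mut Cache,
    mut align: F,
) -> usize
where
    F: FnMut(&str, u32, u32) -> AlleleDiff,
{
    let mut total = 0;
    for (locus, &a) in p {
        if let Some(&b) = q.get(locus) {
            if a == b {
                continue; // identical alleles contribute no distance
            }
            let (lo, hi) = (a.min(b), a.max(b));
            let diff = *cache
                .entry((locus.clone(), lo, hi))
                .or_insert_with(|| align(locus.as_str(), lo, hi));
            total += diff.distance(mode);
        }
    }
    total
}
```

Because the cache key depends only on the locus and the unordered allele pair, a cache filled while comparing earlier samples can be reused when new samples arrive, which is the idea behind the incremental surveillance use described above. In practice the `align` callback would look up both allele sequences in the schema and run a pairwise aligner; that part is left out here.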
