
Abstract

In genotyping, determining Single Nucleotide Polymorphisms (SNPs) is standard practice, but it becomes difficult when analysing small quantities of input DNA, as is often required in forensic applications. Existing SNP genotyping methods, such as the HID SNP Genotyper Plugin (HSG) from Thermo Fisher Scientific, perform well with adequate DNA input levels but often produce erroneously called genotypes when DNA quantities are low. To mitigate these errors, genotype quality can be checked with the HSG. However, enforcing the HSG’s quality checks decreases the call rate by introducing more no-calls, and it does not eliminate all wrong calls. This study presents and validates a Symmetric Multinomial Logistic Regression (SMLR) model designed to improve genotyping accuracy and call rate with small amounts of DNA. Comprehensive bootstrap and cross-validation analyses across a wide range of DNA quantities demonstrate the robustness and efficiency of the SMLR model in maintaining high call rates without compromising accuracy compared to the HSG. For DNA amounts as low as 31.25 pg, the SMLR method reduced the rate of no-calls by 50.0% relative to the HSG while maintaining the same rate of wrong calls, resulting in a call rate of 96.0%. Similarly, SMLR reduced the rate of wrong calls by 55.6% while maintaining the same call rate, achieving an accuracy of 99.775%. The no-call and wrong-call rates were also significantly reduced at 62.5–250 pg DNA. The results highlight the SMLR model’s utility in optimising SNP genotyping at suboptimal DNA concentrations, making it a valuable tool for forensic applications where sample quantity and quality may be limited. This work reinforces the feasibility of statistical approaches in forensic genotyping and provides a framework for implementing the SMLR method in practical forensic settings. The SMLR model applies to genotyping biallelic data with a signal (e.g. reads, counts, or intensity) for each allele. The model can also improve the allele balance quality check.
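
To illustrate how a model of this kind turns two allele signals into a genotype call or a no-call, the following is a minimal Python sketch. It is not the authors' parametrisation: the log-ratio predictor, the coefficient values (`slope`, `het_intercept`), the 0.95 probability threshold, and the function names are all assumptions chosen for illustration, and nothing here is fitted to data. The only property it is meant to show is the symmetry constraint, under which swapping the two alleles swaps the two homozygote probabilities and leaves the heterozygote probability unchanged.

```python
import math

GENOTYPES = ("A1/A1", "A1/A2", "A2/A2")


def genotype_probabilities(signal_a1, signal_a2, slope=3.0, het_intercept=4.0):
    """Three-class softmax over genotypes with a symmetry constraint.

    slope and het_intercept are hypothetical, hand-picked values.
    """
    # Log-ratio of the allele signals; the +1 avoids log(0).
    # Swapping the alleles flips the sign of x.
    x = math.log(signal_a1 + 1.0) - math.log(signal_a2 + 1.0)
    # Mirrored linear predictors for the homozygotes, and a predictor for
    # the heterozygote that does not change when the alleles are swapped.
    etas = (slope * x, het_intercept, -slope * x)
    # Numerically stable softmax.
    largest = max(etas)
    weights = [math.exp(eta - largest) for eta in etas]
    total = sum(weights)
    return dict(zip(GENOTYPES, (w / total for w in weights)))


def call_genotype(signal_a1, signal_a2, threshold=0.95):
    """Return the most probable genotype, or "no-call" if it is too uncertain.

    The threshold is an assumed value, not one taken from the study.
    """
    probs = genotype_probabilities(signal_a1, signal_a2)
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else "no-call"
```

With these illustrative parameters, signals of (500, 2) are called `A1/A1` and balanced signals of (250, 250) are called `A1/A2`, while weak, ambiguous signals such as (30, 10) fall below the threshold and become a no-call. Raising or lowering the threshold trades no-calls against wrong calls, which is the trade-off the abstract quantifies.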
Massively parallel sequencing, Low DNA concentrations, Biallelic markers, Symmetric multinomial logistic regression, Forensic genetics, SNP genotyping
