
Abstract

Mapping gene expression as a quantitative trait using whole-genome sequencing and transcriptome analysis makes it possible to discover the functional consequences of genetic variation. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves on current methods by taking into account hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr.

Author summary

Understanding how genetic variation between individuals determines variation in observable traits or disease risk is one of the core aims of genetics. It is known that genetic variation often affects gene regulatory DNA elements and directly causes variation in the expression of nearby genes. This effect in turn cascades down to other genes via the complex pathways and gene interaction networks that ultimately govern how cells operate in an ever-changing environment. In theory, when genetic variation and gene expression levels are measured simultaneously in a large number of individuals, the causal effects of genes on each other can be inferred using statistical models similar to those used in randomized controlled trials. We developed a novel method and ultra-fast software, Findr, which, unlike existing methods, takes into account the complex but unknown network context when predicting causality between specific gene pairs. Findr’s predictions have a significantly higher overlap with known gene networks than those of existing methods, on both simulated and real data. Findr is also nearly a million times faster, and hence the only software in its class that can handle modern datasets where the expression levels of tens of thousands of genes are simultaneously measured in hundreds to thousands of individuals.
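The idea of using a cis-regulatory variant as a causal anchor can be illustrated with a toy simulation. The sketch below is not Findr's method or API. It is a minimal, hypothetical example of the classical mediation-style reasoning the summary alludes to: a genotype E directly affects gene A, A regulates gene B, and so E and B are correlated overall but nearly uncorrelated once A is accounted for. All names, effect sizes, and the sample size are assumptions chosen for illustration, and the simulation deliberately contains no hidden confounders, which the text identifies as the case where simple approaches fall short.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_chain(n, seed=0):
    """Simulate the causal chain E -> A -> B (hypothetical effect sizes)."""
    rng = random.Random(seed)
    # E: genotype at a cis-regulatory variant of gene A, coded 0/1/2.
    E = [rng.choice((0, 1, 2)) for _ in range(n)]
    # A: expression of the gene whose cis-variant is E.
    A = [e + rng.gauss(0, 1) for e in E]
    # B: expression of a downstream gene regulated by A only.
    B = [0.8 * a + rng.gauss(0, 1) for a in A]
    return E, A, B


E, A, B = simulate_chain(2000)
r_eb = pearson(E, B)                 # E is associated with B ...
r_eb_given_a = partial_corr(E, B, A)  # ... but only through A.
print(f"corr(E, B)     = {r_eb:.3f}")
print(f"corr(E, B | A) = {r_eb_given_a:.3f}")
```

In this confounder-free setting the marginal correlation between E and B is clearly positive, while the partial correlation given A is close to zero, consistent with A mediating the effect of E on B. If an unobserved factor influenced both A and B, the partial correlation would no longer vanish even though A still regulates B, which is the kind of situation the abstract says Findr is designed to handle.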
Genomics (q-bio.GN), Models, Statistical, QH301-705.5, Molecular Networks (q-bio.MN), Chromosome Mapping, Genetic Variation, High-Throughput Nucleotide Sequencing, Quantitative Biology - Quantitative Methods, FOS: Biological sciences, Databases, Genetic, Journal Article, Quantitative Biology - Genomics, Quantitative Biology - Molecular Networks, Biology (General), Transcriptome, Algorithms, Quantitative Methods (q-bio.QM), Research Article
