
Recently, ultra high-throughput sequencing of RNA (RNA-Seq) has been developed as an approach for the analysis of gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abundance from RNA-Seq data that is flexible enough to accommodate both single-end and paired-end RNA-Seq data, as well as sampling bias along the length of the transcript. Based on the derivation of minimal sufficient statistics for the model, a computationally feasible implementation of the maximum likelihood estimator is provided. Further, it is shown that, at fixed sequencing depth, paired-end RNA-Seq provides more accurate isoform abundance estimates than single-end sequencing. Simulation studies are also presented.
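The abstract says the maximum likelihood estimator becomes computationally feasible once the data are reduced to minimal sufficient statistics, but it does not spell out the algorithm. The sketch below is illustrative only and is not the authors' method. It assumes a simplified single-end model in which every read is summarized by the count of reads falling in each "compatibility class" (a set of positions shared by the same isoforms). These per-class counts are sufficient under that simplified model. The multinomial likelihood is then maximized with the EM algorithm, a standard technique used here in place of the paper's own procedure. All function and parameter names (`em_isoform_mle`, `compat`, `read_to_transcript_fractions`) are hypothetical.

```python
def em_isoform_mle(counts, compat, max_iter=10_000, tol=1e-12):
    """EM for read-level isoform proportions (hypothetical sketch).

    counts[c]    -- number of reads observed in compatibility class c
                    (the per-class counts are the sufficient statistics
                    under this simplified model)
    compat[c][k] -- assumed known probability that a read drawn from
                    isoform k falls in class c (each column sums to 1)
    Returns alpha[k], the estimated fraction of reads from isoform k.
    """
    n_iso = len(compat[0])
    alpha = [1.0 / n_iso] * n_iso  # start from uniform proportions
    for _ in range(max_iter):
        expected = [0.0] * n_iso  # expected reads assigned to each isoform
        # E-step: split each class count among isoforms in proportion
        # to alpha[k] * compat[c][k]
        for n_c, row in zip(counts, compat):
            weights = [a * p for a, p in zip(alpha, row)]
            class_prob = sum(weights)
            if class_prob == 0.0:
                continue  # no isoform can produce this class; skip it
            for k, w in enumerate(weights):
                expected[k] += n_c * w / class_prob
        # M-step: new proportions are the normalized expected counts
        total = sum(expected)
        new_alpha = [e / total for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_alpha, alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


def read_to_transcript_fractions(alpha, lengths):
    """Convert read fractions to transcript fractions.

    Assumes reads are generated in proportion to abundance times length,
    so transcript abundance is proportional to alpha[k] / lengths[k].
    """
    raw = [a / l for a, l in zip(alpha, lengths)]
    s = sum(raw)
    return [r / s for r in raw]


if __name__ == "__main__":
    # Toy example (hypothetical numbers): isoform 1 has 300 read-start
    # positions (100 unique, 200 shared) and isoform 2 has 250
    # (50 unique junction positions, 200 shared).
    compat = [
        [1 / 3, 0.0],  # class 0: only isoform 1
        [0.0, 0.2],    # class 1: only isoform 2
        [2 / 3, 0.8],  # class 2: shared by both isoforms
    ]
    counts = [2000, 800, 7200]
    alpha = em_isoform_mle(counts, compat)
    theta = read_to_transcript_fractions(alpha, [300, 250])
    print("read fractions:", alpha)
    print("transcript fractions:", theta)
```

In this toy input the counts were built to match read fractions of 0.6 and 0.4 exactly, so EM recovers those values. After correcting for isoform length, the corresponding transcript fractions are 5/9 and 4/9. Paired-end reads would enter the same framework through finer compatibility classes, which is one intuition for why they can sharpen the estimates.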
Published at http://dx.doi.org/10.1214/10-STS343 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Genomics (q-bio.GN), FOS: Computer and information sciences, Fisher information, Biochemistry, molecular biology, Paired end RNA-Seq data analysis, Sufficient statistics and fields, minimal sufficiency, isoform abundance estimation, Computational problems in statistics, Estimation in survival analysis and censored data, Applications of statistics to biology and medical sciences; meta analysis, Methodology (stat.ME), FOS: Biological sciences, Quantitative Biology - Genomics, Genetics and epigenetics, Statistics - Methodology
