Publication: Article (2013)

PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution

Yu Hu; Yichuan Liu; Xianyun Mao; Cheng Jia; Jane F. Ferguson; Chenyi Xue; Muredach P. Reilly; Hongzhe Li; Mingyao Li
Open Access · English
  • Published: 20 Dec 2013
  • Journal: Nucleic Acids Research, volume 42, issue 3, page e20 (ISSN: 0305-1048; eISSN: 1362-4962)
  • Publisher: Oxford University Press
Abstract
Correctly estimating isoform-specific gene expression is important for understanding complicated biological mechanisms and for mapping disease susceptibility genes. However, estimating isoform-specific gene expression is challenging because various biases present in RNA-Seq (RNA sequencing) data complicate the analysis, and if not appropriately corrected, can affect isoform expression estimation and downstream analysis. In this article, we present PennSeq, a statistical method that allows each isoform to have its own non-uniform read distribution. Instead of making parametric assumptions, we give adequate weight to the underlying data by the use of a non-paramet...
Subjects
Free-text keywords: Methods Online, Genetics, Data set, Gene, Parametric statistics, RNA-Seq, Biology, Molecular biology, Gene expression, Gene expression profiling, RNA Isoforms, Gene isoform
Funded by

NIH | Developing Statistical Methods for Disease Gene Discovery
  • Funder: National Institutes of Health (NIH)
  • Project Code: 1R01HG004517-01A1
  • Funding stream: National Human Genome Research Institute

NIH | Statistical Methods for Next-Generation Sequence Data
  • Funder: National Institutes of Health (NIH)
  • Project Code: 3R01GM097505-04S1
  • Funding stream: National Institute of General Medical Sciences

NIH | Glycomics of Heart and Lung Disease in the Genomic Era
  • Funder: National Institutes of Health (NIH)
  • Project Code: 1U01HL108636-01
  • Funding stream: National Heart, Lung, and Blood Institute

NIH | Mentored Patient Oriented Research in Cardiometabolic Disease
  • Funder: National Institutes of Health (NIH)
  • Project Code: 5K24HL107643-02
  • Funding stream: National Heart, Lung, and Blood Institute

NIH | Translational Studies of ADAMTS7 a Novel GWAS Locus for Coronary Atherosclerosis
  • Funder: National Institutes of Health (NIH)
  • Project Code: 5R01HL111694-04
  • Funding stream: National Heart, Lung, and Blood Institute