
ABSTRACT

Gene set enrichment analysis (GSEA) aims to identify essential pathways or, more generally, sets of biologically related genes that are involved in complex human diseases. Many studies have shown that GSEA is a useful bioinformatics tool that plays a critical role in developing disease prevention and intervention strategies. Despite this success, conclusions of GSEA drawn from isolated studies are often sparse, and different studies may lead to inconsistent and sometimes contradictory results. Further, next-generation sequencing technologies have made it possible to measure genome-wide isoform-specific expression levels, calling for methods that can exploit this unprecedented resolution, and enormous amounts of data have already been generated by RNA-seq experiments. Together, these create a pressing need for integrative methods that explicitly use isoform-specific expression and combine multiple enrichment studies, in order to enhance the power, reproducibility, and interpretability of the analysis. We develop and evaluate integrative GSEA methods, based on two-stage procedures, which, for the first time, allow statistically efficient use of isoform-specific expression from multiple RNA-seq experiments. Through simulation and real data analysis, we show that our methods can greatly improve performance in identifying essential gene sets compared to existing methods that can only use gene-level expression.
Models, Genetic; Reproducibility of Results; Breast Neoplasms; Gene Expression Regulation; ROC Curve; Databases, Genetic; Humans; Protein Isoforms; Computer Simulation; Female; Algorithms
