
Abstract

Motivation
The length of the 3′ untranslated region (3′ UTR) of an mRNA is essential for many biological activities such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, correlation between diseases and the shortening (or lengthening) of 3′ UTRs has been reported in the literature. This length is largely determined by the polyadenylation cleavage site in the mRNA. As alternative polyadenylation (APA) sites are common in mammalian genes, several tools have been published recently for detecting APA sites from RNA-Seq data or performing shortening/lengthening analysis. These tools consider either up to only two APA sites in a gene or only APA sites that occur in the last exon of a gene, although a gene may generally have more than two APA sites and an APA site may sometimes occur before the last exon. Furthermore, the tools are unable to integrate the analysis of shortening/lengthening events with APA site detection.

Results
We propose a new tool, called TAPAS, for detecting novel APA sites from RNA-Seq data. It can deal with more than two APA sites in a gene as well as APA sites that occur before the last exon. The tool is based on an existing method for finding change points in time series data, but some filtration techniques are also adopted to remove change points that are likely false APA sites. It is then extended to identify APA sites that are expressed differently between two biological samples and genes that contain 3′ UTRs with shortening/lengthening events. Our extensive experiments on simulated and real RNA-Seq data demonstrate that TAPAS outperforms the existing tools for APA site detection or shortening/lengthening analysis significantly.

Availability and implementation
https://github.com/arefeen/TAPAS

Supplementary information
Supplementary data are available at Bioinformatics online.
570, Bioinformatics, 3102 Bioinformatics and Computational Biology (for-2020), Messenger, Bioinformatics and Computational Biology, 08 Information and Computing Sciences (for), Polyadenylation, 3105 Genetics (for-2020), Mathematical Sciences, Eukaryota (mesh), Information and Computing Sciences, 31 Biological sciences (for-2020), Genetics, Animals (mesh), Animals, Humans, Software (mesh), 46 Information and computing sciences (for-2020), RNA, Messenger, Polyadenylation (mesh), 3' Untranslated Regions, Messenger (mesh), Humans (mesh), 31 Biological Sciences (for-2020), Genetics (rcdc), Sequence Analysis, RNA, Human Genome, Eukaryota, Biological Sciences, Human Genome (rcdc), 49 Mathematical sciences (for-2020), 3' Untranslated Regions (mesh), 004, 06 Biological Sciences (for), Bioinformatics (science-metrix), 01 Mathematical Sciences (for), RNA, RNA (mesh), Sequence Analysis, Software
