CAARS: comparative assembly and annotation of RNA-Seq data

Publication: Article, 19 Nov 2018, France, English

Publisher: Oxford University Press (OUP)

Journal: Bioinformatics, volume 35, pages 2199-2207 (ISSN: 1367-4803, eISSN: 1367-4811)

Funded by: ANR | IFB (ex Renabi-IFB), ANR | CONVERGENOMIX

Authors: Rey, Carine; Veber, Philippe; Boussau, Bastien; Sémon, Marie

doi: 10.1093/bioinformatics/bty903

pmid: 30452539

pmc: PMC6596894


Abstract

Motivation: RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species for improving RNA-Seq assembly, annotation and gene family reconstruction.

Results: We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. Then, CAARS incorporates transcripts into gene families, builds gene alignments and trees, and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as reference, a difficult case for this kind of analysis. We showed that CAARS assemblies are more complete and accurate than those assembled by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity on a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, directly usable for downstream comparative analyses.

Availability and implementation: CAARS is implemented in Python and OCaml and is freely available at https://github.com/carinerey/caars.

Supplementary information: Supplementary data are available at Bioinformatics online.
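The abstract states that phylogenetic information is used to classify genes as orthologs and paralogs. As an illustration only, the sketch below applies the textbook tree-based definition: two genes are orthologs if their last common ancestor in the gene tree is a speciation node, and paralogs if it is a duplication node. This is not taken from the CAARS code base; the `Node` class, the function names and the toy gene names are all hypothetical, and the event labels are assumed to have been produced beforehand by a tree reconciliation step.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Gene tree node (hypothetical structure, not the CAARS data model)."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" on internal nodes
    children: List["Node"] = field(default_factory=list)


def leaf_names(node: Node) -> Set[str]:
    """Return the names of all leaves below (or at) this node."""
    if not node.children:
        return {node.name}
    names: Set[str] = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def last_common_ancestor(node: Node, a: str, b: str) -> Node:
    """Descend while a single child subtree still contains both leaves."""
    for child in node.children:
        names = leaf_names(child)
        if a in names and b in names:
            return last_common_ancestor(child, a, b)
    return node


def classify(root: Node, a: str, b: str) -> str:
    """Classify two distinct genes by the event at their last common ancestor."""
    leaves = leaf_names(root)
    if a == b or a not in leaves or b not in leaves:
        raise ValueError("expected two distinct genes present in the tree")
    lca = last_common_ancestor(root, a, b)
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Toy gene family: an ancient duplication produced copies A and B,
# and each copy later diverged by speciation. "new_B" stands for a
# newly assembled transcript placed in the family tree.
tree = Node("D1", "duplication", [
    Node("S1", "speciation", [Node("mouse_A"), Node("fish_A")]),
    Node("S2", "speciation", [Node("mouse_B"), Node("new_B")]),
])

print(classify(tree, "new_B", "mouse_B"))   # orthologs
print(classify(tree, "new_B", "fish_A"))    # paralogs
```

In this toy tree the new transcript is an ortholog of `mouse_B` and a paralog of both A copies, which is the kind of annotation a phylogeny-aware pipeline can attach to a transcript once it has been placed in a gene family tree.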

Country

France


Keywords

Genome, Sequence Analysis, RNA, [SDV.BID.EVO] Life Sciences [q-bio]/Biodiversity/Populations and Evolution [q-bio.PE], Molecular Sequence Annotation, Transcriptome, Original Papers, Phylogeny, Software

Impact by BIP!

- selected citations: 4. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.