Powered by OpenAIRE graph
ZENODO
Dataset, 2021
License: CC0
Data sources: ZENODO
DRYAD
Dataset, 2021
License: CC0
Data sources: Datacite
Versions: 2

Data from: Transposable element annotation in non-model species - on the benefits of species specific repeat libraries using semi-automated EDTA and DeepTE de novo pipelines

Authors: Bell, Ellen; Butler, Christopher; Taylor, Martin

Abstract

Please note that the RepeatMasker output files are raw and unparsed. To parse the data as in the manuscript, please use the parse script published at https://github.com/clbutler/RM_TRIPS.

File list:
DanioLib_DeepTE_clean.fasta -> The RepBase Danio library, run through the DeepTE program for TE classification
Lin1C115wtgdb_EDTADeepTE_cleanLib_V1.1.fasta -> The de novo transposable element library we produced for the Corydoras sp. C115 genome using EDTA followed by DeepTE
Horizontal_transfer_Analysis_script.R -> The R script used for the horizontal transfer analysis of transposable elements
Unparsed_DanioDeepTElib_CmaculiferGenome_AssemblyNameCM_19_scafSeq.fas.out -> Unparsed RepeatMasker output using DanioLib_DeepTE_clean.fasta as the repeat library and the Corydoras maculifer genome (available on GenBank)
Unparsed_DanioDeepTElib_CmaculiferSample56_transcriptome.out -> Unparsed RepeatMasker output using DanioLib_DeepTE_clean.fasta as the repeat library and the Corydoras maculifer transcriptome (available on GenBank)
Unparsed_DanioDeepTElib_CorydorasC115genome_AssemblyNameLin1PacBio.ctg.fa.r3p3_pilon_3.fasta.out -> Unparsed RepeatMasker output using DanioLib_DeepTE_clean.fasta as the repeat library and the Corydoras sp. C115 genome (available on GenBank)
Unparsed_DeNovolib_CmaculiferGenome_AssemblyNameCM_19_scafSeq.fas.out -> Unparsed RepeatMasker output using Lin1C115wtgdb_EDTADeepTE_cleanLib_V1.1.fasta as the repeat library and the Corydoras maculifer genome (available on GenBank)
Unparsed_DeNovolib_CmaculiferSample56_transcriptome.out -> Unparsed RepeatMasker output using Lin1C115wtgdb_EDTADeepTE_cleanLib_V1.1.fasta as the repeat library and the Corydoras maculifer transcriptome (available on GenBank)
Unparsed_DeNovolib_CorydorasC115genome_AssemblyNameLin1PacBio.ctg.fa.r3p3_pilon_3.fasta.out -> Unparsed RepeatMasker output using Lin1C115wtgdb_EDTADeepTE_cleanLib_V1.1.fasta as the repeat library and the Corydoras sp. C115 genome (available on GenBank)
Unparsed_DanioDeepTElib_DanioGenome_AccessionNoGCF_000002035.6_GRCz11.out -> Unparsed RepeatMasker output using DanioLib_DeepTE_clean.fasta as the repeat library against the Danio rerio genome (accession number: GCF_000002035.6_GRCz11)
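
Because the RepeatMasker files in this record are raw, it can help to load them directly before running the R parse script. The following Python sketch assumes the standard whitespace-aligned RepeatMasker .out layout (header lines, then 15 columns per hit with an optional trailing `*`). The `RMHit` type, the `parse_rm_out` function and the sample rows are hypothetical illustrations, not part of this dataset or of RM_TRIPS.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RMHit:
    """One alignment row from a RepeatMasker .out file (hypothetical helper type)."""
    score: int          # Smith-Waterman score
    perc_div: float     # % divergence from the library consensus
    query: str          # contig, scaffold or transcript name
    q_begin: int
    q_end: int
    strand: str         # "+" or "C" (complement)
    repeat: str         # matching library entry
    repeat_class: str   # class/family, e.g. "Simple_repeat"
    hit_id: int
    overlapped: bool    # trailing "*": a higher-scoring match partly covers this one


def parse_rm_out(lines: Iterable[str]) -> List[RMHit]:
    """Read the body of a RepeatMasker .out file, skipping header and blank lines."""
    hits: List[RMHit] = []
    for line in lines:
        fields = line.split()
        # Header rows start with "SW"/"score" rather than an integer score.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        hits.append(RMHit(
            score=int(fields[0]),
            perc_div=float(fields[1]),
            query=fields[4],
            q_begin=int(fields[5]),
            q_end=int(fields[6]),
            strand=fields[8],
            repeat=fields[9],
            repeat_class=fields[10],
            hit_id=int(fields[14]),
            overlapped=len(fields) > 15 and fields[15] == "*",
        ))
    return hits


# Illustrative rows only (hypothetical names and coordinates, not from this dataset).
SAMPLE = """\
   SW   perc perc perc  query   position in query      matching   repeat          position in repeat
score   div. del. ins.  sequence  begin   end  (left)    repeat     class/family    begin  end (left)  ID

  463   12.7  0.0  0.0  ctg1       101    568  (9432) +  (TAACCC)n  Simple_repeat       1   468    (0)   1
 1204   18.2  2.1  1.3  ctg1      1001   1650  (8350) C  TE_A       LINE/L2         (120)  3100   2451   2
  310   25.0  3.0  0.5  ctg1      1500   1620  (8380) +  TE_B       DNA/hAT            10   130 (1500)   3 *
"""

hits = parse_rm_out(SAMPLE.splitlines())
for h in hits:
    print(h.query, h.q_begin, h.q_end, h.strand, h.repeat, h.repeat_class, h.overlapped)
```

For a real file, `parse_rm_out(open(path))` would be used instead of the sample string; only the query and class columns are interpreted, and the repeat-position columns are left aside because their order differs between `+` and `C` hits.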

A de novo TE library was generated for the C. sp. C115 genome using the Extensive de-novo TE Annotator (EDTA) (Ou et al., 2019) with the species parameter set to ‘others’. We enabled EDTA's built-in RepeatModeler (Smit & Hubley, 2008) step (--sensitive 1), which identifies remaining TEs that the EDTA algorithm might have overlooked. Classifications within this library were refined with DeepTE (Yan et al., 2020) using the predefined metazoan model parameter setting (-m). TE identification was performed with RepeatMasker (RM; version 1.332) using the NCBI/RMBLAST (version 2.6.0+) search engine. Searches were run against either the D. rerio Repbase (2018-10-26) entry, which had also been run through DeepTE to keep TE classification uniform, or the Corydoras-specific library. RM was run with its most sensitive setting (-s) in all cases. The genomic and transcriptomic RM output files were then parsed with a custom R script which (i) removed non-distinct elements, retaining only the higher-scoring match where the domain of one match partly included the domain of another, (ii) removed repetitive elements not classed as TEs (e.g. microsatellites, simple repeats and sRNAs), (iii) merged elements on the same contig if they had the same name and orientation and their combined sequence length was less than or equal to the length of the corresponding reference sequence in RepBase, and (iv) removed merged repeats shorter than 80 base pairs. Additionally, for transcriptomic data, if identical repeats were found across different transcript isoforms, only one was retained, so that each repeat represented a unique genomic locus. This script is publicly available from https://github.com/clbutler/RM_TRIPS. Additional scripts implement the horizontal transfer analysis of transposable elements described in the accompanying manuscript.
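
The four parsing steps can be illustrated with a minimal Python sketch. It is not the RM_TRIPS R script, and several details are assumptions: step (i) is taken to mean dropping hits that RepeatMasker flags with a trailing `*`, the non-TE class names are those in the hypothetical `NON_TE_CLASSES` set, "combined sequence length" is read as the summed length of the merged fragments, and reference lengths come from a hypothetical `ref_lengths` dictionary. The transcriptome isoform step is omitted.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, List

# Assumed non-TE RepeatMasker classes for step (ii); adjust to the library in use.
NON_TE_CLASSES = {"Simple_repeat", "Low_complexity", "Satellite",
                  "rRNA", "snRNA", "tRNA", "scRNA", "srpRNA"}
MIN_LENGTH = 80  # step (iv): minimum merged length in bp


@dataclass(frozen=True)
class Hit:
    contig: str
    begin: int
    end: int
    strand: str               # "+" or "C"
    name: str                 # matching library entry
    repeat_class: str         # e.g. "DNA/hAT"
    score: int
    overlapped: bool = False  # assumed: RM's trailing "*" flag


@dataclass
class MergedTE:
    contig: str
    strand: str
    name: str
    repeat_class: str
    begin: int
    end: int
    length: int     # assumed: summed length of the merged fragments
    fragments: int


def parse_hits(hits: List[Hit], ref_lengths: Dict[str, int]) -> List[MergedTE]:
    # (i) drop matches partly covered by a higher-scoring match
    kept = [h for h in hits if not h.overlapped]
    # (ii) drop repeats whose top-level class is not a TE
    kept = [h for h in kept if h.repeat_class.split("/")[0] not in NON_TE_CLASSES]

    # (iii) within each (contig, name, strand) group, walk fragments in
    # coordinate order and merge while the combined length fits the reference
    def group_key(h: Hit):
        return (h.contig, h.name, h.strand)

    merged: List[MergedTE] = []
    ordered = sorted(kept, key=lambda h: (group_key(h), h.begin))
    for _, group in groupby(ordered, key=group_key):
        current = None
        for h in group:
            frag_len = h.end - h.begin + 1
            ref_len = ref_lengths.get(h.name)  # hypothetical reference-length lookup
            if (current is not None and ref_len is not None
                    and current.length + frag_len <= ref_len):
                current.end = max(current.end, h.end)
                current.length += frag_len
                current.fragments += 1
            else:
                current = MergedTE(h.contig, h.strand, h.name, h.repeat_class,
                                   h.begin, h.end, frag_len, 1)
                merged.append(current)

    # (iv) drop merged repeats shorter than MIN_LENGTH bp
    return [m for m in merged if m.length >= MIN_LENGTH]


# Hypothetical example hits and reference lengths
hits = [
    Hit("ctg1", 100, 400, "+", "TE_A", "DNA/hAT", 900),
    Hit("ctg1", 600, 850, "+", "TE_A", "DNA/hAT", 700),
    Hit("ctg1", 900, 1000, "+", "TE_A", "DNA/hAT", 400),
    Hit("ctg1", 2000, 2040, "+", "TE_B", "LINE/L2", 300),
    Hit("ctg1", 3000, 3200, "+", "(CA)n", "Simple_repeat", 500),
    Hit("ctg1", 700, 760, "+", "TE_C", "LTR/Gypsy", 200, overlapped=True),
]
ref_lengths = {"TE_A": 600, "TE_B": 1500}
result = parse_hits(hits, ref_lengths)
for m in result:
    print(m.name, m.begin, m.end, m.length, m.fragments)
# TE_A 100 850 552 2
# TE_A 900 1000 101 1
```

In this example the first two TE_A fragments merge (301 + 251 = 552 bp, within the 600 bp reference), the third would exceed it and starts a new element, TE_B (41 bp) is removed at step (iv), and the simple repeat and the flagged TE_C hit are removed at steps (ii) and (i).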

Transposable elements (TEs) are significant genomic components that can be detected either through sequence homology against existing databases or de novo, with the latter potentially reducing underestimates of TE abundance. Here, we describe the semi-automated generation of a de novo TE library that combines the newly described EDTA pipeline and DeepTE classifier in a non-model teleost (Corydoras sp. C115). We assess performance with both genomic and transcriptomic input using five metrics: (i) abundance, (ii) composition, (iii) fragmentation, (iv) age distributions and (v) capture of potential horizontally transferred TEs. We identified notable differences in these metrics between TE libraries, and highlight how library choice can have a major impact on TE content estimates in non-model species. This repository contains six raw (unparsed) RepeatMasker (RM) output files for two genomes (Corydoras sp. C115 and Corydoras maculifer) and one transcriptome (C. maculifer), together with two repeat libraries (one based on the RepBase Danio rerio library and one de novo library built on the C. sp. C115 genome). For each input, the RM output files correspond to one homology-based transposon search using the D. rerio library and one species-specific search using the de novo library. It also includes a script to accompany the horizontal transfer analysis and a transposable element renaming script.
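
Two of the five metrics, abundance and composition, together with a simple age profile, can be sketched in a few lines of Python once elements have been parsed. This sketch assumes elements are given as hypothetical `(class, length, divergence)` tuples, that abundance means TE base pairs as a percentage of genome size (ignoring overlaps), and that the raw RepeatMasker percent divergence serves as an age proxy; many TE landscape analyses use Kimura-corrected divergence instead. It does not reproduce the manuscript's exact definitions, and the example values are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical parsed element: (RepeatMasker class/family, length in bp, % divergence)
Element = Tuple[str, int, float]


def abundance_and_composition(elements: Iterable[Element],
                              genome_size: int) -> Tuple[float, Dict[str, float]]:
    """Percent of the genome covered by TEs, and each TE order's share of TE bp.

    Assumes elements do not overlap, so lengths can simply be summed.
    """
    bp_by_order: Counter = Counter()
    for repeat_class, length, _ in elements:
        bp_by_order[repeat_class.split("/")[0]] += length  # "DNA/hAT" -> "DNA"
    total = sum(bp_by_order.values())
    abundance = 100.0 * total / genome_size
    composition = {order: bp / total for order, bp in bp_by_order.items()}
    return abundance, composition


def divergence_histogram(elements: Iterable[Element],
                         bin_width: float = 1.0) -> Dict[float, int]:
    """Bp of TE sequence per divergence bin (lower bin edge -> bp).

    Uses raw RepeatMasker divergence as an age proxy; lower divergence
    suggests a younger insertion.
    """
    hist: Dict[float, int] = defaultdict(int)
    for _, length, div in elements:
        hist[int(div // bin_width) * bin_width] += length
    return dict(sorted(hist.items()))


elements = [
    ("DNA/hAT", 552, 12.3),
    ("DNA/TcMar-Tc1", 448, 3.7),
    ("LINE/L2", 1000, 12.9),
]
abundance, composition = abundance_and_composition(elements, genome_size=20_000)
print(abundance)                              # 10.0
print(composition)                            # {'DNA': 0.5, 'LINE': 0.5}
print(divergence_histogram(elements, 5.0))    # {0.0: 448, 10.0: 1552}
```

Comparing these outputs between the Danio-based and de novo libraries for the same assembly is one way to see how library choice shifts TE content estimates.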

Keywords

FOS: Biological sciences

  • BIP! impact: 0 selected citations; popularity, influence and impulse all rated Average
  • OpenAIRE UsageCounts: 7 views, 1 download