ZENODO
Dataset . 2023
License: CC 0
Data sources: ZENODO
DRYAD
Dataset . 2023
License: CC 0
Data sources: Datacite

Whole genome assembly and annotation of the King Angelfish (Holacanthus passer) gives insight into the evolution of marine fishes of the Tropical Eastern Pacific

Authors: Gatins, Remy; Arias, Carlos F.; Sánchez, Carlos; Bernardi, Giacomo; De León, Luis F.

Abstract

# Whole genome assembly and annotation of the King Angelfish (*Holacanthus passer*) gives insight into the evolution of marine fishes of the Tropical Eastern Pacific

by Remy Gatins, Carlos F. Arias, Carlos Sánchez, Giacomo Bernardi, and Luis F. De León

corresponding author: remygatinsa@gmail.com

***

## Genome Assembly Files

* HPA\_1.1.fasta - fasta genome assembly file

## Genome Annotation Files

* HPA\_1.1\_annotation.gff - gff genome annotation file
* HPA\_1.1\_nucleotide\_proteins.fasta - coding gene nucleotide sequences in fasta file format
* HPA\_1.1\_predicted\_proteins.fasta - predicted proteins in fasta file format
* gemoma.job - sbatch job submission to run the GeMoMa pipeline
* gemoma\_HPA\_1.1.log - output log file from the gemoma.job submission
* protocol\_GeMoMaPipeline.txt - GeMoMa pipeline parameter description

### RepeatMasker files

* HPA\_04\_pilonb.fasta.cat.gz - alignment file
* HPA\_04\_pilonb.fasta.masked - fasta file showing repeats across the genome
* HPA\_04\_pilonb.fasta.out - annotation file with the cross\_match output lines
* HPA\_04\_pilonb.fasta.tbl - repeatmasker summary file
* repeatmasker.log - HPA\_repeatmasker log file
* repeatmasker.job - repeatmasker job submission

## BUSCO output files

* missing\_busco\_list\_HPA\_1.1\_busco\_actinopterygii.txt - missing BUSCO list using Actinopterygii\_odb9 database
* missing\_busco\_list\_HPA\_1.1\_busco\_eukaryota.txt - missing BUSCO list using Eukaryota\_odb9 database
* short\_summary\_HPA\_1.1\_busco\_actinopterygii.txt - BUSCO short summary using Actinopterygii\_odb9 database
* short\_summary\_HPA\_1.1\_busco\_eukaryota.txt - BUSCO short summary using Eukaryota\_odb9 database
* full\_table\_HPA\_1.1\_busco\_actinopterygii.tsv - BUSCO full output table using Actinopterygii\_odb9 database
* full\_table\_HPA\_1.1\_busco\_eukaryota.tsv - BUSCO full output table using Eukaryota\_odb9 database

## Extra files

* HPA\_1.1\_blobtools.zip - this zipped folder includes about 40 `.json` files that are used together as input to generate the blobtools viewer.
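As a quick sanity check after downloading the assembly, the total length and N50 of the sequences in a FASTA file such as `HPA_1.1.fasta` can be recomputed with a few lines of Python. This is only a minimal sketch, not the authors' workflow: the function names (`read_fasta_lengths`, `n50`, `assembly_stats`) are hypothetical, and the N50 it reports refers to whatever sequences the file contains.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_name: length} from FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name;
            # fall back to a generated name if the header is empty.
            parts = line[1:].split()
            name = parts[0] if parts else f"seq{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(path: str) -> Dict[str, int]:
    """Summarise an uncompressed FASTA file on disk (path supplied by the user)."""
    with open(path) as handle:
        lengths = list(read_fasta_lengths(handle).values())
    return {"sequences": len(lengths), "total_bp": sum(lengths), "n50_bp": n50(lengths)}
```

Calling `assembly_stats("HPA_1.1.fasta")` on a local copy of the assembly would return the sequence count, total size, and N50 in base pairs.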

Holacanthus angelfishes are some of the most iconic marine fishes of the Tropical Eastern Pacific (TEP). However, very limited genomic resources currently exist for the genus. In this study we: i) assembled and annotated the nuclear genome of the King Angelfish (Holacanthus passer), and ii) examined the demographic history of H. passer in the TEP. We generated 43.8 Gb of ONT and 97.3 Gb of Illumina reads, representing 75X and 167X coverage, respectively. The final genome assembly size was 583 Mb with a contig N50 of 5.7 Mb, and it captured 97.5% complete Actinopterygii Benchmarking Universal Single-Copy Orthologs (BUSCOs). Repetitive elements account for 5.09% of the genome, and 33,889 protein-coding genes were predicted, of which 22,984 have been functionally annotated. Our demographic model suggests that population expansions of H. passer occurred prior to the last glacial maximum (LGM) and were more likely shaped by events associated with the closure of the Isthmus of Panama. This result is surprising, given that most rapid population expansions in both freshwater and marine organisms have been reported to occur globally after the LGM. Overall, this annotated genome assembly will serve as a resource to improve our understanding of the evolution of Holacanthus angelfishes while facilitating novel research into local adaptation, speciation, and introgression in marine fishes.
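The coverage figures above follow from dividing the sequenced bases by the genome size. The sketch below reproduces that arithmetic from the numbers in the abstract, assuming the 583 Mb assembly size is used as the denominator (the authors may instead have used an independent genome size estimate). The function and variable names are illustrative only.

```python
def coverage(sequenced_bases: float, genome_size: float) -> float:
    """Average depth of coverage = total sequenced bases / genome size."""
    return sequenced_bases / genome_size


# Values taken from the abstract; using the assembly size as genome size is an assumption.
ASSEMBLY_SIZE_BP = 583e6
ONT_BASES = 43.8e9
ILLUMINA_BASES = 97.3e9

ont_cov = coverage(ONT_BASES, ASSEMBLY_SIZE_BP)
illumina_cov = coverage(ILLUMINA_BASES, ASSEMBLY_SIZE_BP)

print(f"ONT coverage: {ont_cov:.1f}X")            # 75.1X, reported as 75X
print(f"Illumina coverage: {illumina_cov:.1f}X")  # 166.9X, reported as 167X

# Derived proportions from the same abstract figures.
annotated_fraction = 22_984 / 33_889          # functionally annotated genes
repeat_bases = 0.0509 * ASSEMBLY_SIZE_BP      # bases in repetitive elements
print(f"Functionally annotated: {annotated_fraction:.1%}")
print(f"Repetitive sequence: {repeat_bases / 1e6:.1f} Mb")
```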

To annotate our genome, we used the homology-based gene prediction pipeline GeMoMa (v1.6.4). GeMoMa uses protein-coding gene models and intron position conservation from reference genomes to predict possible protein-coding genes in a target genome (Keilwagen et al., 2018). Here, we ran the GeMoMa pipeline using annotations from three fish species: Amphiprion ocellaris, Oreochromis niloticus, and Electrophorus electricus (downloaded from NCBI, see Table S3). These species were selected to represent a variety of genes from close to distant high-quality fish annotations. In our particular case, the pipeline performed four main steps: 1) Extractor or external search, using the search algorithm tblastn with CDS parts from our reference genomes as queries, 2) Gene Model Mapper (GeMoMa), which builds gene models from the extractor results, 3) GeMoMa Annotation Filter (GAF), which filters and combines common gene predictions, and 4) AnnotationFinalizer, which predicts UTRs for annotated coding sequences and generates gene and transcript names (Keilwagen et al., 2018). Additionally, we ran RepeatMasker (open-4.0.6, Smit et al. 2013–2015) with the Teleostei database to identify repetitive elements in the genome and soft-mask the assembly. The RepeatMasker `.out` file was converted to GFF with the RepeatMasker script `rmOutToGFF3.pl`.
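To inspect the resulting annotation, one simple approach is to tally the feature types in column 3 of the GFF file. The sketch below is not part of the GeMoMa pipeline: `count_gff_features` is a hypothetical helper, the example records are invented for illustration, and the actual feature names in `HPA_1.1_annotation.gff` may differ.

```python
from collections import Counter
from typing import Iterable


def count_gff_features(lines: Iterable[str]) -> Counter:
    """Count GFF records by feature type (column 3), skipping comments and blanks."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            # Not a well-formed nine-column GFF record; ignore it.
            continue
        counts[fields[2]] += 1
    return counts


# Invented example records (contig name, source, and IDs are hypothetical).
example = [
    "##gff-version 3",
    "\t".join(["contig_1", "example", "gene", "100", "900", ".", "+", ".", "ID=g1"]),
    "\t".join(["contig_1", "example", "mRNA", "100", "900", ".", "+", ".", "ID=g1.t1;Parent=g1"]),
    "\t".join(["contig_1", "example", "CDS", "150", "400", ".", "+", "0", "Parent=g1.t1"]),
    "\t".join(["contig_1", "example", "CDS", "600", "850", ".", "+", "1", "Parent=g1.t1"]),
]
print(count_gff_features(example))
```

For a file on disk, the same function can be applied to an open file handle, for example `count_gff_features(open("HPA_1.1_annotation.gff"))`; the same tally also works on the GFF produced from the RepeatMasker output.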

Keywords

whole genome assembly, Oxford Nanopore, hybrid genome assembly, FOS: Biological sciences, genome assembly, Pomacanthidae, Tropical Eastern Pacific, Holacanthus passer, Illumina short-read sequencing, Long read sequencing
