Powered by OpenAIRE graph
ZENODO
Article . 2025
License: CC BY
Data sources: ZENODO, Datacite

Haplotype-Resolved Chromosome-scale Assembly of the Bighead Catfish (Clarias macrocephalus) Genome

Authors: ANDRES, Quentin Ludovic Stephane; Singchat, Worapong; Srikulnath, Kornsorn;

Abstract

This study presents the first high-quality, chromosome-scale, haplotype-resolved genome assembly of the Thai bighead catfish (Clarias macrocephalus), a freshwater species native to Thailand and the Mekong River basin. As a species of economic and ecological importance, C. macrocephalus plays a key role in Southeast Asian aquaculture and conservation efforts. The assembly was generated by combining long-read sequencing (PacBio HiFi and Oxford Nanopore (ONT)) with Hi-C and Illumina paired-end sequencing. The resulting haplotype-resolved diploid genome spans 880 Mb across 27 pseudo-chromosomes and shows high contiguity (N50 = 35.4 Mb), completeness (BUSCO = 95.5%; Merqury k-mer completeness at k = 21 = 96.6%), and base-level accuracy (QV 50, corresponding to 99.999% correctness). The genome was scaffolded with Hi-C chromatin conformation capture data and manually curated, providing a comprehensive reference for future research.

This assembly fills a critical gap in genomic resources for the genus Clarias, offering insight into structural variation, genetic diversity, and the effects of selective breeding in C. macrocephalus. The dataset supports applications in comparative genomics, conservation, aquaculture breeding programs, and pan-genome graph construction. It also enables research into adaptive traits, such as the species' benthic lifestyle and facultative air-breathing capability, which allow survival in low-oxygen environments.

Aligned with the United Nations' Sustainable Development Goal (SDG) 2 (Zero Hunger), this genomic resource contributes to sustainable aquaculture and biodiversity conservation. All sequencing data, genome assemblies, and computational workflows are publicly available under NCBI BioProject number PRJNA1121957, supporting further research in fish genomics, hybridization studies, and genome evolution.
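The relationship between the reported QV and the quoted per-base correctness can be checked with a short calculation. The sketch below assumes the QV is Phred-scaled (QV = -10 log10 of the per-base error rate), which is the convention Merqury reports; the function names are illustrative and are not part of the deposited workflows.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled consensus quality value to per-base accuracy."""
    error_rate = 10.0 ** (-qv / 10.0)
    return 1.0 - error_rate


def accuracy_to_qv(accuracy: float) -> float:
    """Inverse conversion: per-base accuracy (between 0 and 1) to Phred-scaled QV."""
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # QV 50 means one expected error per 100,000 bases.
    print(f"QV 50 -> accuracy {qv_to_accuracy(50.0):.3%}")
    print(f"accuracy 0.99999 -> QV {accuracy_to_qv(0.99999):.1f}")
```

Under this convention, each additional 10 QV units corresponds to a tenfold reduction in the expected per-base error rate.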
All datasets and computational workflows are openly accessible to support further research in fish genomics and hybrid genome analysis.

📂 Data Records

🐟 Genome Assembly of Thai Bighead Catfish (isolate: CMAM); Bighead catfish (TaxID: 35657)

📜 Raw Sequenced Reads (NCBI SRA)
🔬 Nanopore (20% error): SRR29723575
🧪 HiFi: SRR29723576
🖥️ Illumina 150PE: SRR29723578
🧲 Hi-C 150PE: SRR29723577

🗂️ The assembled genome is deposited as a whole-genome sequence (WGS) diploid assembly (🐠 Haplotype 1 | 🐟 Haplotype 2).
🧬 GenBank accession numbers: JBLWMO000000000 (Haplotype 1) | JBLWMP000000000 (Haplotype 2)

DATA DESCRIPTION

Final Assemblies (usable):

Step | Description | Tool | Library | Type | Assembly File Name (Output) | File Name Suffix (Output)
FINAL AND LATEST (NCBI-Submitted) | 🐠 Haplotype 1 | Hifiasm + GreenHill + JBAT + TGS-GapCloser + Polishing + Manual Curation | HiC + UL + HiFi + PE150 | Fully phased manually reviewed haplotype 1 | fClaMac_1_1.0.fa (NCBI name: Bighead_catfish_fClaMac_hap1_MT.fasta) | JBLWMO000000000
FINAL AND LATEST (NCBI-Submitted) | 🐟 Haplotype 2 | Hifiasm + GreenHill + JBAT + TGS-GapCloser + Polishing + Manual Curation | HiC + UL + HiFi + PE150 | Fully phased manually reviewed haplotype 2 | fClaMac_2_1.0.fa (NCBI name: Bighead_catfish_fClaMac_hap2.fasta) | JBLWMP000000000
FINAL AND LATEST | 🐠🐟 Collapsed Assembly (Mixed) | Flye | HiFi | Collapsed diploid assembly | CMAM_FLYE_assembly.fasta | .assembly.fa

📌 Data records are hosted under NCBI BioProject numbers: PRJNA1132508 (WGS), PRJNA1159889 (Hap1), PRJNA1159890 (Hap2)
📌 Bighead Catfish BioSample accession number: SAMN42347118

Other assemblies (Intermediate Files):

Step | Description | Tool | Library | Type | Assembly File Name (Output) | File Name Suffix (Output)

Primary Initial Assemblies
1 | Haplotype 1 | Hifiasm | HiC + UL + HiFi | Fully phased haplotype 1 | CMA.asm.hic.hap1.p_ctg.fa | .hic.hap1.p_ctg.fa
1 | Haplotype 2 | Hifiasm | HiC + UL + HiFi | Fully phased haplotype 2 | CMA.asm.hic.hap2.p_ctg.fa | .hic.hap2.p_ctg.fa

Scaffolding and Intermediate Assemblies (Hifiasm and GreenHill)
1 | Scaffolds | Hifiasm | HiC + UL + HiFi | Primary scaffolding | CMA.asm.hic.p_ctg.fa | .hic.p_ctg.fa
1 | Scaffolds | Hifiasm | HiC + UL + HiFi | Processed unitigs | CMA.asm.hic.p_utg.fa | .hic.p_utg.fa
1 | Scaffolds | Hifiasm | HiC + UL + HiFi | Raw unitigs | CMA.asm.hic.r_utg.fa | .hic.r_utg.fa
1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi | L0 Phased haplotype 1 | CMA_HIC_UL_l0.asm.hic.hap1.p_ctg.fa | .hic.hap1.p_ctg.fa
1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi | L0 Phased haplotype 2 | CMA_HIC_UL_l0.asm.hic.hap2.p_ctg.fa | .hic.hap2.p_ctg.fa
1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi | L0 Primary contigs | CMA_HIC_UL_l0.asm.hic.p_ctg.fa | .hic.p_ctg.fa
1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi | L0 Alternate contigs | CMA_HIC_UL_l0.asm.hic.a_ctg.fa | .hic.a_ctg.fa
1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi | L0 Raw unitigs | CMA_HIC_UL_l0.asm.hic.r_utg.fa | .hic.r_utg.fa
1 | Scaffolds L0 | Hifiasm | HiC + UL + HiFi | L0 Processed unitigs | CMA_HIC_UL_l0.asm.hic.p_utg.fa | .hic.p_utg.fa
1 | Scaffolds | Hifiasm | HiFi + UL | Primary contigs | CMA_HIFI.asm.p_ctg.fa | .p_ctg.fa
1 | Scaffolds | Hifiasm | HiFi + UL | Alternate contigs | CMA_HIFI.asm.a_ctg.fa | .a_ctg.fa
1 | Scaffolds | Hifiasm | HiFi + UL | Raw unitigs | CMA_HIFI.asm.r_utg.fa | .r_utg.fa
1 | Scaffolds | Hifiasm | HiFi + UL | Processed unitigs | CMA_HIFI.asm.p_utg.fa | .p_utg.fa
1 | Scaffolds L0 | Hifiasm | HiFi | Primary contigs L0 | CMA_HIFI_l0.asm.p_ctg.fa | .p_ctg.fa
1 | Scaffolds L0 | Hifiasm | HiFi | Alternate contigs L0 | CMA_HIFI_l0.asm.a_ctg.fa | .a_ctg.fa
1 | Scaffolds L0 | Hifiasm | HiFi | Raw unitigs | CMA_HIFI_l0.asm.r_utg.fa | .r_utg.fa
1 | Scaffolds L0 | Hifiasm | HiFi | Polished unitigs containing Hap1 and Hap2 | CMA_HIFI_l0.asm.p_utg.fa | .p_utg.fa
2 | Scaffolds | GreenHill | Hap1 | Hifiasm hap1 phased & scaffolded with GreenHill | 02-CMA_HAP1.greenhill.fa | NA
2 | Scaffolds | GreenHill | Hap2 | Hifiasm hap2 phased & scaffolded with GreenHill | 02-CMA_HAP2.greenhill.fa | NA

Failed Assemblies (Wtdbg2 - Not Used)
1 | Assembly 1 | Wtdbg2 (failed low QV) | HiFi | raw Consensus contigs | CM_M_dbg.hifi.raw.fa | .raw.fa
1 | Assembly 1 | Wtdbg2 (failed low QV) | HiFi ONT | raw Consensus contigs | CM_M_dbg.cb.raw.fa | .raw.fa
1 | Assembly 1 | Wtdbg2 (failed low QV) | HiFi | cns Consensus contigs | CM_M_dbg.hifi_cns.fa | .cns.fa
1 | Assembly 1 | Wtdbg2 (failed low QV) | HiFi ONT | cns Consensus contigs | CM_M_dbg.cb_cns.fa | .cns.fa
1 | Consensus Assembly | Wtdbg2 (failed low QV) | HiFi | Polished consensus | CM_M_dbg.hifi.srp.fa | .srp.fa
1 | Consensus Assembly | Wtdbg2 (failed low QV) | HiFi ONT | Polished consensus | CM_M_dbg.cb.srp.fa | .srp.fa

* L0 means that false duplications were not purged (i.e., the assembly is expected to be larger).

Technical validation (to be done):

Step | Description | Tool | Library | Type | Assembly File Name (Output) | File Name Suffix (Output)

Knowledge Dissemination:

Object | Description | Link / File
Manuscript | Presentation and Interpretation of Results (Version 1.0). | Bighead_catfish_C_macrocephalus_MS_draft_ver_1.pdf
Figure 1 | Sequencing Data Summary for C. macrocephalus Genome Experiment. | Figure_1_SEQUENCING_READS_AND_GENOMESCOPE2.0.png
Figure 2 | Comprehensive Haplotype-Resolved Genome Assembly and Scaffolding Workflow. | Figure_2_GENOME_ASSEMBLY_WORKFLOW.png
Figure 3 | Hi-C Contact Matrix Heat Maps of Individual Pseudo-chromosomes in Haplotype 1. | Figure_3_SEPARATE_HIC_MAPS_HAPLOTYPE_1_all.pdf
Figure 4 | Hi-C Contact Matrix Heat Maps of Individual Pseudo-chromosomes in Haplotype 2. | Figure_4_SEPARATE_HIC_MAPS_HAPLOTYPE_2_all.pdf
Figure 5 | Hi-C Map of Hi-C Scaffolds - Bighead Catfish. | Figure_5_GENOME_WIDE_HIC_MAPS_HAPLOTYPE_1_AND_2.png
Figure 6 | Assembly Status Displaying Gaps and Telomeres, January 2024 - November 2025. | Figure_6_BIGHEAD_CATFISH_MANUAL_CURATION_PROGRESS_HAP1_HAP2.png
Figure 7 | Visual Genome Quality, Merqury Spectra and BUSCO Scores. | Figure_7_MERQURY_K-MER_ANALYSIS_AND_BUSCO.png
Figure 8 | Synteny Analysis of Linkage Groups for Various Catfish Assemblies. | Figure_8_SYNTENIC_RELATIONSHIPS_TILAPIA_BIGHEAD_ZEBRAFISH.png
Table 1 | Summary Statistics of the Genome Assembly and Transposable Element Content. | Table_1_GENOME_SURVEY_GENOME_SUMMARY_AND_TRANSPOSABLB_ELEMENT_CONTENT.xlsx
Table 2 | Summary of Individual Scaffold Metrics in the Haplotype-resolved Assembly. | Table_2_BIGHEAD_CATFISH_SUMMARY_STATISTICS_PER_SCAFFOLD_QV_S-AQI-PCT.xlsx
Table S1 | Additional Statistics of Additional Assemblies. | Table_S1_SUMMARY_STATISTICS_OF_BIGHEAD_CATFISH_ASSEMBLIES_BUSCO_CRAQ_MERQURY.xlsx
Table S2 | List of Software and Their Versions. | Table_S2_LIST_OF_TOOLS_USED_FOR_BIGHEAD_CATFISH_ASSEMBLY.xlsx
Figure S1 | Assembly Graph (GFA), Hifiasm Primary Phased Contigs, Visualized in Bandage. | Supplementary_Figure_1_BANDAGE_ASSEMBLY_GRAPH_HIFIASM_HiC_UL_P_CTGS_HAP1_HAP2.png
Figure S2 | mtDNA Alignments in 210 Siluriformes Species Including Bighead Catfish. | Supplementary_Figure_2_mt_DNA_210_SPECIES_COMPARISON.png
Overleaf Project | A .zip Containing the Manuscript and All Figures and Tables, Including Technical Validation Files. | Bighead_catfish_C_macrocephalus_MS_draft_ver.1.zip
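
For readers who download one of the assembly FASTA files listed in this record, a contiguity figure such as N50 can be recomputed with a few lines of Python. This is a minimal sketch and not the authors' pipeline: the helper names `fasta_lengths` and `n50` are hypothetical, the parser assumes plain uncompressed FASTA, and the result is only comparable to the reported value when it is computed over the same set of sequences (scaffolds versus contigs).

```python
from typing import Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in a FASTA stream, in file order."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Smallest length L such that sequences of length >= L cover half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    # Toy input; for a real file, pass an open handle instead, for example:
    #   with open("fClaMac_1_1.0.fa") as handle:
    #       print(n50(fasta_lengths(handle)))
    demo = [">s1", "ACGTACGTAC", ">s2", "ACGTAC", ">s3", "ACGT"]
    print(n50(fasta_lengths(demo)))
```

Reading the file line by line keeps memory use low, which matters for assemblies of several hundred megabases.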

Keywords

siluriformes, catfish, genome assembly, bighead catfish

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    1
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
Green
Related to Research communities
Italian National Biodiversity Future Center