
Overharvesting for ornamental and medicinal purposes, combined with ongoing habitat loss and fragmentation in Vietnam, has severely threatened wild populations of Camellia cucphuongensis. Effective conservation and management of this species therefore require robust genomic resources and informative molecular markers to quantify genetic diversity and population structure. In this study, we generated the first transcriptome dataset for C. cucphuongensis and developed expressed sequence tag–simple sequence repeat (EST-SSR) markers using Illumina HiSeq™ 4000 sequencing. A total of 13,600,954 clean reads were obtained (Q20 = 97.55%, Q30 = 93.11%, GC = 44.08%). De novo assembly produced 118,552 unigenes with a mean length of 541.2 bp and an N50 of 683 bp. Functional annotation revealed that 52,107 and 25,640 unigenes had significant matches in the Nr and Swiss-Prot databases, respectively. Additionally, 28,007 unigenes were assigned to Gene Ontology terms, 27,968 to KOG categories and 11,959 to 117 KEGG pathways. Mining for simple sequence repeats identified 9,661 EST-SSR loci. From 60 screened primer pairs, 11 polymorphic EST-SSR markers were validated and applied to 60 individuals from three natural populations. Genetic diversity was moderate (NE = 2.17; PIC = 0.548; HO = 0.46; HE = 0.50); most variation occurred within individuals (79%), whereas 11% was partitioned amongst populations (FST = 0.113; Nm = 1.96). Principal coordinate analysis (PCoA), discriminant analysis of principal components (DAPC), STRUCTURE and neighbour-joining (NJ) analyses all indicated detectable population structuring, with population CP showing clearer differentiation relative to LH and TL. Collectively, these transcriptomic resources and EST-SSR markers provide practical tools for genetic monitoring and can support conservation strategies that emphasise habitat protection and maintenance of connectivity to mitigate genetic erosion in this endangered golden camellia.
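The abstract does not say how the gene-flow estimate was derived from FST. As a minimal sketch, assuming the commonly used island-model approximation Nm ≈ (1 − FST) / (4 FST), the reported Nm can be reproduced from the reported FST. The function name `gene_flow_nm` is hypothetical and is not taken from the study or from any particular software package.

```python
def gene_flow_nm(fst: float) -> float:
    """Estimate the number of migrants per generation (Nm) from FST.

    Assumption: Wright's island-model approximation,
    Nm = (1 - FST) / (4 * FST). The study may have used a
    different estimator; this is an illustration only.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)


# FST value reported in the abstract
nm = gene_flow_nm(0.113)
print(f"Nm = {nm:.2f}")
```

Under this approximation, an Nm above 1 is conventionally read as enough migration to counteract divergence by drift alone, which fits the moderate differentiation described above.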
