
Contents (11 databases)

| Database | Version | Models / Sequences | Description |
|---|---|---|---|
| KofamScan | 2026-04-11 | 27,576 | KEGG Orthology HMM profiles with adaptive thresholds |
| Pfam-A | 2026-04-11 | 27,481 | Protein domain families with gathering thresholds |
| TIGRFAM | 15.0 | 4,488 | Functionally equivalent protein families (equivalogs) |
| dbCAN | V14 | 875 | CAZyme family + curator-subfamily HMM profiles |
| dbCAN_sub | 2025 | 53,411 | eCAMI subfamily HMMs (k-mer-clustered CAZymes); EC-anchored substrate inference where curated |
| HMSS2 | 2026-04-11 | 363 | Sulfur metabolism HMM profiles (inorganic + organic) |
| CAMPER | 2026-04-11 | 289 | Polyphenol metabolism HMM profiles |
| TCDB | 2026-04-11 | 24,281 | Transporter Classification Database (DIAMOND) |
| VFDB | 2026-04-11 | 4,538 | Virulence Factor Database core dataset (DIAMOND) |
| AMRFinderPlus | 2026-03-24.1 | - | NCBI antimicrobial resistance gene database |
| DefenseFinder | 2.0.1 | - | Anti-phage defense system models (MacSyFinder) |

Changes from v1.1

- Added dbCAN_sub: dbCAN3's eCAMI subfamily database (53,411 k-mer-clustered HMMs, ~5.1 GB on disk). Provides finer subfamily classification and EC-anchored substrate predictions for CAZymes whose cluster members have curated EC numbers.
- dbCAN parsing now emits the full domain architecture: proteins carrying multiple CAZy domains (e.g. CBM + GH) emit all domains in N→C order via cath-resolve-hits, replacing the single-best-hit behavior used previously. The same change applies to dbCAN_sub.
- The other 10 databases are unchanged from v1.1.

Usage

Download and extract:

    meta-pipeline-funcanno db update --version 1.2

Or manually:

    tar xzf meta-pipeline-FuncAnno-db-v1.2.tar.gz

Point the pipeline to the database:

    meta-pipeline-funcanno annotate \
      -i orfanno_results/ \
      -o results/ \
      --db-dir /path/to/meta-pipeline-FuncAnno-db-v1.2/

The extracted directory can be shared across users on an HPC system; only one install is needed per lab/group.

Run subsets selectively, including the new dbCAN_sub:

    meta-pipeline-funcanno annotate ... --steps dbcan,dbcan_sub

Chunking

Large HMM databases are pre-split for parallel hmmsearch on HPC. Chunk counts are sized per database (more chunks for larger HMM sets):

| Database | Chunks |
|---|---|
| KofamScan | 640 |
| Pfam-A | 640 |
| dbCAN_sub | 640 |
| TIGRFAM | 128 |
| dbCAN | 128 |
| HMSS2, CAMPER | small, unchunked |

Use `meta-pipeline-funcanno db rechunk --cores N` to re-split for a different core count.

Licensing

KofamScan (KEGG) profiles are for academic use only; see https://www.kegg.jp/kegg/legal.html. All other databases are freely available for academic use.
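
For the shared HPC install mentioned under Usage, one possible approach is to extract the tarball once into a group-visible location and make it readable but not writable for other group members. The sketch below is only an illustration: `install_shared_db` is a hypothetical helper, not part of `meta-pipeline-funcanno`, and the destination path is a placeholder that depends on your cluster's layout.

```shell
#!/usr/bin/env bash
set -euo pipefail

# install_shared_db TARBALL DEST_DIR
# Hypothetical helper (not shipped with the pipeline): extracts the database
# tarball under DEST_DIR and opens it for group read access.
install_shared_db() {
    local tarball="$1" dest="$2"
    mkdir -p "$dest"
    tar xzf "$tarball" -C "$dest"
    # g+rX: group may read files and traverse directories.
    # go-w: neither group nor others may modify the shared copy.
    chmod -R g+rX,go-w "$dest"
}

# Example call (placeholder destination; adjust to your site):
# install_shared_db meta-pipeline-FuncAnno-db-v1.2.tar.gz /shared/lab/databases
```

Each user would then pass the extracted directory to `--db-dir` as shown under Usage. Whether group members can actually reach the directory also depends on the permissions of its parent directories, which this sketch does not change.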
