Powered by OpenAIRE graph
ZENODO
Dataset, 2025
License: CC BY
Data sources: Datacite
Versions: 2

simulated_msa v1.0: simulated multiple sequence alignments with known phylogenies for benchmarking MSA tools

Authors: Zielezinski, Andrzej; Gudyś, Adam; Deorowicz, Sebastian

Abstract

This dataset contains 1,860 simulated multiple sequence alignments (MSAs) with known phylogenies. The alignments were generated using the AliSim tool from IQ-TREE v2.4.0. The simulation parameters span a broad range of conditions:

• Number of sequences: 1,000–100,000
• Substitution models: LG, JTT, and WAG
• Sequence lengths: 400–2,000 residues
• Sequence identities: 8%–75%
• Gap fractions: 0%–99%

Directory structure

The dataset contains three main directories:

• fasta/ – unaligned protein sequences [FASTA format]
• msa/ – aligned protein sequences (reference MSAs) [FASTA format]
• tree/ – phylogenetic trees corresponding to each simulated MSA (reference trees) [Newick format]

Metadata

A metadata file (metadata.tsv) is also included, containing detailed information for each simulated MSA. It provides:

• id – unique MSA identifier
• seqs_count – number of sequences in the MSA
• alisim_length – seed sequence length
• alisim_rlen_min / mean / max – relative branch length parameters
• alisim_ins / alisim_del – insertion and deletion rates
• alisim_model – substitution model (LG, JTT, WAG)
• alisim_model_type – model configuration used by AliSim
• mean_identity_percent – average sequence identity [%]
• mean_gaps_percent – average gap fraction [%]
• min_seq_length / mean_seq_length / max_seq_length – sequence length statistics
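As a rough illustration of how the metadata and the three directories might be used together, the sketch below reads metadata.tsv with Python's standard csv.DictReader, selects MSAs by substitution model, size and mean identity, and builds the paths of the matching files. It assumes that the TSV header uses the column names listed above. The file-naming scheme inside fasta/, msa/ and tree/ is not described on this page, so the `<id>.fasta` / `<id>.nwk` pattern in the hypothetical helper paths_for is an assumption to be checked against the actual archive.

```python
import csv
from pathlib import Path


def load_metadata(lines):
    """Parse metadata.tsv (any iterable of text lines) into dicts keyed by column name.

    Assumes a tab-separated file with a header row using the column names
    listed in the dataset description (id, seqs_count, alisim_model, ...).
    """
    return list(csv.DictReader(lines, delimiter="\t"))


def select_msas(rows, model=None, max_seqs=None, min_identity=None):
    """Return the ids of MSAs matching optional model, size and identity filters."""
    selected = []
    for row in rows:
        if model is not None and row["alisim_model"] != model:
            continue
        if max_seqs is not None and int(row["seqs_count"]) > max_seqs:
            continue
        if min_identity is not None and float(row["mean_identity_percent"]) < min_identity:
            continue
        selected.append(row["id"])
    return selected


def paths_for(msa_id, root="."):
    """Hypothetical naming scheme <dir>/<id>.<ext>; verify against the real file names."""
    base = Path(root)
    return {
        "unaligned": base / "fasta" / f"{msa_id}.fasta",
        "reference_msa": base / "msa" / f"{msa_id}.fasta",
        "reference_tree": base / "tree" / f"{msa_id}.nwk",
    }
```

For example, `with open("metadata.tsv") as fh: ids = select_msas(load_metadata(fh), model="LG", max_seqs=10_000)` would list the LG-simulated alignments with at most 10,000 sequences, whose unaligned input, reference MSA and reference tree can then be located with paths_for.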
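The metadata reports mean_identity_percent and mean_gaps_percent, but the exact formulas behind them are not given here. The following is a minimal sketch under assumed definitions, which may not reproduce the published values exactly: gap percentage is taken as gap characters ('-') over all alignment cells, and pairwise identity as the fraction of identical residues over columns where neither sequence is gapped, averaged over sequence pairs. Because the largest MSAs hold 100,000 sequences (about 5×10⁹ pairs), the hypothetical mean_identity helper falls back to sampling a fixed number of random pairs.

```python
import itertools
import random


def read_fasta(lines):
    """Parse FASTA text lines into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def gap_percent(seqs):
    """Assumed definition: percentage of alignment cells that are gaps ('-')."""
    total = sum(len(s) for s in seqs)
    gaps = sum(s.count("-") for s in seqs)
    return 100.0 * gaps / total if total else 0.0


def pair_identity(a, b):
    """Assumed definition: identity over columns where neither sequence has a gap."""
    ungapped = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not ungapped:
        return 0.0
    same = sum(1 for x, y in ungapped if x == y)
    return 100.0 * same / len(ungapped)


def mean_identity(seqs, max_pairs=10_000, seed=0):
    """Mean pairwise identity; samples random pairs when there are too many."""
    n = len(seqs)
    if n < 2:
        return 0.0
    if n * (n - 1) // 2 <= max_pairs:
        # Small alignment: use every pair exactly once.
        pairs = list(itertools.combinations(range(n), 2))
    else:
        # Large alignment: draw max_pairs random pairs of distinct sequences.
        rng = random.Random(seed)
        pairs = [tuple(rng.sample(range(n), 2)) for _ in range(max_pairs)]
    return sum(pair_identity(seqs[i], seqs[j]) for i, j in pairs) / len(pairs)
```

Usage might look like `with open(path) as fh: seqs = [s for _, s in read_fasta(fh)]`, followed by `gap_percent(seqs)` and `mean_identity(seqs)`, where path points to one of the reference alignments in msa/. Comparing the results with the corresponding metadata row is a quick way to check whether these assumed definitions match the ones used for the dataset.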

Keywords

benchmark, multiple sequence alignment, protein sequences
