
This dataset contains 1,860 simulated multiple sequence alignments (MSAs) with known phylogenies. The alignments were generated using the AliSim tool from IQ-TREE v2.4.0. The simulation parameters span a broad range of conditions:

- Number of sequences: 1,000–100,000
- Substitution models: LG, JTT, and WAG
- Sequence lengths: 400–2,000 residues
- Sequence identities: 8%–75%
- Gap fractions: 0%–99%

Directory structure

The dataset contains three main directories:

- fasta/ – unaligned protein sequences [FASTA format]
- msa/ – aligned protein sequences (reference MSAs) [FASTA format]
- tree/ – phylogenetic trees corresponding to each simulated MSA (reference trees) [Newick format]

Metadata

A metadata file (`metadata.tsv`) is included, containing detailed information for each simulated MSA. It provides:

- id – unique MSA identifier
- seqs_count – number of sequences in the MSA
- alisim_length – seed sequence length
- alisim_rlen_min / mean / max – relative branch length parameters
- alisim_ins / alisim_del – insertion and deletion rates
- alisim_model – substitution model (LG, JTT, WAG)
- alisim_model_type – model configuration used by AliSim
- mean_identity_percent – average sequence identity [%]
- mean_gaps_percent – average fraction of gaps [%]
- min_seq_length / mean_seq_length / max_seq_length – sequence length statistics
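The metadata columns described above can be used to pick a subset of MSAs for a benchmark run. The following is a minimal sketch, not part of the dataset itself. It assumes that `metadata.tsv` is tab-separated with a header row whose column names match the field names listed above, and that the percentage columns hold plain numbers. The function name `select_msas` and the rows in `EXAMPLE_TSV` are hypothetical and made up for illustration.

```python
import csv
import io


def select_msas(tsv_text, model=None, max_gaps_percent=None,
                min_identity_percent=None):
    """Return the ids of MSAs whose metadata rows pass all given filters.

    Assumes a tab-separated table with a header row containing at least
    the columns id, alisim_model, mean_identity_percent, mean_gaps_percent.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    selected = []
    for row in reader:
        # Keep only rows simulated under the requested substitution model.
        if model is not None and row["alisim_model"] != model:
            continue
        # Drop alignments that are gappier than the requested limit.
        if (max_gaps_percent is not None
                and float(row["mean_gaps_percent"]) > max_gaps_percent):
            continue
        # Drop alignments that are more divergent than the requested limit.
        if (min_identity_percent is not None
                and float(row["mean_identity_percent"]) < min_identity_percent):
            continue
        selected.append(row["id"])
    return selected


# Made-up rows for illustration only; these are not real dataset entries.
EXAMPLE_TSV = (
    "id\tseqs_count\talisim_model\tmean_identity_percent\tmean_gaps_percent\n"
    "msa_a\t1000\tLG\t42.5\t10.0\n"
    "msa_b\t5000\tJTT\t20.0\t55.0\n"
    "msa_c\t100000\tLG\t8.0\t99.0\n"
)

lg_low_gap_ids = select_msas(EXAMPLE_TSV, model="LG", max_gaps_percent=50.0)
print(lg_low_gap_ids)
```

With the real file, the text could be read with `open("metadata.tsv", encoding="utf-8").read()` and passed to the same function. How the returned ids map to file names inside fasta/, msa/, and tree/ is not stated in this description, so that step is left out of the sketch.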
benchmark, multiple sequence alignment, protein sequences
