extHomFam 2: large-scale benchmark for protein multiple sequence alignments

extHomFam 2 was constructed by combining Homstrad reference alignments (March 2020) with Pfam 33.1 complete families (NCBI variant). Homstrad entries with fewer than 3 reference sequences, and those pointing to Pfam families that no longer exist, were discarded. The resulting benchmark was divided into subsets according to the family size N:

subset   N range                 # families
small    [200, 10 000)           86
medium   [10 000, 40 000)        95
large    [40 000, 100 000)       83
xlarge   [100 000, 250 000)      67
huge     [250 000, 3 000 000)    62

The directories in the archive correspond to the names of the subsets, while the reference alignments are located in the 'ref' folder.
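As an illustration of the subset boundaries, the following minimal sketch maps a family size N to the subset it falls into, treating each range as half-open [lower, upper) as written in the table. It assumes N is the number of sequences in the Pfam family; the names SUBSETS, FAMILY_COUNTS and subset_for_size are hypothetical and not part of the archive.

```python
from typing import Optional

# Hypothetical encoding of the subset table: (name, lower bound, upper bound),
# each range half-open [lower, upper) on the family size N (assumed = number of sequences).
SUBSETS = [
    ("small", 200, 10_000),
    ("medium", 10_000, 40_000),
    ("large", 40_000, 100_000),
    ("xlarge", 100_000, 250_000),
    ("huge", 250_000, 3_000_000),
]

# Number of families per subset, as listed in the table.
FAMILY_COUNTS = {"small": 86, "medium": 95, "large": 83, "xlarge": 67, "huge": 62}


def subset_for_size(n: int) -> Optional[str]:
    """Return the subset name for a family of n sequences, or None if n is outside all ranges."""
    for name, lower, upper in SUBSETS:
        if lower <= n < upper:
            return name
    return None


if __name__ == "__main__":
    print(subset_for_size(9_999))        # small
    print(subset_for_size(10_000))       # medium (lower bounds are inclusive)
    print(subset_for_size(150))          # None (below the smallest subset)
    print(sum(FAMILY_COUNTS.values()))   # 393 families in total
```

Because the upper bounds are exclusive, a family of exactly 10 000 sequences belongs to "medium", not "small".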


Related Organizations

Silesian University of Technology
Poland

Keywords

multiple sequence alignment, protein families, benchmark, Pfam, Homstrad
