
Preprint: ET-miner discovers which combinations of protein features (Pfam domains, Gene Ontology terms, structural properties) co-occur across the AlphaFold protein universe. Using GPU-resident frequent itemset mining, ET-miner exhaustively mines 76.9 million multi-feature proteins in 7.3 minutes on a single NVIDIA H100, revealing 26.8 million co-occurrence patterns up to K=22 features deep. The deepest pattern describes a neuronal antiviral RNA helicase sentinel shared by exactly 8 proteins. This work demonstrates that exact Apriori computation at 100-million-transaction scale is practical on current-generation datacenter GPUs.
FOS: Computer and information sciences, Proteomics, Protein Structure, Bioinformatics, GPU, Computational Biology, CUDA, Apriori, Data Mining/methods, (MeSH) Proteins/chemistry, Co-Occurrence, Frequent itemset mining, Pfam, Alphafold, Algorithms
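For readers unfamiliar with the technique named in the abstract, the following is a minimal CPU sketch of level-wise Apriori frequent itemset mining, treating each protein as a transaction and each feature as an item. It is not ET-miner's GPU implementation, which the abstract does not detail. The function name `apriori`, the toy proteins, the feature labels, and the support threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset seen in at least
    min_support transactions (exact, level-wise Apriori)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual features.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Support counting: number of transactions containing the candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Toy data (hypothetical, not taken from the preprint): one feature set per protein.
proteins = [
    {"PF00270", "PF00271", "GO:0003724"},
    {"PF00270", "PF00271", "GO:0003724", "GO:0051607"},
    {"PF00270", "PF00271"},
    {"PF00069", "GO:0004672"},
]

patterns = apriori(proteins, min_support=2)
deepest = max(patterns, key=len)
print(len(patterns), "patterns; deepest has", len(deepest), "features")
```

The prune step is what keeps exhaustive mining tractable: a K-feature pattern is only counted if every one of its (K-1)-feature subsets already met the support threshold.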
