
This repository contains the data and models developed or generated as part of the KSMoFinder project. The files and a brief description of their contents follow.
kg_data.zip - (KSMoFinder's knowledge graph) This archive contains three files: kg_train_data.csv, kg_val_data.csv and kg_train_val.csv. a) kg_train_val.csv - This file contains all the triples constituting the knowledge graph (kg); b) kg_train_data.csv - This file contains the triples that were used to train the KGE models and determine optimal hyperparameters; c) kg_val_data.csv - This file contains the validation triples that were used to assess the performance of the KGE models and determine the optimal training epoch.
embeddings.zip - This archive contains embedding data extracted from all the models, including the four KGE models trained as part of KSMoFinder and the external models: ProtT5, ESM2, ESM3, ProstT5, Phosformer and a random embedding model.
kg_ks.zip - This archive contains two files. a) kinases.csv - This file contains all kg kinases; b) substrates_motif.csv - This file contains all kg substrate_motifs along with their site position, 9-mer and 15-mer motifs.
assessments_data.zip - This archive contains the classification datasets used for assessment1, assessment2 and assessment3. It contains subfolders with testing datasets at two different ratios of positives to negatives: a) 1:1 ratio and b) the same distribution as the training dataset.
ksf2_predictions.zip - This file contains prediction probabilities generated by KSMoFinder for kinase, substrate and motif data.
kge_models_assess1.zip - This file contains the classifier models trained using embeddings from the four KGE models.
models_assess2.zip - This file contains the classifier models trained using embeddings from the external models: ProtT5, ESM2 and ESM3.
models_assess3.zip - This file contains the classifier models trained to assess the influence of additional features: kinase domain sequences, 15-mer motifs and protein structure based embeddings.
other_model_predictions.zip - This archive contains predictions collected or generated from other kinase-substrate prediction tools: LinkPhinder, PredKinKG, KSFinder and Phosformer-ST.
classifier_datasets.zip - This archive contains the classifier dataset used to train the KSMoFinder model. It contains subfolders with testing datasets at two different ratios of positives to negatives: a) 1:1 ratio and b) the same distribution as the training dataset.
model_ksfinder.zip - This file contains the KSMoFinder model.
other_datafiles.zip - This file contains kinase-group and kinase-family data, and the motif and substrate protein of phosphosites.
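
For a quick look at one of the CSV files without unpacking an archive, a minimal sketch along the following lines may help. It uses only the Python standard library. The helper name preview_csv_from_zip is hypothetical, and the column layout, the presence of a header row, the UTF-8 encoding and the folder structure inside each archive are not documented here, so the sketch matches members by filename suffix and returns raw rows without interpreting them.

```python
import csv
import io
import itertools
import zipfile


def preview_csv_from_zip(zip_path, member_suffix, n_rows=5):
    """Return the first n_rows rows of the first archive member whose
    name ends with member_suffix (hypothetical helper, not part of KSMoFinder)."""
    with zipfile.ZipFile(zip_path) as archive:
        # Match by suffix because the internal folder layout is an unknown.
        matches = [name for name in archive.namelist() if name.endswith(member_suffix)]
        if not matches:
            raise FileNotFoundError(
                f"no member ending with {member_suffix!r} in {zip_path}"
            )
        with archive.open(matches[0]) as raw:
            # Assumption: the CSV is UTF-8 encoded and comma-delimited.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(itertools.islice(csv.reader(text), n_rows))


# Example call, assuming kg_data.zip has been downloaded to the working directory:
# for row in preview_csv_from_zip("kg_data.zip", "kg_train_val.csv"):
#     print(row)
```

Reading only the first few rows keeps memory use small for large triple files; the rows come back as lists of strings, so any typing or header handling is left to the caller.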
