Optimising intrinsic disorder prediction for short linear motif discovery

Short linear motifs (SLiMs) are short protein regions, commonly only 3-10 amino acids in length, that are directly involved in protein-protein interactions. Identification of SLiMs is important for understanding fundamental processes involved in normal cellular function. SLiMs interact with their partner proteins with low affinity. This makes them difficult to identify experimentally; as a result, many computational SLiM prediction methods have been developed. Because SLiMs typically have only a few defined positions, random non-functional sequences that match a SLiM sequence pattern are ubiquitous in any proteome. The main challenge in computational SLiM prediction is to identify the true positive SLiMs (“signal”) amongst the much more abundant false positive motif matches (“noise”). To increase the signal-to-noise ratio, different sequence masking techniques are applied to screen out protein regions that are unlikely to contain real SLiMs, thereby preferentially eliminating random non-functional sequence matches from the data. SLiMs are typically found in regions of intrinsic disorder; hence, a widely implemented masking strategy is to use predictors of intrinsic protein disorder to identify and remove protein regions that form stable three-dimensional structures. However, to date, there has been no systematic study on how best to predict intrinsic disorder for SLiM discovery. In this study, I investigate the relative performance of the ten disorder prediction methods implemented in the MobiDB database, along with the functional disorder predictor ANCHOR. The SLiM prediction program SLiMProb was used to predict instances of known SLiMs across the human proteome, and SLiMFinder was used to predict novel SLiM patterns.
The benchmarking program SLiMBench was used to evaluate the performance of the different input masking strategies based on disorder predictors, and to identify the optimal settings for SLiM occurrence prediction and for de novo SLiM prediction. This study shows that while all disorder prediction methods improve both SLiM occurrence prediction and de novo SLiM prediction, the degree of improvement varies between methods. Additionally, regional smoothing of disorder predictions prior to masking was found to further improve SLiM discovery. These results will be useful for guiding future SLiM discovery efforts.
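
The disorder masking and regional smoothing described above can be illustrated with a minimal sketch. This is not the SLiMSuite implementation: the function names, the 0.5 threshold, the moving-average form of smoothing, and the toy sequence and scores are all assumptions made for illustration, and per-residue disorder scores are assumed to come from some external predictor.

```python
import re


def smooth(scores, half_window):
    """Moving average of per-residue scores (hypothetical smoothing scheme).

    The window spans up to 2 * half_window + 1 residues and is truncated
    at the sequence termini.
    """
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def mask_ordered(sequence, scores, threshold=0.5, half_window=0):
    """Replace residues predicted to be ordered (score < threshold) with 'X'.

    The threshold and window size are illustrative defaults, not values
    taken from the study.
    """
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    if half_window > 0:
        scores = smooth(scores, half_window)
    return "".join(aa if s >= threshold else "X" for aa, s in zip(sequence, scores))


def find_motif(sequence, pattern):
    """Return 0-based start positions of (possibly overlapping) regex matches."""
    return [m.start() for m in re.finditer(f"(?=(?:{pattern}))", sequence)]


# Toy example: the first half is scored as ordered, the second as disordered.
seq = "AAPALPAAAAPSLPAA"
scores = [0.1] * 8 + [0.9] * 8
masked = mask_ordered(seq, scores)

unmasked_hits = find_motif(seq, "P.LP")   # matches in both halves
masked_hits = find_motif(masked, "P.LP")  # only the disordered-region match survives
```

In this toy case the motif match in the region scored as ordered is removed by masking, while the match in the region scored as disordered is retained. Smoothing before thresholding keeps an isolated low-scoring residue inside a disordered stretch from being masked, which is one plausible reading of why regional smoothing can help.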

Country

Australia

Related Organizations

UNSW Sydney
Australia

Keywords

Protein sequence analysis, 610, Big data analysis
