
Abstract

The interaction between proteins and DNA is a key driving force in many biological processes, such as transcriptional regulation, repair, recombination, splicing, and DNA modification. Identifying DNA-binding sites and the specificity with which target proteins bind to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short of precisely characterizing motifs; as a result, they require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning that can find binding sites on candidate probes and rank their specificity with respect to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for motif prediction from protein binding microarrays and possibly other related high-throughput techniques.
Genomics (q-bio.GN), FOS: Computer and information sciences, Computer Science - Machine Learning, Binding Sites, Base Sequence, Amino Acid Motifs, DNA, Sequence Analysis, DNA, Quantitative Biology - Quantitative Methods, Machine Learning (cs.LG), FOS: Biological sciences, Quantitative Biology - Genomics, Algorithms, Quantitative Methods (q-bio.QM), Transcription Factors
