Using structural motifs to identify proteins with DNA binding function

Article (English, open access)
Jones, Susan; Barker, Jonathan A; Nobeli, Irene; Thornton, Janet M (2003)

This work describes a method for predicting DNA-binding function from structure using three-dimensional templates. Proteins that bind DNA using small contiguous helix-turn-helix (HTH) motifs account for a significant proportion of all DNA-binding proteins. A structural template library of seven HTH motifs has been created from non-homologous DNA-binding proteins in the Protein Data Bank. The templates were used to scan complete protein structures with an algorithm that calculated the root mean square deviation (rmsd) for the optimal superposition of each template on each structure, based on Cα backbone coordinates. Distributions of rmsd values for known HTH-containing proteins (true hits) and non-HTH proteins (false hits) were calculated. A threshold value of 1.6 Å rmsd was selected that gave a true hit rate of 88.4% and a false positive rate of 0.7%. The false positive rate was further reduced to 0.5% by introducing an accessible surface area threshold value of 990 Å² per HTH motif. The template library and the validated thresholds were used to make predictions for target proteins from a structural genomics project.
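The template scan and the two thresholds can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions. It uses the Kabsch algorithm for the optimal superposition. It slides each template along contiguous stretches of the target chain that have the same length as the template. It counts a hit when rmsd ≤ 1.6 Å and the motif's accessible surface area is at least 990 Å²; the abstract gives the threshold values but not the comparison directions. All function names are hypothetical, and the accessible surface area is taken as an input rather than computed.

```python
import numpy as np

RMSD_CUTOFF = 1.6   # Å; rmsd threshold quoted in the abstract
ASA_CUTOFF = 990.0  # Å^2 per HTH motif; ASA threshold quoted in the abstract


def superposed_rmsd(template: np.ndarray, target: np.ndarray) -> float:
    """Rmsd between two (N, 3) C-alpha coordinate sets after optimal
    rigid-body superposition, computed with the Kabsch algorithm."""
    if template.shape != target.shape:
        raise ValueError("coordinate sets must have the same shape")
    p = template - template.mean(axis=0)  # centre both sets on the origin
    q = target - target.mean(axis=0)
    h = p.T @ q                           # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Force a proper rotation (det = +1) rather than a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q                  # residuals after rotating p onto q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def best_window_rmsd(template: np.ndarray, chain: np.ndarray) -> tuple[float, int]:
    """Hypothetical scan: superpose the template on every contiguous run of
    len(template) residues in the chain; return (lowest rmsd, start index)."""
    n = len(template)
    best_rmsd, best_start = float("inf"), -1
    for start in range(len(chain) - n + 1):
        r = superposed_rmsd(template, chain[start:start + n])
        if r < best_rmsd:
            best_rmsd, best_start = r, start
    return best_rmsd, best_start


def predict_hth(rmsd: float, motif_asa: float) -> bool:
    """Assumed decision rule: close structural match AND a solvent-exposed motif."""
    return rmsd <= RMSD_CUTOFF and motif_asa >= ASA_CUTOFF


def hit_rates(true_rmsds, false_rmsds, cutoff: float = RMSD_CUTOFF):
    """True hit rate over known HTH proteins and false positive rate over
    non-HTH proteins, both at the given rmsd cutoff."""
    tpr = sum(r <= cutoff for r in true_rmsds) / len(true_rmsds)
    fpr = sum(r <= cutoff for r in false_rmsds) / len(false_rmsds)
    return tpr, fpr
```

For example, if a copy of a template is rotated, translated and embedded in a longer chain of unrelated coordinates, `best_window_rmsd` should recover it at the insertion point with an rmsd close to zero. Evaluating `hit_rates` on the true-hit and false-hit rmsd distributions over a range of cutoffs is one way to explore the trade-off between true hit rate and false positive rate that underlies the choice of a single threshold.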
  • Bioentities (5)
    1lmb Protein Data Bank
    1mkm Protein Data Bank
    1smt Protein Data Bank
    1taq Protein Data Bank
    1tau Protein Data Bank