Powered by OpenAIRE graph
Found an issue? Give us feedback
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

Using Proteomics to Mine Genome Sequences

Authors: Jonathan W, Arthur; Marc R, Wilkins;

Using Proteomics to Mine Genome Sequences

Abstract

We present a method for mining unannotated or annotated genome sequences with proteomic data to identify open reading frames. The region of a genome coding for a protein sequence is identified by using information from the analysis of proteins and peptides with MALDI-TOF mass spectrometry. The raw genome sequence or any unassembled contigs of an organism are theoretically cleaved into a number of equal sized but overlapping fragments, and these are then translated in all six frames into a series of virtual proteins. Each virtual protein is then subjected to a theoretical enzymatic digestion. Standard proteomic sample preparation methods are used to separate, array, and digest the proteins of interest to peptides. The masses of the resulting peptides are measured using mass spectrometry and compared to the theoretical peptide masses of the virtual proteins. The region of the genome responsible for coding for a particular protein can then be identified when there are a large number of hits between peptides from the protein and peptides from the virtual protein. The method makes no assumptions about the location of a protein in a particular gene sequence or the positions or types of start and stop codons. To illustrate this approach, all 773 proteins of Pseudomonas aeruginosa contained in SWISS-PROT were used to theoretically test the method and optimize parameters. Increasing the size of the virtual proteins results in an overall improvement in the ability to detect the coding region, at the cost of decreasing the sensitivity of the method for smaller proteins. Increasing the minimum number of matching peptides, lowering the mass error tolerance, or increasing the signal-to-noise ratio of the simulated mass spectrum, improves the ability to detect coding regions. The method is further demonstrated on experimental data from Mycobacterium tuberculosis and is also shown to work with eukaryotic organisms (e.g., Homo sapiens).

Keywords

Genome, Base Sequence, Proteome, Molecular Sequence Data, Mycobacterium tuberculosis, Open Reading Frames, Bacterial Proteins, Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization, Pseudomonas aeruginosa, Animals, Humans, Computer Simulation, Amino Acid Sequence, Databases, Nucleic Acid

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    27
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Top 10%
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Top 10%
Powered by OpenAIRE graph
Found an issue? Give us feedback
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
27
Average
Top 10%
Top 10%
Upload OA version
Are you the author of this publication? Upload your Open Access version to Zenodo!
It’s fast and easy, just two clicks!