Publication: Article, Other literature type. 30 Jul 2014. English. Publisher: Wiley. Journal: Angewandte Chemie International Edition, volume 53, pages 10,864-10,866 (issn: 1433-7851, eissn: 1521-3773)

Authors: Javier Muñoz; Albert J. R. Heck

doi: 10.1002/anie.201406545

pmid: 25079383

From the Human Genome to the Human Proteome


Abstract

An ultimate goal in biology is to fully understand how a cell, or even a whole organism, works. Ideally, such knowledge might be used to develop models that predict the responses of cells to specific cues or diseases. A first step towards this goal is to identify and characterize all the molecular players present in cells. The Human Genome Project was probably one of the most ambitious scientific endeavors so far and provided the first essential pieces in this puzzle. The availability of the 3 billion base pairs that make up our DNA generated worldwide excitement, as this information might lead to an understanding of the molecular mechanisms of human pathologies. However, the complexity embedded in our genetic code was realized soon thereafter. A surprising finding was the low percentage of DNA (less than 2% of the genome) coding for proteins, corresponding to roughly 20,000 human genes. However, recent analysis indicates that 80% of the human genome is functional and either transcribed, binding to regulatory proteins, or associated with other biochemical functions. Although genomic information is vital, it does not touch upon proteins, the main molecular effectors of cells. Every researcher will agree that the analysis of the proteome is of more relevance, but it has been less exploited because of technical hurdles and because the proteome is inherently several orders of magnitude more complex. Whereas the genome is nearly identical in every cell of the human body and relatively constant over the lifetime of an organism, the proteome differs from cell to cell and changes dramatically over time (Figure 1). Notwithstanding these challenges, the field of proteomics has witnessed tremendous developments over the last decade, primarily through advances in mass spectrometry and bioinformatics, and is now gradually coming up to par with genomics and transcriptomics technologies.
This is evidenced by two recent reports in Nature from a German team led by Bernhard Küster and a USA/India-based collaboration headed by Akhilesh Pandey, who independently initiated an unprecedented effort with the aim of identifying all the human proteins encoded in the genome. To this end, both laboratories performed extensive proteomic analyses on more than 70 human tissues and body fluids and more than 150 cell lines. Although the two teams used a very similar MS-centric workflow, some differences exist between the two studies, especially in the depth of the analyses. While Pandey et al. performed around 2,000 mass spectrometric (LC-MS) runs, Küster et al. carried out more than 6,000 analyses and made use of another 10,000 measurements publicly available in proteomic repositories. Assuming an average of two hours per run, the instrument time used to acquire these data would reach an astonishing 34,000 h (4.3 years if only one mass spectrometer had been used). The analysis of all the data resulted in the identification of 946,000 and 293,000 nonredundant unique peptide sequences in Küster's and Pandey's studies, respectively. Strikingly, and despite the significant difference in depth, the two studies found evidence for a nearly identical number of protein-coding genes: 18,097 (Küster) and 17,294 (Pandey). Although a careful comparison of the two studies is still needed, a first conclusion can be drawn: the unequivocal existence of protein translation for 90–95% of the human genes. This is a highly relevant finding, as previously almost one-third of the human genes had been barely annotated, and there was no experimental evidence that they could lead to proteins. Another relevant discovery derived from these studies concerns the extent of alternative splicing in the generation of protein isoforms. It is clear that the number of genes does not correlate with the complexity of an organism (C. elegans, for instance, has 20,500 genes), and it has been suggested that alternative splicing might increase the repertoire of functional proteins. However, these proteomic studies could identify at most 9,000 of the 67,000 isoforms annotated in UniProt. Although some of these isoforms may produce only one unique peptide, decreasing the likelihood of observation by proteomics, these data could also support the idea that there is a dominant isoform per gene. Both studies confirmed the existence of a core proteome present in all tissues, made up of “housekeeping

[*] Prof. Dr. A. J. R. Heck, Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht (The Netherlands). E-mail: a.j.r.heck@uu.nl

Related Organizations

Spanish National Cancer Research Centre
Spain
Utrecht University
Netherlands

Keywords

Proteomics; Proteome; Genome, Human; Sequence Analysis, RNA; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Tandem Mass Spectrometry; Humans; Chromatography, High Pressure Liquid

Impact by BIP!

- citations: 44. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.