The genome of human cytomegalovirus (HCMV) was first sequenced more than 20 years ago, but researchers have now discovered that this common pathogen's protein-coding capacity is far more complex than previously thought.
Although the virus infects most humans on the planet, it causes illness mainly in two groups: infants infected before birth, in whom it can cause birth defects, and adults with compromised immune systems. The new findings, published in the 23 November issue of Science, could help explain why HCMV is such a widespread and successful virus.
Noam Stern-Ginossar of the University of California, San Francisco (UCSF), along with colleagues from the United States and Germany, used a combination of techniques to study HCMV's proteome, the complete set of proteins the virus expresses. This deeper look at HCMV, they said, is a first step toward understanding how this virus and others hijack human cells during infection.
HCMV has one of the largest viral genomes on record, about 240,000 base pairs of double-stranded DNA. For comparison, the poliovirus genome, a single strand of RNA, contains only about 7,500 nucleotides.
“The genome of a virus is just a starting point,” explained UCSF researcher Jonathan Weissman, a co-author of the Science report. “Understanding what proteins are encoded by that genome allows us to start thinking about what the virus does and how we can interfere with it…Each of the proteins we’ve identified has the potential to tell us how this virus is manipulating its host cell.”
Stern-Ginossar and her colleagues suspected that existing maps of HCMV's protein-coding potential, which were based largely on computational predictions, were far from complete. To find out, they mapped the positions of ribosomes, the molecular machines that synthesize proteins, on messenger RNAs during HCMV infection of human fibroblast cells.
The resulting map revealed hundreds of previously unidentified open reading frames: stretches of the viral genome that serve as templates for proteins.
The researchers also used mass spectrometry, a technique that identifies molecules by precisely measuring their masses, to confirm that many of the viral proteins predicted from the ribosome map are actually produced.
Surprisingly, many of these open reading frames encode exceptionally short proteins, fewer than 100 amino acids long. Some of the newly identified open reading frames were even hidden inside other open reading frames, the researchers said.
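To make the idea of short and nested open reading frames concrete, here is a minimal Python sketch that scans a DNA string for ATG...stop open reading frames in the three forward reading frames. It assumes the standard genetic code, ignores the reverse strand, and runs on a made-up 21-nucleotide toy sequence rather than real HCMV DNA; the function names `find_orfs` and `nested_pairs` are hypothetical. Note that the study itself located open reading frames experimentally, from ribosome positions, not by sequence scanning of this kind.

```python
# Minimal sketch: find simple open reading frames (ORFs) in a DNA string.
# Assumptions (not from the article): standard genetic code, forward strand
# only, an ORF runs from ATG to the first in-frame stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_aa: int = 1) -> list[tuple[int, int, int, int]]:
    """Return (frame, start, end, n_aa) for each ATG...stop ORF.

    `start` is the 0-based index of the A in ATG, `end` is the index just
    past the stop codon, and `n_aa` is the protein length in amino acids
    (the stop codon is not counted).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon appears.
                j = i + 3
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(dna):
                    break  # no in-frame stop before the sequence ends
                n_aa = (j - i) // 3
                if n_aa >= min_aa:
                    orfs.append((frame, i, j + 3, n_aa))
                i = j + 3  # resume scanning after the stop codon
            else:
                i += 3
    return sorted(orfs, key=lambda o: (o[1], o[0]))


def nested_pairs(orfs):
    """Return (outer, inner) pairs where inner lies inside outer in another frame."""
    return [(a, b) for a in orfs for b in orfs
            if a[0] != b[0] and a[1] <= b[1] and b[2] <= a[2]]


if __name__ == "__main__":
    toy = "ATGCATGCCAAACTAGGGTAA"  # hypothetical toy sequence, not HCMV DNA
    orfs = find_orfs(toy)
    print(orfs)                           # [(0, 0, 21, 6), (1, 4, 16, 3)]
    print(nested_pairs(orfs))             # [((0, 0, 21, 6), (1, 4, 16, 3))]
    print([o for o in orfs if o[3] < 100])  # both count as "short" (< 100 aa)
```

In the toy sequence, a 3-amino-acid ORF in frame 1 lies entirely inside a 6-amino-acid ORF in frame 0, so the same stretch of DNA is read in two different ways. Annotation pipelines often apply a minimum-length cutoff, which would discard short ORFs like these.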
“A key finding of our work is that each of these templates can encode more than one protein,” said Annette Michalski from the Max Planck Institute of Biochemistry in Martinsried, Germany. “And these extremely short proteins might be more common than we expect.”
Links
Read the abstract, “Decoding Human Cytomegalovirus,” by Noam Stern-Ginossar and colleagues.
Read a news release about the research in Arabic.