Chinese Researchers Sequence Indica Rice Genome
The rice strain, indica, sequenced by Jun Yu of the Beijing Genomics Institute and the University of Washington Genome Center, with colleagues at 11 Chinese institutions, is a major subspecies in China and other Asia-Pacific regions. Crossing the indica strain with another variety produces a super-hybrid with a 20- to 30-percent higher yield per hectare than other rice crops.
The draft sequence for indica rice contains 466 million base pairs, making it 3.7 times larger than the only other sequenced plant genome, that of the mustard plant Arabidopsis, but 6.7 times smaller than the human genome.
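The size ratios quoted above imply approximate sizes for the other two genomes. The following Python sketch works through that arithmetic using only the figures in this article. It assumes "3.7 times larger" means 3.7 times the size and "6.7 times smaller" means one 6.7th the size; the constant and function names are illustrative.

```python
# Figures quoted in the article. The implied sizes are back-calculated
# from the quoted ratios; they are not independent measurements.
INDICA_BP = 466_000_000        # indica draft size, in base pairs
INDICA_TO_ARABIDOPSIS = 3.7    # assumed: indica is ~3.7x the size of Arabidopsis
HUMAN_TO_INDICA = 6.7          # assumed: human is ~6.7x the size of indica


def implied_sizes(indica_bp=INDICA_BP):
    """Return (arabidopsis_bp, human_bp) implied by the quoted ratios."""
    arabidopsis_bp = indica_bp / INDICA_TO_ARABIDOPSIS
    human_bp = indica_bp * HUMAN_TO_INDICA
    return arabidopsis_bp, human_bp


if __name__ == "__main__":
    arabidopsis_bp, human_bp = implied_sizes()
    print(f"Arabidopsis: ~{arabidopsis_bp / 1e6:.0f} million base pairs")
    print(f"Human: ~{human_bp / 1e9:.1f} billion base pairs")
```

Under those assumptions the ratios work out to roughly 126 million base pairs for Arabidopsis and about 3.1 billion for the human genome.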
How does the rice genome compare with the human genome? The indica genome contains 45,000-56,000 genes, with an average gene length of 4,500 base pairs. The number of human genes is still being debated, but may be around 30,000 to 40,000, with an average gene length of 72,000 base pairs. Arabidopsis includes an estimated 25,498 genes, with an average gene length around 2,000 base pairs. Differences in gene length may signal different mechanisms for generating protein diversity: The indica genome (like the Arabidopsis genome) shows signs of extensive gene duplication, with more than 70% of the genes duplicated.
Duplication of smaller genes may produce the protein diversity needed for adaptive evolution in plants, Yu's team suggests. Vertebrate animals, like humans, may generate diverse proteins through processes such as gene splicing that break up and reassemble relatively larger genes into new combinations. Some 1.7 percent of the indica genome consists of simple sequence repeats, and complex sequence repeats make up another one percent. Simple repeats involve just a few base pairs, and can be useful "markers," or points of reference along the genome.
Complex repeats, or "transposable elements," are DNA sequences that hop around the genome. While most transposons in the human genome are found within the introns, or non-coding portions of genes, most transposons in the two plant genomes are located between genes, the researchers noted.
To sequence the indica genome, Yu and colleagues used the same "whole genome shotgun" method previously used to sequence the fruit fly genome and by private researchers sequencing the human genome.
Yu's team generated many DNA snippets of known length from all over the rice genome. The number of snippets, lined up according to the regions where their DNA sequences overlapped, was enough to cover the genome roughly four times. The researchers then determined the base pair sequence for each snippet, and used a computer program to assemble them into longer segments. These segments (called "contigs," since they refer to genomic regions where contiguous DNA sequences overlap) were then ordered and assembled into 103,044 larger components called "scaffolds."
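The idea of joining overlapping snippets into contigs can be illustrated with a toy example. The Python sketch below is not the assembly software Yu's team used; it is a minimal greedy merger that assumes error-free snippets, exact overlaps, and no repeated regions, and it stops at contigs without building scaffolds. The function names and the 20-base sequence are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no such overlap of at least min_len bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the longest overlap.

    Returns the list of contigs left when no overlaps remain.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


def coverage(reads, genome_length):
    """Average number of times each base is covered by the reads."""
    return sum(len(r) for r in reads) / genome_length


if __name__ == "__main__":
    genome = "ATGCGTACGTTAGCCTAGGA"  # hypothetical 20-base "genome"
    # Four 8-base snippets, each overlapping the next by 4 bases.
    reads = [genome[i:i + 8] for i in range(0, 13, 4)]
    print(greedy_assemble(reads))
    print(coverage(reads, len(genome)))
```

Real genomes contain repeats and sequencing errors, which is why production assemblers are far more elaborate than this greedy loop.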
The researchers searched for genes within the indica genome by directly comparing the rice sequences to known gene sequences deposited in public databases, and by running gene-prediction software. They also used software programs to classify the rice genes by general functional categories, such as metabolism, cellular communication, and cell growth regulation.
To confirm accuracy, Yu's group gathered all publicly available rice gene sequences and rice gene markers, and searched for those sequences within the indica draft. Their findings suggest that the indica genome draft covers 92 percent of the whole rice genome.
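The logic of this check, counting how many known sequences turn up in the draft, can be sketched in a few lines of Python. This is an illustration only: it assumes exact substring matching, whereas a real analysis would tolerate mismatches through sequence alignment, and the function name and example sequences are hypothetical.

```python
def fraction_found(known_sequences, draft_contigs):
    """Fraction of known sequences that occur verbatim in some draft contig."""
    if not known_sequences:
        raise ValueError("need at least one known sequence")
    found = sum(
        any(seq in contig for contig in draft_contigs)
        for seq in known_sequences
    )
    return found / len(known_sequences)


if __name__ == "__main__":
    # Made-up draft contigs and "known" sequences for illustration.
    draft = ["ATGCGTACGTTA", "GGCCTTAA"]
    known = ["GCGTAC", "CCTT", "TTTTTT", "ACGTT"]
    print(f"{fraction_found(known, draft):.0%} of known sequences found")
```

The fraction of known sequences recovered serves as an estimate of how much of the whole genome the draft covers.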
In a second stage of research, the team will produce a more detailed sequence, to be integrated with physical and genetic maps of the rice genome. The more detailed sequence should reveal any gaps in the current draft that may contain genes, and place all the genes into functional categories.
Comparing Rice and Arabidopsis
Yu's comparison of the indica and Arabidopsis genomes revealed some features, such as gene duplication, that the two plant genomes share but that set them apart from the human genome. But the analysis also revealed interesting differences between these two plants, which represent the two major types of flowering plants, monocots and dicots. In the most striking comparison, 80.6 percent of Arabidopsis genes are found in rice, but only 49.4 percent of the indica genes are found in Arabidopsis.
This asymmetry could suggest that the rice genome is a "superset" of the Arabidopsis genome, the result of a massive gene duplication event, and may shed light on how monocots and dicots evolved and diverged some 200 million years ago.
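A small Python sketch shows how such an asymmetry arises when one gene set is close to a superset of the other. The gene labels below are hypothetical and were chosen only to mimic the reported percentages. The actual comparison relied on sequence similarity, not on matching labels.

```python
def shared_fraction(query_genes, target_genes):
    """Fraction of the query's genes that also appear in the target set."""
    query = set(query_genes)
    if not query:
        raise ValueError("query gene set is empty")
    return len(query & set(target_genes)) / len(query)


if __name__ == "__main__":
    # Hypothetical gene-family labels, not real data.
    smaller_genome = {"A", "B", "C", "D", "E"}
    larger_genome = {"A", "B", "C", "D", "F", "G", "H", "I"}
    print(shared_fraction(smaller_genome, larger_genome))  # smaller vs. larger
    print(shared_fraction(larger_genome, smaller_genome))  # larger vs. smaller
```

In this toy case four of the smaller set's five genes appear in the larger set, but only four of the larger set's eight appear in the smaller one, the same pattern as the 80.6 and 49.4 percent figures.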
-- Kathleen Wren