Rice Genome Sequencing Project

Rice (Oryza sativa) is the second most cultivated cereal crop worldwide after maize. It is the staple food for over half of the world’s population, mainly in Asia, South America and Africa. Rice is an annual crop that grows in warm, humid climates where plenty of rain or water is available. Although there are several species of rice, Asian rice (Oryza sativa) is the most commonly cultivated. Asian rice has two subspecies, indica and japonica. Worldwide, there are over 40,000 varieties of rice.

Rice has the smallest genome of all the cereal crops, at around 430 Mb. The next smallest cereal genome is sorghum's, at around 750 Mb. The wheat genome is over 37 times larger than the rice genome, and the human genome is about 8 times larger. Because of its importance to the world food supply and its small genome, scientists selected rice for sequencing as a model for cereal crops. Sequencing of the rice genome was started in 1998 by the International Rice Genome Sequencing Project (IRGSP), with the involvement of 10 rice-producing countries that included Brazil, Canada, China, France, India, Japan, Korea, Thailand, Vietnam and the USA. The rice genome project was the last major genome sequencing project to use the Sanger sequencing method. Before the IRGSP completed its work, the private companies Monsanto and Syngenta published rice genome sequences in 2000 and 2002, respectively. It was the first genome sequencing ever done by private companies. The IRGSP released a map-based, high-quality rice genome sequence in 2005. The rice genome was the second plant genome sequenced, after Arabidopsis.

Analysis of the sequence data revealed that the rice genome is 389 Mb, smaller than the earlier estimate of 430 Mb. About 38,000 protein-coding genes were identified, of which about 3,000 are unique to rice and other cereal crops. Over 80,000 polymorphic sites that distinguish between the two subspecies of rice, indica and japonica, were also identified. Another important outcome of the project was the first sequenced centromere for any complex eukaryotic species. Centromeres consist of highly repetitive sequences that are difficult to sequence, which is why genome sequences usually have gaps there. In rice, the centromere sequences consist of clustered repeats 59 kb and 69 kb long.
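The headline figures above can be put in proportion with a little arithmetic. The short Python sketch below is only an illustration: the function names are hypothetical, the inputs are the rounded figures quoted in the text, and it assumes 1 Mb = 1,000 kb. The "kb per gene" value is a rough genome-wide average spacing, not a measured gene length.

```python
# Rounded figures quoted in the text (approximate values, not exact counts).
EARLY_ESTIMATE_MB = 430        # genome size estimated before sequencing
SEQUENCED_SIZE_MB = 389        # genome size from the 2005 map-based sequence
PROTEIN_CODING_GENES = 38_000  # approximate number of protein-coding genes
POLYMORPHIC_SITES = 80_000     # sites distinguishing indica from japonica


def percent_smaller(old_mb, new_mb):
    """Return how much smaller new_mb is than old_mb, as a percentage."""
    return (old_mb - new_mb) / old_mb * 100


def kb_per_gene(size_mb, genes):
    """Average genome span per gene in kb (assumes 1 Mb = 1,000 kb)."""
    return size_mb * 1000 / genes


def sites_per_mb(sites, size_mb):
    """Average number of polymorphic sites per Mb of genome."""
    return sites / size_mb


if __name__ == "__main__":
    shrink = percent_smaller(EARLY_ESTIMATE_MB, SEQUENCED_SIZE_MB)
    spacing = kb_per_gene(SEQUENCED_SIZE_MB, PROTEIN_CODING_GENES)
    density = sites_per_mb(POLYMORPHIC_SITES, SEQUENCED_SIZE_MB)
    print(f"Sequenced genome is {shrink:.1f}% smaller than the early estimate")
    print(f"Roughly one protein-coding gene per {spacing:.1f} kb")
    print(f"Roughly {density:.0f} polymorphic sites per Mb")
```

With these rounded inputs, the sequenced genome comes out a little under 10% smaller than the early estimate, with on average about one protein-coding gene for every 10 kb of sequence.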

The rice genome sequence data are available online and are being used by scientists around the world. One of the major applications of the data is molecular breeding, which aims to improve the quantity and quality of rice production by identifying the genes that control agronomic traits. Scientists expect genetic marker-assisted breeding of agricultural crops to be important in meeting the growing demand for food caused by worldwide population growth. Beyond crop improvement, the genome sequence data can also provide insights into the evolutionary and domestication history of rice.