1000 Genomes

The genomes of any two people are more than 99% identical, but seemingly minor variations between genomes contribute to each individual’s risk of developing disease. By tracing the inheritance of these genetic variants, scientists can identify previously unsuspected genes that play key roles in the underlying disease process.

A recently completed catalog of common genetic variation, known as the HapMap, has revolutionized human genetic studies, making it possible for scientists to systematically test common DNA variants for association with disease risk. This resource is genome-wide (that is, it allows testing of all genes) and has enabled the discovery over the last few years of more than 200 new genes that contribute to conditions such as diabetes, elevated cholesterol levels, heart attack, rheumatoid arthritis, lupus, Crohn’s disease, bipolar disorder, and many others.

While successful, HapMap had a major limitation. It contained only the most common genetic variants, those with frequencies above 5% or so, and encompassed only single nucleotide polymorphisms (SNPs), the simplest form of change to DNA. Motivated by the remarkable success of studies based on this incomplete resource, researchers now recognize the need for an even higher-resolution genetic map: one that includes rare variants, with frequencies below 5%, as well as other forms of human genetic variation.
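To make the 5% cutoff concrete, here is a minimal sketch of how a variant at a single site might be labeled common or rare from its minor allele frequency. The function names, the two-allele restriction, and the choice to treat exactly 5% as common are assumptions made for illustration; they are not taken from HapMap or 1000 Genomes software.

```python
from collections import Counter

# Approximate cutoff from the text ("5% or so"); treated here as a fixed value.
COMMON_THRESHOLD = 0.05


def minor_allele_frequency(alleles):
    """Return the frequency of the less common allele at one site.

    `alleles` is a list with one entry per sampled chromosome, e.g. ["A", "G"].
    Assumption: the site has at most two distinct alleles.
    """
    if not alleles:
        raise ValueError("no alleles observed")
    counts = Counter(alleles)
    if len(counts) > 2:
        raise ValueError("this sketch handles at most two alleles per site")
    if len(counts) == 1:
        return 0.0  # every sampled chromosome carries the same allele
    return min(counts.values()) / len(alleles)


def classify_variant(alleles, threshold=COMMON_THRESHOLD):
    """Label a site 'common' or 'rare' (assumption: exactly 5% counts as common)."""
    return "common" if minor_allele_frequency(alleles) >= threshold else "rare"


# Hypothetical data: 100 sampled chromosomes at each of two sites.
site_a = ["A"] * 80 + ["G"] * 20   # minor allele on 20 of 100 chromosomes
site_b = ["C"] * 97 + ["T"] * 3    # minor allele on 3 of 100 chromosomes
print(classify_variant(site_a), classify_variant(site_b))
```

Under these assumptions, a variant like the one at the second site falls below the cutoff and would be the kind of rare variant that a resource limited to common variation does not capture.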

Based on the successful model of international collaboration used in the Human Genome Project and HapMap Project, a new project has been created: the 1000 Genomes Project. This international effort was launched in 2008 with support from the National Institutes of Health, the Wellcome Trust Sanger Institute, the Beijing Genomics Institute, and in-kind contributions from a number of private companies. The project’s goal is to sequence the genomes of at least 1,000 people, discovering both SNPs and structural variants, and to place them in a public database. Creating this unprecedented database has only now become possible because of advances in next-generation technologies for sequencing DNA. The map will capture variations in the human genome’s sequence and structure at the highest resolution yet.

The Broad Institute is one of several sequencing centers involved in this effort, which is led by scientists at various academic and research institutions. They include David Altshuler, director of the Program in Medical and Population Genetics, who co-chairs the overall project, and Stacey Gabriel, director of the Genetic Analysis Platform, who leads the project’s data production group. Importantly, the project draws upon the work of multiple groups within the Broad Institute, especially researchers in the Genome Sequencing and Genetic Analysis Platforms, the Genome Sequencing and Analysis Program, and the Program in Medical and Population Genetics.

In the project’s first phase, the consortium will conduct three pilot projects: sequencing two nuclear families at deep coverage, sequencing the genomes of 180 people at low coverage, and sequencing the protein-coding regions of 1,000 genes in 1,000 people. In the project’s second phase, the genomes of at least 1,000 people will be sequenced at a rate of more than two genomes every 24 hours. (To help put this in context, the Human Genome Project took many years to sequence a single human genome.) To expedite disease research by scientists worldwide, all data will be made publicly available without restriction on use.