First edition of HapMap released, a "catalog" of human genetic variation

"Not chaos — like together crush'd and bruis'd,
But as the world, harmoniously confus'd,
Where order in variety we see,
And where, though all things differ, all agree."

Alexander Pope, Windsor Forest

The completion of the human genome sequence in 2003, though momentous, was only the first step toward grasping the core mechanisms of human biology and disease. This ultimate biomedical goal also requires a comprehensive catalog of the genetic diversity in the human genome sequence across human populations. A flurry of high-profile scientific papers published this week heralds the success of pulling together such a catalog. The manuscripts describe both the content and uses of HapMap, a catalog that maps human genetic variation and relates it both to disease and to human evolutionary history. HapMap gives scientists worldwide a first good look at the "order in variety" that is the human genome.

All these studies are grounded in data presented in a significant paper published in the Oct. 27 issue of the journal Nature by an international consortium of more than 200 researchers from Canada, China, Japan, Nigeria, the United Kingdom and the United States. In this paper, the authors describe the patterns of genetic variation in hundreds of human DNA samples collected from four sites around the world.

Perhaps the most striking finding in this mountain of data is the overwhelming support for earlier work suggesting that human genetic variants located physically close to each other in the genome are inherited together as groups, or "haplotypes." The comprehensive catalog of human genetic variation, now known as the "HapMap," is publicly available to the biomedical research community. The implications of the genome's haplotype structure for medicine, and its potential value, have only begun to be realized.

"Built upon the foundation laid by the human genome sequence, the HapMap is a powerful new tool for exploring the root causes of common diseases. We absolutely require such a resource so that we can develop new and much-needed approaches to understand these diseases, such as diabetes, bipolar disorder, cancer and many others," said David Altshuler, director of the Program in Medical and Population Genetics of the Broad Institute of Harvard and MIT and associate professor of genetics and of medicine at Massachusetts General Hospital and Harvard Medical School. Altshuler and Peter Donnelly, of the University of Oxford in England, are the corresponding authors of the Nature paper.

Diseases run in families, and perhaps half the risk of any given common disease can be explained by genetic differences inherited from one's parents. Inheritance also plays a role in the different responses to a drug or to an environmental factor seen in some people. But the underlying causes of these common diseases and therapeutic responses have been largely unknown, for reasons that include technological limitations to evaluating the range of genetic contributions to disease across many different individuals or populations.

To address this fundamental biomedical research need, a new genomics-based approach to human genetics was proposed nearly a decade ago: to comprehensively catalog common human DNA sequence variations, and to test them systematically for their association with disease. The HapMap is the successful outcome of this proposal.

"The data from the HapMap project allows scientists to select the particular DNA variants that provide the greatest information in the most efficient manner, lowering the costs and increasing the power of genetic research," said Mark Daly, assistant professor in the Center for Human Genetic Research at Massachusetts General Hospital, and an associate member of the Broad Institute of Harvard and MIT. Daly led the Boston team's statistical and analytical work and was a member of the writing group for the Nature paper.

To understand the power and elegance of the HapMap, it is important to recognize that its roots are set down not only in the completion of the human genome sequence in 2003, but also in the massive effort to characterize and catalog the millions of individual DNA base variations (single nucleotide polymorphisms or SNPs) across the genome in the human population. Based on the initial SNP and sequence data, the haplotype structure of the human genome was recognized as early as 2001, leading directly to the formation of the International HapMap Consortium. Broad Institute scientists led or contributed significantly to all of these efforts, in addition to their role in the completion of the HapMap and demonstrations of its utility.

The HapMap project, as is true of many revolutionary scientific projects, also spurred remarkable advances in the technology for testing genetic variations in DNA, making it possible to undertake comprehensive studies in large numbers of patient samples. Stacey Gabriel, director of the Broad Institute's Genetic Analysis Platform and an author on the Nature paper, noted that "when we started doing this work several years ago, determining the genotype of a single SNP in a patient cost nearly a dollar, and we could do hundreds a day. Today the prices have dropped in many cases to a fraction of a penny per genotype, and we can do millions a day. This is the difference between not being able to do the studies, and getting them done rapidly and well."

One concern faced by the HapMap Consortium was that "sampling" variation across the genome may reduce the certainty in linking a region or gene to a specific disease. Addressing this concern, Paul de Bakker, Roman Yelensky and their colleagues demonstrate that the HapMap in fact provides excellent power to capture most human variation and link it to disease or other traits. In a paper published in the November issue of Nature Genetics, they describe a method to select "tag SNPs" that capture the genetic variation in each haplotype with a minimum amount of work. Using these tags, scientists can then compare the SNP patterns of people affected by a disease with those of unaffected people far more efficiently than has previously been possible. "Compared to directly genotyping all common SNPs in the genome in all individuals of a disease study, we observe that selected tag SNPs based on HapMap can save genotyping costs by almost an order of magnitude, without losing much power to detect a true association," says de Bakker, a postdoctoral fellow in Altshuler and Daly's group at Massachusetts General Hospital and the Broad Institute. The widely used tool for tag SNP selection was developed by de Bakker and colleagues and is available at http://www.broad.mit.edu/mpg/tagger/.
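To make the idea of tag SNPs concrete, here is a minimal sketch in Python of one simple way tags could be chosen: a greedy pairwise approach that repeatedly picks the SNP correlated with the most not-yet-covered SNPs. This illustrates the general concept only and is not necessarily the method described in the Nature Genetics paper or implemented in the tool above. The function names, the toy haplotypes and the r² cutoff of 0.8 are all assumptions made for the example.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared correlation (r^2) between two SNPs across phased haplotypes.

    a and b are equal-length lists of alleles coded 0 or 1.
    """
    n = len(a)
    pa = sum(a) / n                      # frequency of allele 1 at SNP a
    pb = sum(b) / n                      # frequency of allele 1 at SNP b
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:                       # a monomorphic SNP carries no information
        return 0.0
    d = pab - pa * pb                    # linkage disequilibrium coefficient D
    return d * d / denom


def greedy_tags(haplotypes, threshold=0.8):
    """Pick tag SNPs so every SNP has r^2 >= threshold with some tag.

    haplotypes: dict mapping SNP name -> list of 0/1 alleles.
    The threshold of 0.8 is an assumed, illustrative cutoff.
    """
    snps = list(haplotypes)
    # captures[s] is the set of SNPs that s could stand in for (including itself)
    captures = {s: {s} for s in snps}
    for s, t in combinations(snps, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            captures[s].add(t)
            captures[t].add(s)

    untagged = set(snps)
    tags = []
    while untagged:
        # choose the SNP that covers the most still-untagged SNPs
        best = max(snps, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Hypothetical toy data: 5 SNPs observed on 8 haplotypes.
toy = {
    "snp1": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1, 0, 1],   # identical to snp1
    "snp3": [1, 1, 0, 0, 1, 0, 1, 0],   # mirror image of snp1, still r^2 = 1
    "snp4": [0, 1, 0, 1, 0, 1, 0, 1],
    "snp5": [0, 1, 0, 1, 0, 1, 0, 1],   # identical to snp4
}
tags = greedy_tags(toy)
print(tags)  # ['snp1', 'snp4']
```

In this toy case two genotyped SNPs stand in for all five, which is the kind of saving the quote above describes, though real data are far noisier and the achievable savings depend on the population and the chosen threshold.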

The availability of rich "real world" data in HapMap has also led to the realization that previous computer models of human genetics are simply too limited, and can even lead to false conclusions about the role of genes or genetic loci in human evolution. In a paper published in the November issue of Genome Research, Stephen Schaffner, Altshuler and their colleagues at the Broad Institute not only demonstrate the limitations of these prior models, but also provide updated models, for use by the entire scientific community, that more closely approximate the reality of human genetic variation as seen in the HapMap catalog. "Better computer models are invaluable tools in understanding the nature of human DNA variation, past changes in human population size, and evolutionary selection," said Schaffner, a computational biologist in Broad's Program in Medical and Population Genetics.

Although much of the interest in HapMap focuses on disease genetics, its data are equally powerful in uncovering potential sites of natural selection in the human genome. Pardis Sabeti, Eric Lander and their colleagues at the Broad Institute together with Stephen O'Brien and his colleagues at the National Cancer Institute used the HapMap to re-examine earlier work on natural selection on CCR5-Δ32, a genetic variation in a T-cell receptor that confers strong resistance to infection by HIV and that has been implicated in resistance to the bubonic plague. "With the benefit of greater genotyping and empirical comparisons from the HapMap, we were able to show that the pattern of genetic variation seen at CCR5-Δ32 does not stand out as exceptional relative to other loci across the genome and is consistent with neutral evolution," says Sabeti, a student at Harvard Medical School and a postdoctoral fellow at the Broad Institute. "In fact, the CCR5-Δ32 allele is likely to have arisen more than 5000 years ago, rather than during the last 1000 years as was previously thought." They report their findings in the November issue of PLoS Biology, and show that the HapMap also gives scientists unprecedented ability to identify novel candidates for natural selection.

In October 2002, the International HapMap Consortium set the ambitious goal of creating the HapMap within three years. The Nature paper marks the successful attainment of that goal with its detailed description of the Phase I HapMap, consisting of more than 1 million SNPs. The Consortium is also nearing completion of the Phase II HapMap that will contain nearly three times more SNPs than the initial version and will enable researchers to focus their gene searches even more precisely on specific regions of the genome.

In line with the international nature of this work, and with the Broad Institute's commitment to building critical tools and resources for the entire biomedical community, HapMap data are freely available in several public databases, including the HapMap Data Coordination Center, the NIH-funded National Center for Biotechnology Information's dbSNP and the JSNP Database in Japan.

The HapMap data have been added to these databases as they have become available in the course of the project, and have already proven their utility: Over 70 posters and talks at the American Society of Human Genetics meeting this week in Salt Lake City describe findings based on analyses of HapMap data.

Papers cited:

Sabeti PC, Walsh E, Schaffner S, Varilly P, Fry B, Hutcheson H, Cullen M, Mikkelsen T, Roy J, Patterson N, Cooper R, Reich D, Altshuler D, O'Brien S, Lander E. The case for selection at CCR5-Δ32. PLoS Biology. 2005;3(11): e378. DOI:10.1371/journal.pbio.0030378

De Bakker PIW, Yelensky R, Pe'er I, Gabriel SB, Daly MJ, Altshuler D. Efficiency and power in genetic association studies. Nature Genetics. Advance online publication 23 Oct 2005. DOI:10.1038/ng1669

Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D. Calibrating a coalescent simulation of human genome sequence variation. Genome Research. 2005;15(11): 1576-1583. DOI:10.1101/gr.3709305

The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437(7063): 1299-1320. DOI:10.1038/nature04226