Charted islands

A small percentage of genetic sequences couldn't be located within the human genome. Researchers got to the bottom of the mystery using math and some creative problem-solving.

Message in a bottle

Consider the classic "message in a bottle" scenario: a man is stranded on an uncharted landmass far from civilization. Thanks to his watertight missive, he is able to let others know that he's out there — that the landmass on which he is marooned exists somewhere in the world. However, without any reference points, his exact location remains a mystery.

Geneticists have been mulling over a similar problem since the completion of the Human Genome Project (HGP): they were aware of a small percentage of genetic sequences that simply couldn't be located within the human genome. The sequences were identified by the HGP and other gene sequencing efforts, so researchers knew of their existence, but they couldn’t tell where, in the genome, they were hiding.

It was often thought that localizing these unmapped genetic sequences would require a new technology that could more effectively scour the genome and read chromosomes from end to end. Instead, a team led by Broad associate member Steve McCarroll and computational biologist Giulio Genovese got to the bottom of the mystery using math and some clever problem-solving. By using statistics to track the genetic sequences back to their ancestral sources, they were able to locate many of these “missing” pieces: most were found hiding within a tightly packed and relatively quiet swath of DNA called heterochromatin. The researchers describe the newly found sequences as islands of biologically active DNA lying within “heterochromatic oceans.”

Genovese devised a way to trace these sequences using “admixture mapping,” a method previously used to locate genes associated with disease. When populations are separated for long periods of time (in the case of humans, tens of thousands of years), the chromosomes of each population acquire enough differences that they can be distinguished from each other using statistical methods. When those populations start to mix, their “admixed” offspring carry mosaic chromosomes — a patchwork of segments from those two distinct sets of ancestors.
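To make the idea of a mosaic chromosome concrete, here is a toy sketch in Python. It is only an illustration, not the team's method: the function name `mosaic_chromosome`, the labels "A" and "B" for the two ancestral populations, and the fixed number of breakpoints are all assumptions made for this example. Real ancestry segments depend on recombination over many generations.

```python
import random

def mosaic_chromosome(n_loci, n_breakpoints, rng):
    """Return one ancestry label ('A' or 'B') per locus along a toy chromosome.

    Hypothetical helper: the ancestry flips at randomly chosen breakpoints,
    producing a patchwork of segments from two ancestral populations.
    """
    # Choose distinct positions where the ancestry switches.
    breakpoints = set(rng.sample(range(1, n_loci), n_breakpoints))
    label = rng.choice("AB")
    labels = []
    for locus in range(n_loci):
        if locus in breakpoints:
            label = "B" if label == "A" else "A"
        labels.append(label)
    return labels

rng = random.Random(0)  # fixed seed so the toy example is repeatable
chrom = mosaic_chromosome(40, 3, rng)
print("".join(chrom))  # a string of A's and B's in contiguous blocks
```

Because neighboring loci usually fall in the same block, knowing the ancestry at one position says a lot about the ancestry of whatever sits next to it.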

Genovese reasoned that, since pieces of the genome that are close together tend to be inherited together, it’s possible to map the location of a gene or sequence by tracing it back through generations to its source in one of the ancestral populations. So, much as one might try to locate the source of that “message in a bottle” by determining how far and in what direction the current may have carried the bottle, one could use admixture data to determine where genetic sequences may have originated.

"Around 2004 or 2005," Genovese explained, "people started to use this methodology to locate disease genes. I thought we could instead use this machinery to map the difficult one percent of sequences that had not been located."

The team generated and analyzed genetic data from 380 African Americans who participated in the Jackson Heart Study. Since most African Americans have both European and African ancestors, they could use the mosaic patterns in their genomes to map the genomic locations of the missing sequences.

“By determining whether the so-called ‘missing pieces’ of sequence came from their European ancestors or their African ancestors, we could sort of dock those pieces into place by matching their ancestry patterns to ancestry patterns in the known, mapped regions of their genomes — thus determining where in the genome these ‘missing pieces’ were located,” McCarroll explained.
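The “docking” step McCarroll describes can be sketched as a simple matching problem. The Python below is a simplified illustration under stated assumptions, not the statistical procedure used in the paper: the data are simulated, the helper names (`simulate_haplotype`, `pearson`, `best_matching_locus`) are hypothetical, and a plain correlation stands in for the real statistics. Each simulated person has an ancestry count (0, 1, or 2 African-derived copies) at every mapped locus, and the unmapped sequence is assumed to give a noisy per-person reading of ancestry at its true, hidden location.

```python
import random

def simulate_haplotype(n_loci, n_breakpoints, rng):
    """Toy haplotype: 1 = African ancestry, 0 = European, flipping at breakpoints."""
    breakpoints = set(rng.sample(range(1, n_loci), n_breakpoints))
    state = rng.randint(0, 1)
    haplotype = []
    for locus in range(n_loci):
        if locus in breakpoints:
            state = 1 - state
        haplotype.append(state)
    return haplotype

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def best_matching_locus(local_ancestry, unmapped_signal):
    """Return (index, scores) for the mapped locus whose ancestry pattern across
    people best matches the unmapped sequence's pattern."""
    n_loci = len(local_ancestry[0])
    scores = []
    for j in range(n_loci):
        column = [row[j] for row in local_ancestry]
        scores.append(pearson(column, unmapped_signal))
    best = max(range(n_loci), key=lambda j: scores[j])
    return best, scores

rng = random.Random(1)
# 380 matches the study's sample size; the other values are arbitrary.
N_PEOPLE, N_LOCI, TRUE_LOCUS = 380, 50, 17

# Each person carries two mosaic haplotypes; sum them into a 0/1/2 ancestry count.
local_ancestry = []
for _ in range(N_PEOPLE):
    h1 = simulate_haplotype(N_LOCI, 3, rng)
    h2 = simulate_haplotype(N_LOCI, 3, rng)
    local_ancestry.append([a + b for a, b in zip(h1, h2)])

# Assumed observation model: ancestry at the hidden locus plus measurement noise.
unmapped_signal = [row[TRUE_LOCUS] + rng.gauss(0, 0.3) for row in local_ancestry]

best, scores = best_matching_locus(local_ancestry, unmapped_signal)
print("best-matching mapped locus:", best)
```

In this toy setup the best match should land at or very near the hidden locus, because ancestry at nearby positions is inherited together while ancestry at distant positions is only weakly related.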

The team was somewhat surprised to find so many of the sequences hidden inside the heterochromatin, an area that, due to its density and repetitive pattern of genetic code, had been difficult to reach in the Human Genome Project.

“I think people had tended to think that that part of the genome might be devoid of the kinds of complex sequences, genes, and functional biology that we associate with the rest of the genome,” McCarroll said. “What our findings showed is that these regions have expressed genes hidden inside them.”

McCarroll, who is also the director of genetics at the Broad's Stanley Center for Psychiatric Research and a professor in genetics at Harvard Medical School, described the team’s work as a “completely new and powerful approach” that will help expand existing maps of the human genome.

“Geneticists use these maps every day,” McCarroll said. “It's exciting that Giulio's approach can help to complete those resources.”

The team reported their findings earlier this week in Nature Genetics.

Other researchers involved in this work include Robert Handsaker, Heng Li, Nicolas Altemose, Amelia Lindgren, Kimberly Chambert, Bogdan Pasaniuc, Alkes Price, David Reich, Cynthia Morton, James G. Wilson, and Martin Pollak. Collaborators included researchers from Harvard Medical School, Beth Israel Deaconess Medical Center, Brigham and Women’s Hospital, Harvard School of Public Health, and the University of Mississippi Medical Center.

Paper cited:

Genovese, et al. Using population admixture to help complete maps of the human genome. Nature Genetics (2013) doi: 10.1038/ng.2565