HapMix: A tool for finding genetic diversity

When researchers from the Broad Institute and the Department of Human Genetics at Harvard University set about the task of pinpointing ancestral diversity in African Americans, the first tool they used for the job was the HapMix software engine. HapMix is a software tool that helps researchers infer the ancestry of extremely small bits of DNA. “It is a method for reconstructing the mosaic of African and European ancestry that is present in each African-American,” explains David Reich, associate member of the Broad Institute, assistant professor at Harvard Medical School's Department of Genetics, and one of the co-writers of the HapMix program. “It is used for determining the ancestral parentage of an African-American person’s genome.” Within an African-American genome, any particular portion may be derived from two European segments, two African segments, or one segment of each.

HapMix works by identifying genetic variations that occur at very different frequencies in Africans and Europeans. There is actually very little overall genetic differentiation between human populations. At any one site, in fact, there may be zero difference. But when 1,000 neighboring sites are examined together, evidence about ancestry piles up.
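To see how evidence can pile up across neighboring sites, here is a minimal Python sketch. It is not HapMix's actual code: the function names, the allele frequencies (0.55 versus 0.45), and the assumption that sites are independent are all hypothetical choices made for illustration.

```python
import math

def combined_log_odds(alleles, freq_african, freq_european):
    """Sum per-site log-likelihood ratios (African vs. European) for one DNA segment.

    alleles[i] is 1 if the variant allele is seen at site i, else 0.
    Sites are treated as independent, which is a simplifying assumption.
    """
    total = 0.0
    for allele, p_afr, p_eur in zip(alleles, freq_african, freq_european):
        like_afr = p_afr if allele == 1 else 1.0 - p_afr
        like_eur = p_eur if allele == 1 else 1.0 - p_eur
        total += math.log(like_afr / like_eur)
    return total

def posterior_african(log_odds, prior=0.5):
    """Turn combined log odds into a posterior probability of African ancestry."""
    prior_log_odds = math.log(prior / (1.0 - prior))
    return 1.0 / (1.0 + math.exp(-(log_odds + prior_log_odds)))

# Hypothetical data: every site differs only slightly between the two populations.
n_sites = 1000
freq_african = [0.55] * n_sites
freq_european = [0.45] * n_sites
# A made-up segment in which 55% of sites carry the variant allele.
alleles = [1 if i % 20 < 11 else 0 for i in range(n_sites)]

one_site = posterior_african(
    combined_log_odds(alleles[:1], freq_african[:1], freq_european[:1]))
all_sites = posterior_african(
    combined_log_odds(alleles, freq_african, freq_european))
print(f"one site: {one_site:.3f}   all {n_sites} sites: {all_sites:.6f}")
```

With these made-up numbers, a single site moves the probability of African ancestry only from 0.50 to about 0.55, while the 1,000 sites taken together push it very close to 1.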

HapMix statistically combines the information from many neighboring sites. Each site by itself provides very weak evidence about ancestry, but combined they provide a very strong inference about whether a segment is of African or European ancestry. HapMix was created by David; Simon Myers, a former senior post-doctoral fellow at the Broad Institute now at the University of Oxford; and Broad Institute colleagues Alkes Price, Nick Patterson, and Arti Tandon. The development team published the open source tool in PLoS Genetics in 2009, and it has since become widely used around the world for studies of genetic variation, including a study published this week in Nature on the ancestral landscape of African-Americans. (Read the Broad Institute news story.)

Thanks to tools like HapMix, scientists can home in on extremely small differences in DNA between people, known as single nucleotide polymorphisms (SNPs), and identify their ancestral origin. “If you look at just one area, you can’t come up with a clear result,” David explains. “But ultimately combining the probabilities from millions of different DNA differences, one can obtain very strong evidence of ancestry.” The approach they used, known as a Hidden Markov Model (HMM), is a standard way of combining many pieces of weak evidence at neighboring sites into much stronger evidence.
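For readers curious about what an HMM looks like in practice, the following is a toy Python sketch of the general idea. It is not the model HapMix implements. It assumes a single chromosome copy with two hidden states ("African" and "European"), hypothetical allele frequencies of 0.7 and 0.3, and an assumed 1% chance that ancestry switches between adjacent sites. It uses the Viterbi algorithm, one standard HMM technique, to find the most likely ancestry at each site.

```python
import math

STATES = ("African", "European")

def viterbi_ancestry(alleles, freqs, switch_prob=0.01):
    """Most likely ancestry state at each site under a toy two-state HMM.

    alleles[i]: 1 if the variant allele is seen at site i, else 0.
    freqs[state][i]: hypothetical variant-allele frequency at site i.
    switch_prob: assumed chance that ancestry changes between adjacent sites.
    """
    def log_emit(state, i):
        p = freqs[state][i]
        return math.log(p if alleles[i] == 1 else 1.0 - p)

    def log_trans(prev, cur):
        return math.log(1.0 - switch_prob if prev == cur else switch_prob)

    # score[s]: best log probability of any path ending in state s at this site
    score = {s: math.log(0.5) + log_emit(s, 0) for s in STATES}
    backpointers = []
    for i in range(1, len(alleles)):
        new_score, pointer = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda r: score[r] + log_trans(r, s))
            new_score[s] = score[best_prev] + log_trans(best_prev, s) + log_emit(s, i)
            pointer[s] = best_prev
        score = new_score
        backpointers.append(pointer)

    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path

# Made-up chromosome: the first 200 sites look African, the last 200 look European.
n = 400
freqs = {"African": [0.7] * n, "European": [0.3] * n}
alleles = ([1 if i % 10 < 7 else 0 for i in range(n // 2)]
           + [0 if i % 10 < 7 else 1 for i in range(n // 2)])

path = viterbi_ancestry(alleles, freqs)
switches = [i for i in range(1, n) if path[i] != path[i - 1]]
print("ancestry switches at site(s):", switches)
```

In this sketch the low switch probability is what ties neighboring sites together: a few sites that disagree with their surroundings are not enough to justify a change of ancestry, so the inferred path changes only where the evidence shifts consistently.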