SEARCHING FOR THE MISSING HERITABILITY
Molquant tools enable the exploration of additional Parkinson’s Disease candidate genes by intersecting Disease Biology Networks with the Parkinson’s Disease GWAS database PDGene (http://www.pdgene.org). Molquant, 2014
Genomics research continues to expand rapidly, with new studies linking genes to phenotypes appearing all the time. However, as these studies accumulated, researchers began to recognize a problem: the “Missing Heritability.”
We have long been able to estimate how strongly a trait is shaped by genes (its heritability) by measuring how closely the trait tracks relatedness. If a trait is genetically influenced (i.e. if it runs in families), then the more closely related two individuals are, the more likely they are to share it (identical twins > siblings and parents > grandparents > first cousins, and so on). Height, for example, has an estimated heritability of about 80%: roughly 80% of the variation in height across a population is attributable to genetic differences, and the remaining 20% to diet and other environmental factors. One of the largest GWAS efforts yet conducted looked at height and found 180 gene variants associated with it. However, when the contributions of all those variants are added up, they explain only about 10% of the variation in height. Where is the other 70%? (THAT’S the ‘Missing Heritability’!) We know height is inherited; the genes must be in there somewhere, but we haven’t been able to find them. This gap also helps explain why services such as those from Illumina and 23andMe can currently say only a limited amount about what an individual’s genome means.
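The twin comparison and the height arithmetic above can be made concrete in a few lines. The sketch below uses Falconer’s classic twin formula, h² = 2(r_MZ − r_DZ), a standard textbook estimate swapped in here for illustration (it is not how the height study or Molquant computed anything). The twin correlations are hypothetical; the 80% and 10% figures come from the height example.

```python
# Sketch only: Falconer's classic twin formula (a standard textbook estimate,
# not Molquant's method) plus the "missing heritability" arithmetic from the
# height example. The twin correlations below are hypothetical.


def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Estimate heritability as h2 = 2 * (r_MZ - r_DZ).

    Identical (MZ) twins share essentially all of their genes and fraternal
    (DZ) twins about half, so under the classic twin model doubling the gap
    between their trait correlations estimates the genetic share of variance.
    """
    return 2.0 * (r_mz - r_dz)


def missing_heritability(h2_total: float, h2_explained: float) -> float:
    """Heritability not yet accounted for by the variants found so far."""
    return h2_total - h2_explained


h2 = falconer_h2(r_mz=0.90, r_dz=0.50)  # hypothetical twin correlations
gap = missing_heritability(h2_total=0.80, h2_explained=0.10)  # height example
print(f"twin-based h2 = {h2:.2f}")  # prints "twin-based h2 = 0.80"
print(f"missing h2 = {gap:.2f}")  # prints "missing h2 = 0.70"
```

The formula assumes identical and fraternal twins share their environments to the same degree; violations of that assumption are one reason twin-based estimates are themselves debated.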
This mystery has generated many competing hypotheses about why we haven’t done better at linking genes to traits. The question is still controversial (e.g. 1), but the data are leaning toward the “additive model”: the notion that many genes contribute to any given phenotype, but each gene contributes only a little (2, 3; for a thoughtful discussion, see 4). A recent ASHG 2013 talk by MIT’s Yaniv Erlich points the same way: his analysis of 44 million entries in public ancestry data supported the additive model. As the field of genomics matures, we will have access to genes and phenotypes from millions of people, which should help surmount the missing heritability problem by sheer numbers (as suggested by a yeast model study, 5). Still, limitations in our analysis assumptions (e.g. how we decide whether an association is significant), as well as the complexity of population structure, will create further analysis challenges. Can we make significant progress before we sequence and analyze millions of people?
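Why does the additive model push toward “sheer numbers”? A rough power calculation shows how quickly the required sample size grows as the variance explained by each variant shrinks. This is a minimal sketch assuming a quantitative trait, a single-SNP 1-df test, the conventional genome-wide threshold of 5 × 10⁻⁸, and 80% power; the function name and defaults are illustrative choices, not a specific published tool.

```python
import math
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def required_sample_size(var_explained: float,
                         alpha: float = GENOME_WIDE_ALPHA,
                         power: float = 0.8) -> int:
    """Approximate N needed to detect one SNP in a quantitative-trait GWAS.

    Uses the usual large-sample approximation for a 1-df association test:
    the test's noncentrality is about N * q / (1 - q), where q is the fraction
    of phenotypic variance the SNP explains. Solving for the noncentrality
    required by a two-sided test at level alpha with the given power gives N.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # significance cut-off
    z_power = std_normal.inv_cdf(power)  # extra margin for the desired power
    ncp_needed = (z_alpha + z_power) ** 2
    q = var_explained
    return math.ceil(ncp_needed * (1 - q) / q)


# Under the additive model each variant explains only a sliver of variance.
for q in (0.01, 0.001, 0.0001):
    print(f"variant explaining {q:.2%} of variance: N ~ {required_sample_size(q):,}")
```

Under these assumptions the answers come out at roughly 4,000, 40,000 and 400,000 individuals: every tenfold drop in effect size demands about ten times as many people, which is why many small-effect variants stay hidden in today’s studies.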
Molquant’s Approach to Genome Scale Data - A New Framework
Taking a somewhat orthogonal approach, we believe there are real opportunities today to better interpret genome-scale data. The dominant statistical approach of correlating millions of individual variables with a ‘yes’ or ‘no’ phenotype carries some fundamental limitations. Further, biology doesn’t appear to be organized in a one-SNP/one-trait manner. To use a “machine” analogy, each gene typically makes one “part” that assembles into a larger machine, and each part needs to function properly for the machine to work optimally. Most biological systems studied so far conform to this model, with each gene making a protein that fits into a multi-protein complex (ribosomes, transcriptional regulation, DNA replication/repair, mitochondrial complexes, etc.).
Molquant plot assessing the relationships among a set of mathematically derived biological networks
If we can develop reasonable algorithms and apply them to the rapidly growing tangle of data, previously unappreciated patterns will emerge. Our work so far supports this: using biologically seeded statistical approaches, Molquant has been able to organize large amounts of genome-scale data into biologically relevant groups, or networks. Unlike curated annotation groupings such as Gene Ontology (GO) terms, these networks are mathematically derived and include both well-studied and uncharacterized genes.
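To give a feel for what a “mathematically derived” network can look like, here is a deliberately simple stand-in: link genes whose profiles are strongly correlated across samples, then treat each connected group as a network. Correlation-threshold clustering is a generic, swapped-in technique chosen for illustration; it is not Molquant’s algorithm. The gene names, profiles, and the 0.8 threshold are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return cov / (sx * sy)


def coexpression_networks(profiles, threshold=0.8):
    """Split genes into networks of strongly co-varying profiles.

    An edge joins two genes when |r| >= threshold; each connected component
    of the resulting graph is returned as one candidate network.
    """
    genes = list(profiles)
    neighbours = {g: set() for g in genes}
    for a, b in combinations(genes, 2):
        if abs(pearson(profiles[a], profiles[b])) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, networks = set(), []
    for start in genes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first walk over the correlation graph
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        networks.append(sorted(component))
    return networks


# Hypothetical profiles across five samples; ORF_X stands in for an
# uncharacterized gene that nonetheless tracks GENE_A and GENE_B.
profiles = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [2, 4, 6, 8, 10],
    "ORF_X": [1.1, 2.0, 2.9, 4.2, 5.0],
    "GENE_C": [5, 1, 4, 2, 3],
    "GENE_D": [5, 1, 4, 2, 3.1],
}
print(coexpression_networks(profiles))
# prints [['GENE_A', 'GENE_B', 'ORF_X'], ['GENE_C', 'GENE_D']]
```

Note how the uncharacterized ORF_X lands in a network purely from the data, with no curated annotation required; that is the property that distinguishes data-derived networks from GO-style groupings.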
These networks provide powerful tools for a number of analytical activities. They enable biological annotation of genes and disease processes, which is especially useful for generating biological hypotheses about disease genes identified through positional cloning or sequencing efforts. Applied to gene-trait associations, these networks provide a novel framework for assessing the role of a particular SNP in a trait. The framework rests on a biological assumption: most of the genetic variation associated with a particular trait will not be randomly distributed across the genome, but will be concentrated in the biological networks relevant to that trait. Thus, at least some of the “missing heritability” of GWAS may lie in variants within networked genes that do contribute to the phenotype but individually fail to meet the stringent significance thresholds imposed on single SNPs.
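One way to see how a network can recover signal that single-SNP tests miss is to pool the evidence across its members. The sketch below uses Fisher’s method for combining p-values, a standard technique swapped in for illustration rather than Molquant’s actual statistic. It assumes the SNPs are independent, which linkage disequilibrium usually violates, so real gene-set tests rely on permutation or LD-aware models instead. The 20-SNP network and its p-values are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional single-SNP GWAS threshold


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) is chi-square with 2k degrees of
    freedom. For even degrees of freedom the upper tail has the closed form
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, computed here in log
    space so that many small p-values do not overflow.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is x / 2
    if half_x == 0:
        return 1.0  # every p-value was exactly 1 (or the list was empty)
    log_terms = [i * math.log(half_x) - math.lgamma(i + 1) for i in range(k)]
    top = max(log_terms)
    log_sum = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return min(1.0, math.exp(log_sum - half_x))


# Hypothetical network of 20 SNPs, each only modestly associated (p = 0.01).
network_snps = [0.01] * 20
print(any(p < GENOME_WIDE_ALPHA for p in network_snps))  # prints False
print(fisher_combined_p(network_snps) < GENOME_WIDE_ALPHA)  # prints True
```

No single SNP comes close to genome-wide significance, yet the network as a whole does; the same logic also explains why random, unrelated groupings of genes would not show this effect.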
Studies of autoimmunity offer an example of such a network emerging from conventional GWAS analysis. A meta-analysis of SNPs identified across multiple autoimmune disease studies found that most of the risk genes function in controlling the immune response (e.g. cytokines, T cell activation, antigen processing; see a nice summary in 6). That is no surprise given our understanding of these conditions. But imagine that little were known about them: biological networks of immune cell regulation genes would then represent a highly enriched source for identifying relevant SNPs.
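“Highly enriched” can be quantified with a standard hypergeometric test: if risk genes were scattered at random, how likely is it that at least this many would fall inside a given network? This is a minimal sketch of the generic enrichment test used in many pathway analyses, not a Molquant-specific method; the gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_genome, n_network, n_hits, n_overlap):
    """One-sided hypergeometric p-value for network enrichment.

    Returns P(overlap >= n_overlap) if n_hits risk genes were drawn at random
    from n_genome genes, n_network of which belong to the network.
    """
    max_overlap = min(n_network, n_hits)
    tail = sum(
        comb(n_network, k) * comb(n_genome - n_network, n_hits - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / comb(n_genome, n_hits)  # exact integers, one final division


# Hypothetical numbers: 20,000 genes, a 500-gene immune-regulation network,
# 50 GWAS risk genes, 10 of which fall inside the network (about 1.25 expected).
p = hypergeom_enrichment_p(20_000, 500, 50, 10)
print(f"enrichment p = {p:.1e}")
```

Python’s arbitrary-precision integers keep the binomial coefficients exact, so the only rounding happens in the final division. Where SciPy is available, `scipy.stats.hypergeom.sf(n_overlap - 1, n_genome, n_network, n_hits)` computes the same tail probability.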
As a proof of concept, we recently completed an analysis of Parkinson’s Disease, for which there are several known susceptibility genes but much less is known about the biology of most of these genes. Using the Molquant tools, we identified networks among the various Parkinson’s-associated genes and then linked these networks to known biology. We look forward to posting this analysis in the near future.
Follow @molquant on Twitter to receive our news and updates.
1 Charney, E. “Still Chasing Ghosts: A New Genetic Methodology Will Not Find the ‘Missing Heritability’.” Independent Science News. www.independentsciencenews.org
2 Yang et al. “Genome partitioning of genetic variation for complex traits using common SNPs.” Nature Genetics 43(6): 519–525 (2011).
3 Gibson, G. “Hints of hidden heritability in GWAS.” Nature Genetics 42(7) (2010).
4 Gibson, G. “Rare and common variants: twenty arguments.” Nature Reviews Genetics 13: 135–145 (2012).
5 Bloom et al. “Finding the sources of missing heritability in a yeast cross.” Nature 494: 234–237 (2013).
6 Visscher et al. “Five years of GWAS discovery.” American Journal of Human Genetics 90(1): 7–24 (2012).
FURTHER READING
Jostins, L. “Estimating heritability using twins.” Unzipped Genes blog, December 13, 2010.
Maher, B. “Personal genomes: The case of the missing heritability.” Nature 456: 18–21 (2008).
“What’s Missing in Missing Heritability.” A Molecular Matter blog, January 14, 2013.