April 10, 2014. To demonstrate the power of the tools we’re building at Molquant, we recently took a look at Parkinson’s disease, a challenging and complex neurodegenerative disorder affecting nearly 10 million people worldwide.
Applying our correlation tools, we discovered previously unreported relationships among 16 known Parkinson’s disease genes. Two major subgroups were found, which we could link to specific biological networks that we generated. A picture emerged suggesting that the normal biology of these gene subgroups functioned to integrate nutrient and oxygen conditions with cellular processes that tend mitochondria and maintain appropriate levels of mitochondrial energy production in neurons.
This mathematically derived picture is consistent with the emerging view in the field, but further suggests that different Parkinson’s associated variant genes will impact different nodes of this biology. Our analysis supports the recent hypothesis in the field that some diabetes drugs (thiazolidinediones) may be attractive therapeutic options for Parkinson’s disease, but also suggests that these agents may elicit different activity depending on a patient’s particular Parkinson’s associated gene variant. In addition, the larger set of genes identified in this analysis represents a promising set of candidate genes to prioritize or expand the current Parkinson’s disease associated gene sets.
The detailed manuscript (PD Gene Networks Molquant.pdf) can be downloaded and includes more figures, discussion, and references for further reading. An excerpt of the PDF follows:
RESULTS AND DISCUSSION
Using correlation algorithms applied to a large set of gene expression data, we developed a set of networks linked to biological processes.
Figure 1 shows network and matrix “bathymetry” plots for 19 biologically defined networks. In the network plot (1A), 10 genes for each network are plotted as nodes, and correlations determine the length of the edges (edges not rendered here). The bathymetry plot (1B) (a 2D matrix, so data are duplicated below the diagonal) shows correlation relationships among the 190 genes rendered as color and “height”.
Networks include: vesicle transport, autophagy, axon motors, TFEB anchored lysosomal biogenesis, skeletal muscle, adipose tissue, ribosomes, PGC1a anchored mitochondrial biogenesis, mitochondria, glycolysis, mitosis, bone/cartilage, hypoxia/blood vessels, T cells, NK cells, CD68 macrophages, CD14 macrophages, neutrophils.
Parkinson’s Disease Gene Networks
Sixteen genes in which genetic alterations were associated with risk of PD were examined: ATP13A2, GBA, FBXO7, LRRK2, MAPT, PARK2, PARK7, PINK1, SMPD1, SNCA, SYNJ1, UCHL1, and, from an unpublished report, NOVA2, OR56B4, PABPC1L, and RPE65 (1).
Each of the 16 genes served as the “seed” gene for network formation, and the resulting network and bathymetry plots are shown in figure 2. Two distinct network clusters emerged from the analysis. The first, referred to as the PARK2 meta-network, includes PARK2, LRRK2, PINK1 and the newly reported gene NOVA2. The second, the SNCA meta-network, includes SNCA, RPE65, SYNJ1 and MAPT. Three other genes also exhibited correlation to these two clusters (FBXO7, SMPD1, and UCHL1), whereas the PARK7, PABPC1L, OR56B4, GBA and ATP13A2 networks exhibited limited correlation linkage. Note also that the two meta-networks exhibited correlation linkage to one another (figure 2B). The data used to generate these networks are drawn from many sources; we interpret these networks to represent the normal biology of these genes, not the pathologic state associated with PD.
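As a rough illustration of seed-based network formation, the sketch below ranks genes by their correlation with a seed gene across samples and keeps the top few. The excerpt does not specify which correlation algorithm was used, so Pearson correlation, the function names (pearson, seed_network), the placeholder gene names, and the toy expression values are all assumptions made for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sx * sy)

def seed_network(expression, seed, top_n):
    """Return the top_n (gene, correlation) pairs for the seed, best first."""
    scores = [(gene, pearson(expression[seed], profile))
              for gene, profile in expression.items() if gene != seed]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]

# Toy data (assumed): gene -> expression across five hypothetical samples.
toy_expression = {
    "SEED":   [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_A": [2.0, 4.1, 6.0, 8.2, 10.0],  # tracks the seed closely
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # anti-correlated with the seed
    "GENE_C": [3.0, 1.0, 4.0, 1.0, 5.0],   # weakly related to the seed
}
network = seed_network(toy_expression, "SEED", top_n=2)
```

In this toy setting the seed's network contains GENE_A first and GENE_C second, while the anti-correlated GENE_B is excluded. A real analysis would run over thousands of profiles and a much larger top_n.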
Links between Parkinson’s Gene Networks and Biological Networks
To provide biological annotation of the Parkinson’s gene networks, biological networks and Parkinson’s gene networks were plotted together (Figure 3).
The PARK2 meta-network exhibited correlation to four of the queried biological networks: a vesicle network seeded by VAMP2, a PGC1a (PPARGC1A) seeded network, a lysosomal biogenesis network seeded by TFEB, and a hypoxia network seeded by VEGFA.
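One possible way to put a number on "correlation linkage" between two networks is the mean pairwise correlation between their member genes. The excerpt does not define its linkage measure, so this metric, the function name network_linkage, and the placeholder genes and values below are assumptions rather than the authors' method.

```python
from itertools import product

def network_linkage(corr, network_a, network_b):
    """Mean pairwise correlation between the genes of two networks.

    corr maps frozenset({gene1, gene2}) -> correlation; a gene paired
    with itself is skipped.
    """
    values = [corr[frozenset((a, b))]
              for a, b in product(network_a, network_b) if a != b]
    return sum(values) / len(values) if values else 0.0

# Hypothetical correlations between placeholder genes (not real data).
toy_corr = {
    frozenset(("P1", "V1")): 0.8,
    frozenset(("P1", "V2")): 0.6,
    frozenset(("P2", "V1")): 0.7,
    frozenset(("P2", "V2")): 0.5,
    frozenset(("P1", "M1")): 0.1,
    frozenset(("P2", "M1")): -0.1,
}
pd_network = ["P1", "P2"]      # stands in for a PD gene network
vesicle_like = ["V1", "V2"]    # stands in for a linked biological network
mito_like = ["M1"]             # stands in for an unlinked network

strong = network_linkage(toy_corr, pd_network, vesicle_like)
weak = network_linkage(toy_corr, pd_network, mito_like)
```

With these toy values, the linkage to the vesicle-like network is clearly higher than the linkage to the mitochondria-like network, which mirrors the kind of contrast described in the text.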
These correlations enable a hypothesis that the non-pathogenic versions of LRRK2, PARK2, PINK1 and NOVA2 normally function in intracellular processes that control the integration of nutrient and oxygen sensing with the cellular machinery that controls mitochondrial homeostasis. The fact that the pathogenic forms of these genes are directly linked to disease suggests that the linked biological pathways are altered in PD.
The observations and hypothesis are consistent with several previous findings in the field. PARK2 and PINK1 have been shown previously to function in the control of mitochondrial homeostasis, including motility (2, 3), as has FBXO7 (4), which here exhibits modest correlation linkage to the PARK2 meta-network. The transcriptional co-activator PGC1a/PPARGC1A is thought to be a central controller of this process (5, 6). In addition, meta-analysis of expression data from Parkinson’s affected tissues identified a PGC1a signature as one of the top altered pathways (7).
Note that these biological processes are not directly related to the mitochondrion per se (figure 3 shows limited correlation linkage to integral mitochondrial genes), but rather to components and regulators of intracellular organelle trafficking. LRRK2, in particular, exhibits tight correlations to several cytoskeletal and vesicle/intracellular transport genes (COL4A3, COL4A4, SYNE1, ARHGAP24, ARHGAP31, GPRASP1, LMBRD1, SLC6A13, STX12, TRAK2/Milton), suggesting a role in modulation of intracellular transport. Also consistent with the hypothesis, the Parkinson’s associated lysosomal gene SMPD1 exhibits linkage to the PARK2 meta-network.
The SNCA meta-network also exhibited correlation to the PPARGC1A (PGC1a) network, but the strongest linkages are to cytoskeletal motor protein networks: either the kinesin motor KIF1A or the myosin motor gene myosin 5A (MYO5A). The RPE65 network exhibited tightest linkage to the MYO5A motor network, whereas SYNJ1, a known synaptic vesicle protein, exhibited tightest linkage to the KIF1A network; SNCA shared linkages with both motor networks. Together, these observations further support a hypothesis that these Parkinson’s genes normally function as components of and regulators of axonal transport, which is intimately linked to mitochondrial homeostasis (8).
The other queried Parkinson’s genes exhibit distinct linkage correlations. PABPC1L, an RNA binding protein of unknown function, and to a lesser extent, ATP13A2 exhibit linkage to an autophagy network seeded by ATG4B. PARK7 exhibits tight linkage to the integral mitochondrial protein network anchored by COX6A1 (as does the PD gene candidate HTRA2 (data not shown)). Olfactory Receptor OR56B4 doesn’t exhibit significant linkage to any networks analyzed. Although not shown here, PD associated gene VPS35 exhibits weak linkage to SYNJ1, and PD gene PLA2G6 doesn’t exhibit significant linkage to any networks analyzed (data not shown). Further analysis exploring a broader set of biological networks may reveal additional linkages.
PGC1a, a potential protective network in some genetically defined forms of PD
Consistent with the previously reported observation that PGC1a associated signatures are altered in PD (7), this analysis hypothesizes a role for PGC1a in biological processes associated with some of the identified PD genes. The PGC1a (PPARGC1A) network exhibits linkage to both the PARK2 and SNCA meta-networks (figure 3). Although several lines of evidence indicate that PGC1a is associated with mitochondrial biogenesis, networks here clearly distinguish between “intrinsic” mitochondrial networks (those comprising mitochondrial genes) and PGC1a associated networks. The network analysis shown here postulates that PGC1a’s impact on mitochondrial biogenesis occurs through its modulation of these associated transport and sensor pathways.
Based on previous links between the thiazolidinedione drugs, PGC1a and PD, a phase 2 trial was initiated in 2011 (9). Given the association of PGC1a pathways with specific subgroups of PD genes, assessment of the genotype of patients in such a study may provide additional information regarding genotype/efficacy relationships.
Several lines of evidence have linked PGC1a with AMP kinase, a rheostat of nutrient availability that responds to AMP/ATP ratios in a cell. The recently reported drug candidate R118, a potent AMPK activator (see 10 for a paper on a related molecule), has just entered the clinic, where it is intended for treatment of peripheral artery disease. If this molecule achieves adequate exposure and tolerability in human studies, it may represent an intriguing candidate for PD.
Parkinson’s network gene lists
In addition to providing tools for biological interpretation, the Parkinson’s gene networks identified here comprise candidate PD gene lists suitable for exploration of current and future GWAS analysis, or for targeted genotyping studies.
For the two meta-networks only, figure 4 shows the overlap of the top 1000 network genes for each PD gene from either the PARK2 (figure 4A) or SNCA (figure 4B) meta-network. The identified genes are those that correlate with at least three of the four PD genes in each network, and are therefore hypothesized to participate in the same biological processes. The gene set is available as a downloadable file.
Intersection of these core network genes with the online PDGene database yielded 35 genes that are present in both the PDGene database and the PARK2 and/or SNCA meta-networks (figure 5).
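The "at least three of the four" selection rule can be sketched as a simple membership count across each seed gene's top-correlated list. The function name core_genes and the placeholder seed and gene names below are hypothetical; in the analysis described here, each list would hold the top 1000 network genes for one PD gene.

```python
from collections import Counter

def core_genes(networks, min_networks=3):
    """Genes present in at least min_networks of the given gene lists."""
    counts = Counter(gene
                     for members in networks.values()
                     for gene in set(members))  # set(): count each list once
    return sorted(g for g, n in counts.items() if n >= min_networks)

# Placeholder lists standing in for each seed's top-correlated genes.
toy_networks = {
    "SEED_1": ["G1", "G2", "G3", "G4"],
    "SEED_2": ["G1", "G2", "G5"],
    "SEED_3": ["G1", "G2", "G3", "G6"],
    "SEED_4": ["G1", "G3", "G7"],
}
core = core_genes(toy_networks)
```

Here G1 appears in all four lists and G2 and G3 in three each, so those three genes form the core set; genes seen in only one list are dropped.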
2014 chromosome 1 PD locus gene S1PR1 present in PARK2 meta-network
A stratified analysis of PD GWAS data identified a previously unrecognized PD locus on Chr 1 that included candidate genes DPH5, OLFM3, S1PR1, SLC30A7, VCAM (11). We note that the sphingosine-1-phosphate receptor S1PR1 is included in the PARK2 meta-network. S1PR1 expression also exhibits correlation to the MAPT and SYNJ1 networks, providing strong network support for the hypothesis that S1PR1 may represent the relevant affected gene for the locus. S1PR1 is widely expressed, exhibiting multifunctional roles including lymphocyte trafficking (the target of the anti-MS drug fingolimod) and angiogenesis. The network linkages observed here support a role for S1PR1 in PD gene associated function, including the previously mentioned hypoxia network.
Only transcript level regulation is captured
Although transcript level regulation is widespread, it represents only one of many ways to regulate biological processes. Genes that are widely expressed and exhibit a high degree of post-translational regulation may not be captured in these networks. Nevertheless, our observations have been that, given a large enough sample size with sufficient biological complexity, i.e. many diverse samples, we can obtain networks exhibiting tight correlation of genes comprising well known biological processes (glycolysis, mitochondria, ribosomes, skeletal muscle, adipose tissue, taste receptor cells and others). However, several other attempts to generate correlation based biological networks were unsuccessful (circadian rhythm genes, Notch pathway genes, retina specific genes, nonsense mediated RNA decay and others), highlighting the challenges inherent in expression based networks. Certainly, the identified networks will not comprehensively capture all of the component genes of a particular biology, but the observations presented here argue that this approach represents a useful tool to analyze and interpret complex biology.
Untangling co-occurring biological processes
Many biological processes, although distinct, occur in the same tissues or conditions. If an uncharacterized gene correlates with two (or more) biological networks, it is difficult to assign it to a specific process. For example, although we recognize proteasomal degradation, splicing, and DNA replication to be distinct biological processes, networks comprising core genes from each of these functions exhibit a high degree of cross correlation in our datasets (data not shown). Uncharacterized gene C20ORF24 correlates highly with networks seeded by proteasomal gene PSMA5, pre-replication complex gene MCM10, and splicing factor SNRPB, precluding assignment to any of those processes. The co-regulation of these three networks probably tells an interesting story itself, but this analysis merely identifies the phenomenon, not its resolution.
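The assignment problem described above can be made concrete with a small rule: assign a gene to a process only when exactly one network passes a correlation cutoff. The function name assign_process, the 0.7 threshold, and all scores below are assumptions chosen for illustration; they are not measured values for C20ORF24 or any other gene.

```python
def assign_process(scores, threshold=0.7):
    """Assign a gene to a process only if exactly one network passes.

    scores maps network name -> correlation of the gene with that network.
    Returns the network name, or None when zero or several networks pass.
    """
    passing = [name for name, r in scores.items() if r >= threshold]
    return passing[0] if len(passing) == 1 else None

# Hypothetical scores for two placeholder genes (not measured values).
ambiguous_gene = {"proteasome": 0.85, "replication": 0.82, "splicing": 0.80}
specific_gene = {"proteasome": 0.90, "replication": 0.30, "splicing": 0.20}

ambiguous_call = assign_process(ambiguous_gene)  # several networks pass
specific_call = assign_process(specific_gene)    # only one network passes
```

A gene that correlates highly with all three co-regulated networks gets no assignment, which is the situation the paragraph describes; a gene with one dominant linkage is assigned to that network.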
Although more than 7000 individual profiles were explored to derive the correlations shown here, the networks identified are subject to the composition of the samples included. Many genes are known to exhibit distinct networks in different tissues, so the correlations observed here likely reflect what is occurring in the dominantly represented tissue(s) or cells. For example, PGC1a may exhibit different correlations in brown adipose tissue or cardiac cells or neurons. Furthermore, our examination of network correlations in large subsets of the overall dataset (e.g. 1600 GTEx RNA-seq profiles of human tissues) demonstrates that while many networks are well replicated in the subsets, other networks are not well recapitulated. We look forward to the rapidly increasing number of profiled samples to improve the networks.
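One simple way to check whether a network replicates in a data subset is to compare the gene list derived from the full dataset with the list derived from the subset, for example with a Jaccard index. The excerpt does not say how replication was assessed, so the metric, the function name jaccard, and the placeholder gene lists below are assumptions.

```python
def jaccard(genes_a, genes_b):
    """Jaccard index of two gene lists: shared genes / all distinct genes."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Placeholder gene lists for one seed (not real results).
full_dataset = ["G1", "G2", "G3", "G4"]
subset_replicated = ["G1", "G2", "G3", "G9"]  # shares 3 of 4 genes
subset_divergent = ["G1", "G7", "G8", "G9"]   # shares 1 of 4 genes

well_replicated = jaccard(full_dataset, subset_replicated)
poorly_replicated = jaccard(full_dataset, subset_divergent)
```

Values near 1 indicate that the subset recovers much the same network, and values near 0 indicate that it does not. Any cutoff for calling a network "well replicated" would need to be chosen and justified separately.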