Analysis of Alexander Disease Gene Sets
BACKGROUND
Alexander disease (AxD) is a rare neurodegenerative disease caused by a mutation in the GFAP gene, which codes for the glial fibrillary acidic protein (GFAP) [1]. GFAP supports the formation of myelin sheaths in normal physiology, but in Alexander disease a gain-of-function mutation in the GFAP gene causes the protein product to accumulate. Instead of helping maintain myelin sheaths, the excess GFAP damages the myelin. Overexpression of GFAP in animal models also results in the appearance and accumulation of Rosenthal fibers (RF), protein aggregates in the cytoplasm of astrocytes [2], in subpial and white matter areas of the central nervous system, which typically have high GFAP expression. Beyond RF build-up, astrocytes in AxD also have abnormal shape and function. The Gene Expression Omnibus (GEO) is a major open biomedical research repository for transcriptomics and other omics datasets that currently contains millions of gene expression samples from tens of thousands of studies collected by research laboratories around the world [3]. Here, we use the GeneSetCart pipeline to analyze gene sets created by comparing GEO gene expression samples from wild type (WT) or control subjects to AxD samples (Fig. 1).
METHOD
To obtain the AxD disease signatures, we performed differential gene expression analysis on RNA-seq samples from three GEO studies that compare control or wild type samples to AxD samples (GSE198817, GSE197044, GSE116327) [4]. GSE198817 contains gene expression samples from the hippocampus and corpus callosum tissue of Gfap+/+, Gfap+/R236H, and mGFAPTg170-2 transgenic mice. GSE197044 has RNA-seq profiles from hippocampus and corpus callosum tissue of male Gfap+/R236H and Gfap+/+ mice in FVB/N-Tac at 8 weeks of age, and GSE116327 has profiles from iPSC-derived astrocytes and post-mortem brain tissues of healthy controls and AxD patients. Differentially expressed genes between healthy controls and disease samples were computed for each study using the limma method [5], run through the bulk RNA-seq analysis pipeline Appyter [6]. The up- and down-regulated genes were converted into gene sets and uploaded to GeneSetCart for further integrative analysis. Using the GeneSetCart Combine feature, consensus up and down sets were created. With a consensus criterion of 3, the consensus up signature has 65 genes and the consensus down signature has 20 genes. These consensus sets were submitted to SigCom LINCS [7] to identify potential drugs and preclinical small molecules that may reverse the disease gene expression changes in different cell lines. We also performed gene set enrichment analysis on the consensus up and down sets with Enrichr [8] (Fig. 2A-C).
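The consensus step can be illustrated with a minimal sketch. It assumes that a consensus criterion of 3 means a gene must appear in at least three of the input gene sets; the function name and the toy gene sets below are hypothetical and do not reproduce the GeneSetCart implementation or the study data.

```python
from collections import Counter


def consensus_set(gene_sets, min_sets):
    """Return the genes that appear in at least `min_sets` of the input sets."""
    # Count each gene once per set, even if a set lists it twice.
    counts = Counter(gene for genes in gene_sets for gene in set(genes))
    return {gene for gene, n in counts.items() if n >= min_sets}


# Toy up-regulated gene sets, one per signature (illustrative only).
up_sets = [
    {"GFAP", "VIM", "CCL2", "CXCL10"},
    {"GFAP", "VIM", "CD44"},
    {"GFAP", "VIM", "CCL2"},
]

consensus_up = consensus_set(up_sets, min_sets=3)
print(sorted(consensus_up))  # ['GFAP', 'VIM']
```

Lowering `min_sets` to 2 in this toy example would also admit CCL2, which shows how the criterion trades sensitivity for agreement across studies.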
RESULT AND DISCUSSION
The consensus upregulated genes are enriched for transcription factors known to regulate immune response and inflammation. The top three transcription factors from the ChEA [9] analysis are RELA, IRF8 and STAT3 (p<0.0001, Fisher’s exact test). Consistent with inflammation and AxD, the top enriched WikiPathways [10] pathway is Spinal Cord Injury WP2432 (p=8.234e-7) with 6 overlapping genes: CXCL10, CCND1, CCL2, CXCL1, VIM, and GFAP. The most striking results from the enrichment analysis come from the MGI Mouse Phenotypes library, where the 4 most enriched terms are Increased Susceptibility To Induced Morbidity/Mortality MP:0009763 (p=1.576e-8), CNS Inflammation MP:0006082 (p=3.795e-7), Demyelination MP:0000921 (p=4.793e-6), and Abnormal Myelination MP:0000920 (p=1.292e-5). Knockout mice for the genes overlapping with these terms could serve as AxD disease models because of the shared phenotype. GFAP appears only among the genes overlapping with the Abnormal Myelination MP:0000920 phenotype, together with TYROBP, PTPRC, ADGRG6, and TLR2 (Fig. 4B). Querying Rummagene [11] with the consensus up genes further confirms the brain inflammation signature. Several of the top matching sets in Rummagene are from brain inflammation studies, including two studies of prion disease [12,13], suggesting potentially similar mechanisms between prion disease and AxD.
The consensus down-regulated genes are enriched for terms related to brain tissues and cell types. Specifically, markers for astrocytes are the top enriched terms from the gene set libraries created from CellMarker [14], Tabula Muris [15], PanglaoDB [16], and Allen Brain Atlas 10x scRNA [17] (Fig. 4C). This observation is also supported by a RummaGEO [18] query that returned matching gene sets from studies titled: RNA-Seq of human astrocytes GSE73721; Regionally specified human pluripotent stem cell-derived astrocytes GSE133489; and CROP-seq of hiPSC-derived astrocytes GSE182307 and GSE182309.
REFERENCES
[1] Kuhn J, Cascella M. Alexander Disease. StatPearls Publishing; 2023.
[2] Messing A, Head MW, Galles K, Galbreath EJ, Goldman JE, Brenner M. Fatal encephalopathy with astrocyte inclusions in GFAP transgenic mice. Am J Pathol. 1998;152: 391–398.
[3] Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013;41: D991–5.
[4] Gammie SC, Messing A, Hill MA, Kelm-Nelson CA, Hagemann TL. Large-scale gene expression changes in APP/PSEN1 and GFAP mutation models exhibit high congruence with Alzheimer’s disease. PLoS One. 2024;19: e0291995.
[5] Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43: e47.
[6] Clarke DJB, Jeon M, Stein DJ, Moiseyev N, Kropiwnicki E, Dai C, et al. Appyters: Turning Jupyter Notebooks into data-driven web apps. Patterns (N Y). 2021;2: 100213.
[7] Evangelista JE, Clarke DJB, Xie Z, Lachmann A, Jeon M, Chen K, et al. SigCom LINCS: data and metadata search engine for a million gene expression signatures. Nucleic Acids Res. 2022;50: W697–W709.
[8] Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics. 2013;14: 128.
[9] Keenan AB, Torre D, Lachmann A, Leong AK, Wojciechowicz ML, Utti V, et al. ChEA3: transcription factor enrichment analysis by orthogonal omics integration. Nucleic Acids Res. 2019;47: W212–W224.
[10] Kutmon M, Riutta A, Nunes N, Hanspers K, Willighagen EL, Bohler A, et al. WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Res. 2016;44: D488–94.
[11] Clarke DJB, Marino GB, Deng EZ, Xie Z, Evangelista JE, Ma’ayan A. Rummagene: massive mining of gene sets from supporting materials of biomedical research publications. Commun Biol. 2024;7: 482.
[12] Slota JA, Medina SJ, Frost KL, Booth SA. Neurons and astrocytes elicit brain region specific transcriptional responses to prion disease in the Murine CA1 and thalamus. Front Neurosci. 2022;16: 918811.
[13] Crespo I, Roomp K, Jurkowski W, Kitano H, del Sol A. Gene regulatory network analysis supports inflammation as a key neurodegeneration process in prion disease. BMC Syst Biol. 2012;6: 132.
[14] Zhang X, Lan Y, Xu J, Quan F, Zhao E, Deng C, et al. CellMarker: a manually curated resource of cell markers in human and mouse. Nucleic Acids Res. 2019;47: D721–D728.
[15] Tabula Muris Consortium. A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. Nature. 2020;583: 590–595.
[16] Franzén O, Gan L-M, Björkegren JLM. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database (Oxford). 2019;2019. doi:10.1093/database/baz046
[17] Shen EH, Overly CC, Jones AR. The Allen Human Brain Atlas: comprehensive gene expression mapping of the human brain. Trends Neurosci. 2012;35: 711–714.
[18] Marino GB, Clarke DJB, Lachmann A, Deng EZ, Ma’ayan A. RummaGEO: Automatic mining of human and mouse gene sets from GEO. Patterns (N Y). 2024;5: 101072.
CFDE GMT Crossing: GTEx Aging Signatures vs MoTrPAC Exercise Gene Sets (Blood)
BACKGROUND
Aging is the most profound risk factor for many chronic and non-infectious diseases such as diabetes [1], cardiovascular disease [2] and neurological diseases such as Alzheimer's [3] and Parkinson’s disease [4]. This risk has been linked to the overlap between the molecular basis of these diseases and that of aging [5], [6]. Moderate exercise (planned and intentional physical activity [7]) is widely accepted to promote health and aid the prevention of disease [8]. Exercise has been shown to induce organelle, cellular, blood-brain and external barrier protection [9], [10], proteostasis [11], DNA repair [12], anti-inflammatory effects (C-reactive protein concentration is strongly related to physical activity [13]), autophagy regulation [24], and other processes that contribute to better health, many of which counter the molecular mechanisms of aging [25]. Aging signatures are changes in gene expression that accompany the biological process of aging. The goal of the Molecular Transducers of Physical Activity Consortium (MoTrPAC) is to assess molecular changes that occur in response to physical activity [14]. To investigate the common biological underpinnings of aging and exercise, and to discover genes that are induced or repressed by exercise and also change in level with age, we crossed the aging signatures created from GTEx [15] with the MoTrPAC rat endurance training [16] gene sets.
METHOD
The GTEx pilot data provides 1,641 gene expression samples across 43 tissues from 175 individuals in order to examine how gene expression varies among tissues [17]. To create the GTEx aging signatures, for each tissue we performed a differential expression analysis using limma voom that compares the expression levels of the young population (ages 20-29) with sex-matched older populations, e.g., individuals aged 20-29 vs. 60-69. Each gene set in the resulting gene set library consists of the up- or down-regulated genes for one tissue and one younger-older comparison group.
To create the MoTrPAC rat endurance gene sets, we performed a differential expression analysis that compared the gene expression levels of a training group vs. sex-matched controls for each tissue at every time point (exercise duration) in the study. The data come from the RNA-seq data created by the MoTrPAC Data Coordination Center (DCC), which measured the gene expression levels of rats across a range of ages and fitness levels by molecular probing of multiple tissues before and after acute and chronic exercise. The resulting dataframe was filtered for genes with an adjusted p-value < 0.05 to obtain the gene expression signature for each tissue, sex, and time point group. Within each group, up genes are those with a log fold change (logFC) > 0 and down genes are those with a logFC < 0. Each gene set in the resulting gene set library consists of the up- or down-regulated genes for one tissue, sex, and time point group.
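The filtering and splitting step can be sketched as follows. This is a minimal illustration that assumes the differential expression table has already been computed and is available as rows with `gene`, `logFC`, and `adj_p` fields; these field names, the function name, and the toy values are hypothetical.

```python
def split_signature(rows, alpha=0.05):
    """Split a differential expression table into up and down gene sets.

    Keeps rows with adjusted p-value < alpha, then assigns genes with
    logFC > 0 to the up set and genes with logFC < 0 to the down set.
    """
    up, down = set(), set()
    for row in rows:
        if row["adj_p"] >= alpha:
            continue  # not significant, drop
        if row["logFC"] > 0:
            up.add(row["gene"])
        elif row["logFC"] < 0:
            down.add(row["gene"])
    return up, down


# Toy table for one tissue, sex, and time point group (made-up values).
rows = [
    {"gene": "GeneA", "logFC": 1.4, "adj_p": 0.001},
    {"gene": "GeneB", "logFC": -0.8, "adj_p": 0.02},
    {"gene": "GeneC", "logFC": 2.1, "adj_p": 0.30},
    {"gene": "GeneD", "logFC": -1.1, "adj_p": 0.049},
]

up_genes, down_genes = split_signature(rows)
print(sorted(up_genes), sorted(down_genes))
```

Running the same function once per tissue, sex, and time point group would yield one up set and one down set per group, which is the shape of the library described above.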
The CFDE GMT crossing feature allows users to cross the GMT files generated by Common Fund DCCs in order to find gene set pairs from different GMTs that have a significant overlap. Fisher's exact test is used to quantify the significance of each overlap. We crossed all pairs of gene sets across all libraries and retained gene set pairs with a p-value < 0.001. The top 5000 pairs of sets from each crossing are saved and displayed in an interactive table. The overlapping genes of each pair can be sent to Enrichr [18] for enrichment analysis or to a GPT-4 model [19] to generate a hypothesis about the potential reason for the overlap between the two gene sets. To create the GPT-4 generated hypothesis, we give GPT-4 a textual description of each gene set in the crossed pair, the overlapping genes, and the enriched terms from Enrichr using the GO Biological Processes [20], WikiPathway 2023 Human [21], MGI Mammalian Phenotype Level 4 [22], and GWAS Catalog 2023 [23] libraries as backgrounds, and then ask it to compose an abstract that describes the potential connection between these gene sets.
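The crossing procedure can be sketched in a few lines. The example below is an illustration under stated assumptions, not the GeneSetCart implementation: it uses the one-sided form of Fisher's exact test (the upper tail of the hypergeometric distribution), assumes both gene sets are drawn from a common background of 20,000 genes, and uses hypothetical function names and toy libraries.

```python
from itertools import product
from math import comb


def overlap_p_value(set_a, set_b, background_size):
    """One-sided Fisher's exact test p-value for the overlap of two gene sets.

    Computed as the hypergeometric probability of an overlap at least as
    large as the one observed. Both sets must fit inside the background.
    """
    a, b = len(set_a), len(set_b)
    k = len(set_a & set_b)
    total = comb(background_size, b)
    tail = sum(
        comb(a, i) * comb(background_size - a, b - i)
        for i in range(k, min(a, b) + 1)
    )
    return tail / total


def cross_libraries(lib_a, lib_b, background_size, alpha=0.001, top_n=5000):
    """Test every pair across two libraries and keep the most significant."""
    hits = []
    for (name_a, genes_a), (name_b, genes_b) in product(lib_a.items(), lib_b.items()):
        p = overlap_p_value(genes_a, genes_b, background_size)
        if p < alpha:
            hits.append((p, name_a, name_b, sorted(genes_a & genes_b)))
    hits.sort(key=lambda hit: hit[0])  # smallest p-value first
    return hits[:top_n]


# Toy libraries mapping set names to gene sets (illustrative only).
lib_a = {"A1": {f"G{i}" for i in range(10)}}
lib_b = {
    "B1": {f"G{i}" for i in range(9)} | {"X1"},
    "B2": {f"Y{i}" for i in range(10)},
}

hits = cross_libraries(lib_a, lib_b, background_size=20000)
for p, name_a, name_b, genes in hits:
    print(name_a, name_b, len(genes), f"{p:.3g}")
```

The choice of background size affects every p-value, so it should match the gene universe from which the libraries were built.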
RESULT AND DISCUSSION
Crossing the GTEx Aging Signatures and MoTrPAC Rat Endurance Training GMTs yields 346 gene set pairs with significant overlap (p-value < 0.001). The top two crossing gene set pairs (GTEx Blood 20-29 vs 60-69 Up ∩ T30-Blood-Rna Female 2W Down and GTEx Blood 20-29 vs 70-79 Up ∩ T30-Blood-Rna Female 2W Down) have 35 (p-value=6.52e-38) and 26 (p-value=1.05e-24) overlapping genes, respectively. Furthermore, after adding these overlapping genes to our cart and using them to start an interactive session, the intersection set operation shows that the overlaps of the top two pairs share 24 genes.
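As a quick consistency check on these counts, let A and B denote the overlapping gene lists of the top two pairs (the symbols are introduced here for illustration only). Inclusion-exclusion then gives the number of distinct genes across both lists:

```latex
\[
|A \cup B| = |A| + |B| - |A \cap B| = 35 + 26 - 24 = 37
\]
```

This agrees with the 37 genes considered in the novelty assessment.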
The GTEx Blood 20-29 vs 60-69 Up gene set contains genes that are upregulated when comparing the blood of subjects aged 60-69 to those aged 20-29. The GTEx Blood 20-29 vs 70-79 Up gene set contains genes that are upregulated when comparing the blood of subjects aged 70-79 to those aged 20-29. The T30-Blood-Rna Female 2W Down gene set consists of genes that are downregulated in the blood of female rats after two weeks of endurance training. The enrichment analysis found enriched pathways related to immune response, blood coagulation, and lipid metabolism, all processes that are well accepted to be affected by aging and physical activity (Fig. 9). Some enriched terms are blood-related processes known to change significantly with aging and exercise, such as blood coagulation and fibrinolysis. Aging is associated with increased plasma levels of many blood coagulation proteins [26]. Additionally, short-term exercise is associated with a transient increase in blood coagulation, and moderate exercise enhances blood fibrinolytic activity without activating blood coagulation mechanisms, while heavy exercise induces simultaneous activation of blood fibrinolysis and coagulation [27]. Research also shows that blood lipids are a likely source of human aging biomarkers, with blood lipid levels (including total cholesterol, low- and high-density lipoprotein cholesterol, and triglyceride concentrations) changing in specific ways with age [28], [29], while endurance exercise induces fat oxidation [30].
The GPT-4 generated hypothesis further explains the possible ways that these overlapping genes are tied to the physiological changes that accompany both aging and exercise through these enriched pathways. It proposes that the high overlap between the two gene sets could be due to the shared influence of aging and physical activity on these biological pathways and processes in opposing ways. The GPT-4 generated hypothesis posits that the overlap is because “Aging is associated with changes in lipid metabolism and increased risk of cardiovascular diseases, while regular physical activity can improve lipid profile and reduce the risk of cardiovascular diseases. Aging can lead to changes in blood coagulation, and physical activity is known to influence blood viscosity and coagulation.”
This is in line with studies which have shown that exercise can reverse many of the hallmarks of aging through anti-aging mechanisms [31] giving it therapeutic potential for aging related diseases.
A novelty assessment of the top crossing pair sets was done by comparing the number of publications each gene is associated with according to GeneRIF (Fig. 10). We find that 65% of the genes contained in those sets are associated with fewer than 100 publications in PubMed, with two genes (HAO1 and SLC25A47) associated with fewer than 10 publications. We also find that of these 37 genes, only 11 have been reported to be related to aging or exercise (CRP, HNF4A, TTR, HAMP, FGB, RBP4, FGA, APOC1, HABP2, SERPINA4 and AGXT2). These results suggest that many of the observed genes are understudied in the context of both aging and exercise, even though our crossing results show the two processes to be significantly related. This supports the need for further exploration of the link between these genes and both aging and exercise.
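The novelty calculation amounts to counting genes below a publication threshold. The sketch below assumes a precomputed mapping from gene symbol to publication count; the function name, the placeholder gene names, and the counts are invented for illustration and are not GeneRIF data.

```python
def fraction_below(pub_counts, threshold):
    """Return the fraction of genes with fewer than `threshold` publications,
    together with the sorted list of those genes."""
    if not pub_counts:
        raise ValueError("pub_counts must contain at least one gene")
    below = sorted(gene for gene, n in pub_counts.items() if n < threshold)
    return len(below) / len(pub_counts), below


# Placeholder publication counts per gene (made-up numbers).
pub_counts = {"GENE_A": 5, "GENE_B": 80, "GENE_C": 250, "GENE_D": 1200}

fraction, genes = fraction_below(pub_counts, threshold=100)
print(f"{fraction:.0%} of genes have fewer than 100 publications: {genes}")
```

Calling the same function with `threshold=10` would isolate the least-studied genes.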
CONCLUSION
Here, by crossing aging signatures created from GTEx with exercise-related gene sets from MoTrPAC, we identified biological pathways and related genes that are associated with both aging and exercise. Many of these genes are mentioned in fewer than 100 publications, which suggests that they might be understudied and warrants further exploration of the link between these genes and both aging and exercise (Fig. 3). This demonstrates the utility of the GeneSetCart application for integrating datasets produced by various Common Fund programs to stimulate scientific discovery.
REFERENCES
[1] H. L. C. Wilkerson, “Problems of an Aging Population,” Am. J. Public Health Nations. Health, vol. 37, no. 2, pp. 177–188, Feb. 1947.
[2] B. J. North and D. A. Sinclair, “The intersection between aging and cardiovascular disease,” Circ. Res., vol. 110, no. 8, pp. 1097–1108, Apr. 2012.
[3] X. Xia, Q. Jiang, J. McDermott, and J.-D. J. Han, “Aging and Alzheimer’s disease: Comparison and associations from molecular to system level,” Aging Cell, vol. 17, no. 5, p. e12802, Oct. 2018.
[4] A. Reeve, E. Simcox, and D. Turnbull, “Ageing and Parkinson’s disease: why is advancing age the biggest risk factor?,” Ageing Res. Rev., vol. 14, no. 100, pp. 19–30, Mar. 2014.
[5] G. Wick, P. Jansen-Dürr, P. Berger, I. Blasko, and B. Grubeck-Loebenstein, “Diseases of aging,” Vaccine, vol. 18, no. 16, pp. 1567–1583, Feb. 2000.
[6] D. Saul and R. L. Kosinsky, “Epigenetics of Aging and Aging-Associated Diseases,” Int. J. Mol. Sci., vol. 22, no. 1, Jan. 2021, doi: 10.3390/ijms22010401.
[7] C. J. Caspersen, K. E. Powell, and G. M. Christenson, “Physical activity, exercise, and physical fitness: definitions and distinctions for health-related research,” Public Health Rep., vol. 100, no. 2, pp. 126–131, Mar-Apr 1985.
[8] C. Fiuza-Luces et al., “Exercise benefits in cardiovascular disease: beyond attenuation of traditional risk factors,” Nat. Rev. Cardiol., vol. 15, no. 12, pp. 731–743, Dec. 2018.
[9] M. A. Małkiewicz, A. Szarmach, A. Sabisz, W. J. Cubała, E. Szurowska, and P. J. Winklewski, “Blood-brain barrier permeability and physical exercise,” J. Neuroinflammation, vol. 16, no. 1, p. 15, Jan. 2019.
[10] P. S. Souza et al., “Physical Exercise Attenuates Experimental Autoimmune Encephalomyelitis by Inhibiting Peripheral Immune Response and Blood-Brain Barrier Disruption,” Mol. Neurobiol., vol. 54, no. 6, pp. 4723–4737, Aug. 2017.
[11] R. V. Musci, K. L. Hamilton, and B. F. Miller, “Targeting mitochondrial function and proteostasis to mitigate dynapenia,” Eur. J. Appl. Physiol., vol. 118, no. 1, pp. 1–9, Jan. 2018.
[12] Z. Radák et al., “Exercise training decreases DNA damage and increases DNA repair and resistance against oxidative stress of proteins in aged rat skeletal muscle,” Pflugers Arch., vol. 445, no. 2, pp. 273–278, Nov. 2002.
[13] E. S. Ford, “Does exercise reduce inflammation? Physical activity and C-reactive protein among U.S. adults,” Epidemiology, vol. 13, no. 5, pp. 561–568, Sep. 2002.
[14] J. A. Sanford et al., “Molecular Transducers of Physical Activity Consortium (MoTrPAC): Mapping the Dynamic Responses to Exercise,” Cell, vol. 181, no. 7, pp. 1464–1474, Jun. 2020.
[15] K. Jia, C. Cui, Y. Gao, Y. Zhou, and Q. Cui, “An analysis of aging-related genes derived from the Genotype-Tissue Expression project (GTEx),” Cell Death Discov, vol. 4, p. 26, Aug. 2018.
[16] S. Schenk et al., “Physiological Adaptations to Progressive Endurance Exercise Training in Adult And Aged Rats: Insights from The Molecular Transducers of Physical Activity Consortium (MoTrPAC),” Function, doi: 10.1093/function/zqae014.
[17] GTEx Consortium, “The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans,” Science, vol. 348, no. 6235, pp. 648–660, 2015.
[18] E. Y. Chen et al., “Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool,” BMC Bioinformatics, vol. 14, p. 128, Apr. 2013.
[19] OpenAI et al., “GPT-4 Technical Report,” arXiv [cs.CL], Mar. 15, 2023. [Online]. Available: http://arxiv.org/abs/2303.08774
[20] Gene Ontology Consortium, “Gene Ontology Consortium: going forward,” Nucleic Acids Res., vol. 43, no. Database issue, pp. D1049–56, Jan. 2015.
[21] M. Kutmon et al., “WikiPathways: capturing the full diversity of pathway knowledge,” Nucleic Acids Res., vol. 44, no. D1, pp. D488–94, Jan. 2016.
[22] J. A. Blake, C. J. Bult, J. T. Eppig, J. A. Kadin, J. E. Richardson, and Mouse Genome Database Group, “The Mouse Genome Database genotypes::phenotypes,” Nucleic Acids Res., vol. 37, no. Database issue, pp. D712–9, Jan. 2009.
[23] E. Sollis et al., “The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource,” Nucleic Acids Res., vol. 51, no. D1, pp. D977–D985, Jan. 2023.
[24] J. M. Memme, A. T. Erlich, G. Phukan, and D. A. Hood, “Exercise and mitochondrial health,” J. Physiol., vol. 599, no. 3, pp. 803–817, Feb. 2021.
[25] Y. Qiu et al., “Exercise sustains the hallmarks of health,” J Sport Health Sci, vol. 12, no. 1, pp. 8–35, Jan. 2023.
[26] R. P. Tracy and E. G. Bovill, “Thrombosis and cardiovascular risk in the elderly,” Arch. Pathol. Lab. Med., vol. 116, no. 12, pp. 1307–1312, Dec. 1992.
[27] M. S. El-Sayed, C. Sale, P. G. Jones, and M. Chester, “Blood hemostasis in exercise and training,” Med. Sci. Sports Exerc., vol. 32, no. 5, pp. 918–925, May 2000.
[28] A. A. Johnson and A. Stolzing, “The role of lipid metabolism in aging, lifespan regulation, and age-related disease,” Aging Cell, vol. 18, no. 6, p. e13048, Dec. 2019.
[29] Prospective Studies Collaboration et al., “Blood cholesterol and vascular mortality by age, sex, and blood pressure: a meta-analysis of individual data from 61 prospective studies with 55,000 vascular deaths,” Lancet, vol. 370, no. 9602, pp. 1829–1839, Dec. 2007.
[30] J. F. Horowitz and S. Klein, “Lipid metabolism during endurance exercise,” Am. J. Clin. Nutr., vol. 72, no. 2 Suppl, p. 558S–63S, Aug. 2000.
[31] P. V. Carapeto and C. Aguayo-Mazzucato, “Effects of exercise on cellular and tissue aging,” Aging, vol. 13, no. 10, pp. 14522–14543, May 2021.
CFDE GMT Crossing: GTEx Aging Signatures vs MoTrPAC Exercise Gene Sets (Adrenal and Adipose Tissue)
BACKGROUND
Aging is the most profound risk factor for many chronic and non-infectious diseases such as diabetes [1], cardiovascular disease [2] and neurological diseases such as Alzheimer's [3] and Parkinson’s disease [4]. This risk has been linked to the overlap between the molecular basis of these diseases and that of aging [5], [6]. Moderate exercise (planned and intentional physical activity [7]) is widely accepted to promote health and aid the prevention of disease [8]. Exercise has been shown to induce organelle, cellular, blood-brain and external barrier protection [9], [10], proteostasis [11], DNA repair [12], anti-inflammatory effects (C-reactive protein concentration is strongly related to physical activity [13]), autophagy regulation [12], and other processes that contribute to better health, many of which counter the molecular mechanisms of aging [13]. Aging signatures are changes in gene expression that accompany the biological process of aging. The goal of the Molecular Transducers of Physical Activity Consortium (MoTrPAC) is to assess molecular changes that occur in response to physical activity [14]. To investigate the common biological underpinnings of aging and exercise, and to discover genes that are induced or repressed by exercise and also change in level with age, we crossed the aging signatures created from GTEx [15] with the MoTrPAC rat endurance training [16] gene sets.
METHOD
The GTEx pilot data provides 1,641 gene expression samples across 43 tissues from 175 individuals in order to examine how gene expression varies among tissues [17]. To create the GTEx aging signatures, for each tissue we performed a differential expression analysis using limma voom that compares the expression levels of the young population (ages 20-29) with sex-matched older populations, e.g., individuals aged 20-29 vs. 60-69. Each gene set in the resulting gene set library consists of the up- or down-regulated genes for one tissue and one younger-older comparison group.
To create the MoTrPAC rat endurance gene sets, we performed a differential expression analysis that compared the gene expression levels of a training group vs. sex-matched controls for each tissue at every time point (exercise duration) in the study. The data come from the RNA-seq data created by the MoTrPAC Data Coordination Center (DCC), which measured the gene expression levels of rats across a range of ages and fitness levels by molecular probing of multiple tissues before and after acute and chronic exercise. The resulting dataframe was filtered for genes with an adjusted p-value < 0.05 to obtain the gene expression signature for each tissue, sex, and time point group. Within each group, up genes are those with a log fold change (logFC) > 0 and down genes are those with a logFC < 0. Each gene set in the resulting gene set library consists of the up- or down-regulated genes for one tissue, sex, and time point group.
The CFDE GMT crossing feature allows users to cross the GMT files generated by Common Fund DCCs in order to find gene set pairs from different GMTs that have a significant overlap. Fisher's exact test is used to quantify the significance of each overlap. We crossed all pairs of gene sets across all libraries and retained gene set pairs with a p-value < 0.001. The top 5000 pairs of sets from each crossing are saved and displayed in an interactive table. The overlapping genes of each pair can be sent to Enrichr [18] for enrichment analysis or to a GPT-4 model [19] to generate a hypothesis about the potential reason for the overlap between the two gene sets. To create the GPT-4 generated hypothesis, we give GPT-4 a textual description of each gene set in the crossed pair, the overlapping genes, and the enriched terms from Enrichr using the GO Biological Processes [20], WikiPathway 2023 Human [21], MGI Mammalian Phenotype Level 4 [22], and GWAS Catalog 2023 [23] libraries as backgrounds, and then ask it to compose an abstract that describes the potential connection between these gene sets.
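For readers unfamiliar with the GMT files being crossed, the sketch below shows one way to load them. It assumes the common tab-separated GMT layout in which each line holds a set name, a description field (possibly empty), and then the member genes; the function name and the toy file contents are hypothetical.

```python
def parse_gmt(text):
    """Parse GMT-formatted text into a dict mapping set name to a set of genes.

    Each non-empty line is expected to be: name <TAB> description <TAB> genes...
    Lines with no genes are skipped.
    """
    library = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # blank line or a set without genes
        term, _description, *genes = fields
        library[term] = {gene.strip() for gene in genes if gene.strip()}
    return library


# Toy GMT content with placeholder set and gene names.
gmt_text = "SetA\t\tG1\tG2\tG3\n\nSetB\tsome description\tG2\tG3\tG4\n"

library = parse_gmt(gmt_text)
print(sorted(library["SetA"] & library["SetB"]))  # ['G2', 'G3']
```

Representing each library as a dictionary of Python sets makes the pairwise overlap of any two gene sets a single `&` operation.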
RESULT AND DISCUSSION
Examining the most significantly overlapping terms from the GTEx aging signatures crossed with the MoTrPAC rat endurance training gene sets reveals several pairs of gene sets with high overlap from adipose tissue and adrenal gland. Specifically, the gene set pairs “GTEx AdiposeTissue 20-29 vs 50-59 Down ∩ T60-Adrenal Male 1W Up”, “GTEx AdiposeTissue 20-29 vs 60-69 Down ∩ T60-Adrenal Male 1W Up”, and “GTEx AdiposeTissue 20-29 vs 60-69 Down ∩ T60-Adrenal Male 1W Up” are among the top 20 most overlapping gene set pairs. Comparing these sets with the Combine module in GeneSetCart reveals 14 genes shared across the three set pairs, out of a total of 36 unique genes. The “GTEx AdiposeTissue 20-29 vs 50-59 Down” gene set contains genes downregulated in adipose tissue samples from 50-59-year-olds compared to those from 20-29-year-olds, whereas “T60-Adrenal Male 1W Up” contains genes upregulated in male rat adrenal tissue following 1 week of exercise. The GPT-4 generated hypothesis for the crossed sets GTEx AdiposeTissue 20-29 vs 50-59 Down ∩ T60-Adrenal Male 1W Up suggests several shared mechanisms:
“The high overlap between the two gene sets can be attributed to the shared biological pathways and processes that are influenced by both aging in AdiposeTissue and the molecular changes induced by exercise in the Adrenal in Male subjects. The enrichment analysis reveals that the overlapping genes are significantly involved in pathways such as Glycerolipids And Glycerophospholipids WP4722, Triacylglyceride Synthesis WP325, and Adipogenesis WP236, which are crucial for lipid metabolism and adipocyte function. These pathways are essential for maintaining energy homeostasis and are likely to be affected by both aging and exercise. For instance, genes like PFKFB3 and PSAT1 are involved in the Triglyceride Biosynthetic Process (GO:0019432) and Fatty Acid Metabolic Process (GO:0006631), which are critical for energy storage and utilization. Additionally, the regulation of cold-induced thermogenesis (GO:0120161) and positive regulation of cold-induced thermogenesis (GO:0120162) are processes that are vital for maintaining body temperature and metabolic rate, both of which can be influenced by aging and physical activity. The presence of genes such as ADRB3 and G0S2 in these pathways suggests a role in the differentiation of white and brown adipocytes (WP2895) and enhanced lipolysis (MP:0008034), which are important for adapting to metabolic changes. Furthermore, the involvement of genes like FASN and DGAT2 in the Percent Liver Fat and decreased liver triglyceride level (MP:0009356) pathways indicates a connection to liver health and lipid metabolism, which are affected by both aging and exercise. The overlap in genes related to the Positive Regulation Of Metabolic Process (GO:0009893) and abnormal white adipose tissue morphology (MP:0002970) further supports the idea that these gene sets are linked through their roles in metabolic regulation and adipose tissue function.
Overall, the enrichment analysis highlights the interconnected nature of these biological processes, explaining the significant overlap between the gene sets.” (Fig. 1A-C)
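The Combine comparison of the three set pairs described above reduces to an intersection and a union over their overlap lists. The sketch below is a minimal illustration with a hypothetical function name and placeholder genes; it does not reproduce the GeneSetCart Combine module or the actual gene lists.

```python
def shared_and_unique(gene_sets):
    """Return (genes present in every set, genes present in any set)."""
    sets = [set(genes) for genes in gene_sets]
    if not sets:
        return set(), set()
    shared = set.intersection(*sets)
    unique = set.union(*sets)
    return shared, unique


# Placeholder overlap lists for three crossed set pairs (illustrative only).
pair_overlaps = [
    {"A", "B", "C"},
    {"B", "C", "D"},
    {"C", "B", "E"},
]

shared, unique = shared_and_unique(pair_overlaps)
print(len(shared), "shared genes out of", len(unique), "unique genes")
```

The shared genes are the natural input for a downstream transcription factor enrichment analysis, since they recur in every crossed pair.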
In an analysis of the transcription factors most likely to regulate the genes shared by the three sets using ChEA3 [24], PPARG is ranked first, with 10 of the 14 genes being known targets of PPARG (Fig. 1D). PPARG is a key regulator of lipid metabolism and adipogenesis, and a known therapeutic target for type 2 diabetes [25]. A related TF in the same family, PPARD, which is not included in ChEA, promotes cardiovascular endurance through the conservation of glucose [26]. Agonists of PPARD, such as GW501516 (also known as Endurobol), have been used by elite athletes to improve their performance and are banned by the World Anti-Doping Agency. The shared genes identified by this crossing may be key understudied genes in the modulation of the response to exercise as it relates to aging.
REFERENCES
[1] H. L. C. Wilkerson, “Problems of an Aging Population,” Am. J. Public Health Nations. Health, vol. 37, no. 2, pp. 177–188, Feb. 1947.
[2] B. J. North and D. A. Sinclair, “The intersection between aging and cardiovascular disease,” Circ. Res., vol. 110, no. 8, pp. 1097–1108, Apr. 2012.
[3] X. Xia, Q. Jiang, J. McDermott, and J.-D. J. Han, “Aging and Alzheimer’s disease: Comparison and associations from molecular to system level,” Aging Cell, vol. 17, no. 5, p. e12802, Oct. 2018.
[4] A. Reeve, E. Simcox, and D. Turnbull, “Ageing and Parkinson’s disease: why is advancing age the biggest risk factor?,” Ageing Res. Rev., vol. 14, no. 100, pp. 19–30, Mar. 2014.
[5] G. Wick, P. Jansen-Dürr, P. Berger, I. Blasko, and B. Grubeck-Loebenstein, “Diseases of aging,” Vaccine, vol. 18, no. 16, pp. 1567–1583, Feb. 2000.
[6] D. Saul and R. L. Kosinsky, “Epigenetics of Aging and Aging-Associated Diseases,” Int. J. Mol. Sci., vol. 22, no. 1, Jan. 2021, doi: 10.3390/ijms22010401.
[7] C. J. Caspersen, K. E. Powell, and G. M. Christenson, “Physical activity, exercise, and physical fitness: definitions and distinctions for health-related research,” Public Health Rep., vol. 100, no. 2, pp. 126–131, Mar-Apr 1985.
[8] C. Fiuza-Luces et al., “Exercise benefits in cardiovascular disease: beyond attenuation of traditional risk factors,” Nat. Rev. Cardiol., vol. 15, no. 12, pp. 731–743, Dec. 2018.
[9] M. A. Małkiewicz, A. Szarmach, A. Sabisz, W. J. Cubała, E. Szurowska, and P. J. Winklewski, “Blood-brain barrier permeability and physical exercise,” J. Neuroinflammation, vol. 16, no. 1, p. 15, Jan. 2019.
[10] P. S. Souza et al., “Physical Exercise Attenuates Experimental Autoimmune Encephalomyelitis by Inhibiting Peripheral Immune Response and Blood-Brain Barrier Disruption,” Mol. Neurobiol., vol. 54, no. 6, pp. 4723–4737, Aug. 2017.
[11] R. V. Musci, K. L. Hamilton, and B. F. Miller, “Targeting mitochondrial function and proteostasis to mitigate dynapenia,” Eur. J. Appl. Physiol., vol. 118, no. 1, pp. 1–9, Jan. 2018.
[12] Z. Radák et al., “Exercise training decreases DNA damage and increases DNA repair and resistance against oxidative stress of proteins in aged rat skeletal muscle,” Pflugers Arch., vol. 445, no. 2, pp. 273–278, Nov. 2002.
[13] E. S. Ford, “Does exercise reduce inflammation? Physical activity and C-reactive protein among U.S. adults,” Epidemiology, vol. 13, no. 5, pp. 561–568, Sep. 2002.
[14] J. A. Sanford et al., “Molecular Transducers of Physical Activity Consortium (MoTrPAC): Mapping the Dynamic Responses to Exercise,” Cell, vol. 181, no. 7, pp. 1464–1474, Jun. 2020.
[15] K. Jia, C. Cui, Y. Gao, Y. Zhou, and Q. Cui, “An analysis of aging-related genes derived from the Genotype-Tissue Expression project (GTEx),” Cell Death Discov, vol. 4, p. 26, Aug. 2018.
[16] S. Schenk et al., “Physiological Adaptations to Progressive Endurance Exercise Training in Adult And Aged Rats: Insights from The Molecular Transducers of Physical Activity Consortium (MoTrPAC),” Function, doi: 10.1093/function/zqae014.
[17] GTEx Consortium, “The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans,” Science, vol. 348, no. 6235, pp. 648–660, 2015.
[18] E. Y. Chen et al., “Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool,” BMC Bioinformatics, vol. 14, p. 128, Apr. 2013.
[19] OpenAI et al., “GPT-4 Technical Report,” arXiv [cs.CL], Mar. 15, 2023. [Online]. Available: http://arxiv.org/abs/2303.08774
[20] Gene Ontology Consortium, “Gene Ontology Consortium: going forward,” Nucleic Acids Res., vol. 43, no. Database issue, pp. D1049–56, Jan. 2015.
[21] M. Kutmon et al., “WikiPathways: capturing the full diversity of pathway knowledge,” Nucleic Acids Res., vol. 44, no. D1, pp. D488–94, Jan. 2016.
[22] J. A. Blake, C. J. Bult, J. T. Eppig, J. A. Kadin, J. E. Richardson, and Mouse Genome Database Group, “The Mouse Genome Database genotypes::phenotypes,” Nucleic Acids Res., vol. 37, no. Database issue, pp. D712–9, Jan. 2009.
[23] E. Sollis et al., “The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource,” Nucleic Acids Res., vol. 51, no. D1, pp. D977–D985, Jan. 2023.
[24] A. B. Keenan et al., “ChEA3: transcription factor enrichment analysis by orthogonal omics integration,” Nucleic Acids Res., vol. 47, no. W1, pp. W212–W224, Jul. 2019.
[25] M. Ahmadian et al., “PPARγ signaling and metabolism: the good, the bad and the future,” Nat. Med., vol. 19, no. 5, pp. 557–566, May 2013.
[26] W. Fan et al., “PPARδ promotes running endurance by preserving glucose,” Cell Metab., vol. 25, no. 5, pp. 1186–1193.e4, May 2017.