Differences

This shows you the differences between two versions of the page.

gene_networks_inference [2018/12/14 15:32] – [Papers] admin
gene_networks_inference [2020/05/27 03:00] (current) – [iNETgrate] admin
Line 1: Line 1:
 ====== Gene networks inference ====== ====== Gene networks inference ======
 +
 +
 ===== Objectives ===== ===== Objectives =====
 Biological processes in a cell often require intricate coordination between //multiple// genes and proteins. The goal of this project is to infer useful biological and clinical information from large networks of thousands of genes. We develop an integrative approach to analyze co-expression and DNA methylation patterns in a single model. The results will be useful in pinpointing the cause and mechanism of complex diseases such as cancer. Our findings can potentially open doors to new targets for novel treatment plans. \\ Biological processes in a cell often require intricate coordination between //multiple// genes and proteins. The goal of this project is to infer useful biological and clinical information from large networks of thousands of genes. We develop an integrative approach to analyze co-expression and DNA methylation patterns in a single model. The results will be useful in pinpointing the cause and mechanism of complex diseases such as cancer. Our findings can potentially open doors to new targets for novel treatment plans. \\
Line 18: Line 20:
 ===== Software ===== ===== Software =====
  
-**Pigengene:** This R package provides an efficient way to perform network analysis and to infer biological signatures from gene expression profiles. The signatures are independent from the underlying platform, e.g., it can infer the signatures using data from microarray and evaluate them in an independent RNA Seq dataset. It is approved by, and publicly available from, [[https://bioconductor.org/packages/Pigengene|Bioconductor]].+==== Pigengene ====
  
-**iNETgrate:** [[https://bitbucket.org/habilzare/genetwork/src/master/code/iNETgrate/|This]] R package is useful to integrate DNA methylation and gene expression data into //a single //network. This approach leads to identification of more robust gene modules compared to conventional coexpression networks. The package will be publicly available after review and approval by Bioconductor.+This R package provides an efficient way to perform network analysis and to infer biological signatures from gene expression profiles. The signatures are independent from the underlying platform, e.g., it can infer the signatures using data from microarray and evaluate them in an independent RNA Seq dataset. It is approved by, and publicly available from, [[https://bioconductor.org/packages/Pigengene|Bioconductor]]
 + 
 +==== iNETgrate ==== 
 + 
 +This R package integrates DNA methylation and gene expression data into //a single// network ([[https://bitbucket.org/habilzare/genetwork/src/master/code/Ghazal/iNETgrate/|code]]). This approach identifies more robust gene modules than conventional coexpression networks do. The package will be publicly available after review and approval by Bioconductor. A [[https://docs.google.com/document/d/17ZlnpNFm2QD58j4AI-GbyW_9TPXT_e5dsGYm_DPV2Fs/edit|checklist]] of package completion tasks is available.
  
  
Line 43: Line 49:
   - 200 AML cases from TCGA (LAML dataset). Available data types include gene expression , DNA-methylation, CNV, mutation, etc. TCGA data moved to [[https://gdc-portal.nci.nih.gov/|GDC]] but DNA-methylation is not there. Instead, it can be retrieved from GDC Legacy Archive or the original [[https://tcga-data.nci.nih.gov/docs/publications/laml_2012/|paper]].   - 200 AML cases from TCGA (LAML dataset). Available data types include gene expression , DNA-methylation, CNV, mutation, etc. TCGA data moved to [[https://gdc-portal.nci.nih.gov/|GDC]] but DNA-methylation is not there. Instead, it can be retrieved from GDC Legacy Archive or the original [[https://tcga-data.nci.nih.gov/docs/publications/laml_2012/|paper]].
   - German [[https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37642|AMLCG]] 1999 provides microarray data of 562 AML samples.   - German [[https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37642|AMLCG]] 1999 provides microarray data of 562 AML samples.
-  - Papaemmanuil, Elli, et al. "Genomic classification and prognosis in acute myeloid leukemia." [[http://www.nejm.org/doi/full/10.1056/NEJMoa1516192#t=article|NEJM]] 374.23 (2016): 2209-2221. \\  The mutations of 111 genes in over **1,500 AML**  cases are reported. The authors used this information to classify cases into groups and showed these groups have different prognosis. I.,e., [[https://www.mskcc.org/sites/default/files/node/2246/documents/discrete-cpe.pdf|concordance]] (probability estimates) improves from 64% using only the European LeukemiaNet criteria to 71%. Using the alternative allele frequency, they estimated the time of occurrence for the driver mutations. The data are available through the links in the corresponding [[http://www.nature.com/ng/journal/v49/n3/full/ng.3756.html|Nature]] paper [[[:ng.3756.pdf?media=ng.3756.pdf|pdf]]]. Information on downloading these data is contained in the readme file found in genetwork:~/proj/genetwork/data/AML/gerstung/readme.txt. In particular, we have access to [[https://www.ebi.ac.uk/ega/studies/EGAS00001000275|EGAS00001000275]] through [[https://ega-archive.org/|EGA]] Archives. See [[:habils_lab_notebook|Habil's]] note on 2017/09/05 for more detail. Any member of Oncinfo Lab who touches (analyzes or views) these data from Sanger Institute must read and abide to the [[:sanger_data_agreement_2017-08-09.pdf?media=sanger_data_agreement_2017-08-09.pdf|agreement]].+  - Papaemmanuil, Elli, et al. "Genomic classification and prognosis in acute myeloid leukemia." [[http://www.nejm.org/doi/full/10.1056/NEJMoa1516192#t=article|NEJM]] 374.23 (2016): 2209-2221. \\  The mutations of 111 genes in over **1,500 AML**  cases are reported. The authors used this information to classify cases into groups and showed these groups have different prognosis. 
I.e., [[https://www.mskcc.org/sites/default/files/node/2246/documents/discrete-cpe.pdf|concordance]] (probability estimates) improves from 64% using only the European LeukemiaNet criteria to 71%. Using the alternative allele frequency, they estimated the time of occurrence for the driver mutations. The data are available through the links in the corresponding [[http://www.nature.com/ng/journal/v49/n3/full/ng.3756.html|Nature]] paper [[:ng.3756.pdf?media=ng.3756.pdf|pdf]]. Information on downloading these data is contained in the readme file found in genetwork:~/proj/genetwork/data/AML/gerstung/readme.txt. In particular, we have access to [[https://www.ebi.ac.uk/ega/studies/EGAS00001000275|EGAS00001000275]] through [[https://ega-archive.org/|EGA]] Archives. See [[:habils_lab_notebook|Habil's]] note on 2017/09/05 for more detail. Any member of Oncinfo Lab who touches (analyzes or views) these data from Sanger Institute must read and abide by the [[:sanger_data_agreement_2017-08-09.pdf?media=sanger_data_agreement_2017-08-09.pdf|agreement]].
   - RNA, DNA methylation, whole genome, etc. data of 960 (pediatric?) AML cases are available from [[https://ocg.cancer.gov/programs/target/acute-myeloid-leukemia|TARGET]] AML study.   - RNA, DNA methylation, whole genome, etc. data of 960 (pediatric?) AML cases are available from [[https://ocg.cancer.gov/programs/target/acute-myeloid-leukemia|TARGET]] AML study.
   - AML-NK gene expression data (RNA-Seq) from three datasets (TCGA, Leucegene, and PMP/BCCA). [[https://docs.google.com/a/princeton.edu/document/d/1tB75BDAoG6-ggkoKzxF_f8anTnaP0lOAZ4MG-wEWCyk/edit?usp=sharing|Full description]].   - AML-NK gene expression data (RNA-Seq) from three datasets (TCGA, Leucegene, and PMP/BCCA). [[https://docs.google.com/a/princeton.edu/document/d/1tB75BDAoG6-ggkoKzxF_f8anTnaP0lOAZ4MG-wEWCyk/edit?usp=sharing|Full description]].
Line 49: Line 55:
   - Genomic Data Commons ([[https://portal.gdc.cancer.gov/repository|GDC]]), which contains TCGA data and more.   - Genomic Data Commons ([[https://portal.gdc.cancer.gov/repository|GDC]]), which contains TCGA data and more.
   - [[https://amp.pharm.mssm.edu/archs4/|ARCHS4]], which was developed at the Icahn School of Medicine at Mount Sinai, and provides tools to download and analyze RNA-Seq data including single-cell gene expression.   - [[https://amp.pharm.mssm.edu/archs4/|ARCHS4]], which was developed at the Icahn School of Medicine at Mount Sinai, and provides tools to download and analyze RNA-Seq data including single-cell gene expression.
 +  - The [[https://www.nature.com/articles/s41586-018-0623-z#Sec38|BEAT]] AML dataset of ~300 cases, including gene expression, survival, ELN17, etc.
 +  - [[https://www.leukemiaatlas.org/adultaml|Leukemia Protein Atlas]]: Expression of hundreds of proteins was measured in bone marrow and peripheral blood (PB) samples of ~200 AML cases. A good publicly available resource to validate findings based on gene expression assays.
 +  - ~40K [[https://www.cell.com/cell/fulltext/S0092-8674(19)30094-7?_returnURL=https://linkinghub.elsevier.com/retrieve/pii/S0092867419300947?showall=true|single cell]] RNA-Seq data from 40 bone marrow aspirates, including 16 AML patients and 5 healthy donors.
  
-=====   ===== 
  
 ===== Related work ===== ===== Related work =====
  
-  - Zhang, B. //et al.//  Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer's disease. //[[http://www.cell.com/abstract/S0092-8674(13)00387-5|Cell]]//**153**, 707–720 (2013). [{{:zhang2013.pdf|pdf}} ] \\  Methodology is based on [2] plus they train **Bayesian networks**  to infer causal structure based on RNA-seq. MCMC was used to optimize BIC score, and \\ 1/3 of 1000 network models were averaged.[[http://www.alzforum.org/webinars/can-network-analysis-identify-pathological-pathways-alzheimers|This]] webinar is a high level description of the methodology. +  - Zhang, B. //et al.//  Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer's disease. //[[http://www.cell.com/abstract/S0092-8674(13)00387-5|Cell]]//**153**, 707–720 (2013). [{{:zhang2013.pdf|pdf}}  ] \\  Methodology is based on [2] plus they train **Bayesian networks**  to infer causal structure based on RNA-seq. MCMC was used to optimize BIC score, and \\ 1/3 of 1000 network models were averaged.[[http://www.alzforum.org/webinars/can-network-analysis-identify-pathological-pathways-alzheimers|This]] webinar is a high level description of the methodology. 
-  - Emilsson, V. //et al.//  Genetics of gene expression and its effect on disease.//[[http://www.nature.com/nature/journal/v452/n7186/full/nature06758.html|Nature]]//**452**, 423–428 (2008). [[:http:file_view_emilsson2008.pdf_521380262_emilsson2008.pdf|pdf]], {{:emilsson2008-supp.pdf|Supp}} ] (based on their references [29-31], uses [[http://www.sciencemag.org/content/297/5586/1551.full.pdf|this]] topological overlap measure) +  - Emilsson, V. //et al.//  Genetics of gene expression and its effect on disease.//[[http://www.nature.com/nature/journal/v452/n7186/full/nature06758.html|Nature]]//**452**, 423–428 (2008). [[:http:file_view_emilsson2008.pdf_521380262_emilsson2008.pdf|pdf]], {{:emilsson2008-supp.pdf|Supp}}  ] (based on their references [29-31], uses [[http://www.sciencemag.org/content/297/5586/1551.full.pdf|this]] topological overlap measure) 
-  - Identifying Gene Regulatory Networks from Gene Expression Data, [[http://web.cs.ucdavis.edu/~filkov/papers/chapter.pdf|Handbook]] of Computational Molecular Biology (2005) [{{:filkov_chapter27.pdf|pdf}} ]. \\  A good but old book chapter.+  - Identifying Gene Regulatory Networks from Gene Expression Data, [[http://web.cs.ucdavis.edu/~filkov/papers/chapter.pdf|Handbook]] of Computational Molecular Biology (2005) [{{:filkov_chapter27.pdf|pdf}}  ]. \\  A good but old book chapter.
   - [[http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.0020130|Integrating Genetic and Network Analysis to Characterize Genes Related to Mouse Weight]] (Identified modules in networks, "A pair of genes is said to have high topological overlap if they are both strongly connected to the same group of genes.", [[http://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/MouseWeight/|Software]])   - [[http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.0020130|Integrating Genetic and Network Analysis to Characterize Genes Related to Mouse Weight]] (Identified modules in networks, "A pair of genes is said to have high topological overlap if they are both strongly connected to the same group of genes.", [[http://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/MouseWeight/|Software]])
   - Gene correlation network analysis [Wikipedia [[http://en.wikipedia.org/wiki/Weighted_correlation_network_analysis|page]]]   - Gene correlation network analysis [Wikipedia [[http://en.wikipedia.org/wiki/Weighted_correlation_network_analysis|page]]]
-  - Friedman, Nir, et al. "Using Bayesian networks to analyze expression data." Journal of computational biology 7.3-4 (2000): 601-620.[{{:friedman2000.pdf|pdf}} ] \\  A relatively old but highly cited, (~2700) original paper.+  - Friedman, Nir, et al. "Using Bayesian networks to analyze expression data." Journal of computational biology 7.3-4 (2000): 601-620.[{{:friedman2000.pdf|pdf}}  ] \\  A relatively old but highly cited, (~2700) original paper.
   - Ruan, Jianhua, Angela K. Dean, and Weixiong Zhang. "A general co-expression network-based approach to gene expression analysis: comparison and applications." //[[http://www.biomedcentral.com/1752-0509/4/8|BMC]] systems biology//4.1 (2010): 8. (Dr. Jianhua Ruan from San Antonio)   - Ruan, Jianhua, Angela K. Dean, and Weixiong Zhang. "A general co-expression network-based approach to gene expression analysis: comparison and applications." //[[http://www.biomedcentral.com/1752-0509/4/8|BMC]] systems biology//4.1 (2010): 8. (Dr. Jianhua Ruan from San Antonio)
   - Nagrecha, Saurabh, Pawan J. Lingras, and Nitesh V. Chawla. "Comparison of gene co-expression networks and bayesian networks." //Intelligent Information and Database [[https://www3.nd.edu/~dial/papers/ACIIDS2013.pdf|Systems]]// . Springer Berlin Heidelberg, 2013. 507-516. \\ Simple description and some relatively old literature review. \\ "Bayesian networks emerge as a more informative tool to determine the causal structure."   - Nagrecha, Saurabh, Pawan J. Lingras, and Nitesh V. Chawla. "Comparison of gene co-expression networks and bayesian networks." //Intelligent Information and Database [[https://www3.nd.edu/~dial/papers/ACIIDS2013.pdf|Systems]]// . Springer Berlin Heidelberg, 2013. 507-516. \\ Simple description and some relatively old literature review. \\ "Bayesian networks emerge as a more informative tool to determine the causal structure."
-  - Systems Biology: The inference of networks from high dimensional genomics data, lecture by Yeung 2011 [{{:yeun2011-systemsbiology.ppt|ppt}} ] (a good introduction to application of Bayesian networks in co-expression network)+  - Systems Biology: The inference of networks from high dimensional genomics data, lecture by Yeung 2011 [{{:yeun2011-systemsbiology.ppt|ppt}}  ] (a good introduction to application of Bayesian networks in co-expression network)
   - Hong, Shengjun, et al. "Canonical correlation analysis for RNA-seq co-expression networks." //Nucleic acids [[http://www.ncbi.nlm.nih.gov/pubmed/23460206|research]]//41.8 (2013): e95-e95. (improved co-expression analysis for RNA-seq)   - Hong, Shengjun, et al. "Canonical correlation analysis for RNA-seq co-expression networks." //Nucleic acids [[http://www.ncbi.nlm.nih.gov/pubmed/23460206|research]]//41.8 (2013): e95-e95. (improved co-expression analysis for RNA-seq)
   - Li, Bingshan, et al. "[[http://www.nature.com/jid/journal/v134/n7/full/jid201428a.html|Transcriptome Analysis of Psoriasis in a Large Case–Control Sample: RNA-Seq Provides Insights into Disease Mechanisms.]]" //Journal of Investigative Dermatology//  (2014). (used co-expression network analysis on RNA-seq from 42 samples to study gene regulatory circuits in psoriasis)   - Li, Bingshan, et al. "[[http://www.nature.com/jid/journal/v134/n7/full/jid201428a.html|Transcriptome Analysis of Psoriasis in a Large Case–Control Sample: RNA-Seq Provides Insights into Disease Mechanisms.]]" //Journal of Investigative Dermatology//  (2014). (used co-expression network analysis on RNA-seq from 42 samples to study gene regulatory circuits in psoriasis)
   - [[http://web.stanford.edu/group/wonglab/doc/RNA-seq-talk-JSM2010.pdf|Analysis]] of RNA‐Seq Data, an introduction by Wong, 2010.   - [[http://web.stanford.edu/group/wonglab/doc/RNA-seq-talk-JSM2010.pdf|Analysis]] of RNA‐Seq Data, an introduction by Wong, 2010.
   - Ellis, Byron, and Wing Hung Wong. "Learning **causal**  Bayesian network structures from experimental [[http://web.stanford.edu/group/wonglab/doc/EllisWong-061025.pdf|data]]." //Journal of the American Statistical Association//  103.482 (2008): 778-789.   - Ellis, Byron, and Wing Hung Wong. "Learning **causal**  Bayesian network structures from experimental [[http://web.stanford.edu/group/wonglab/doc/EllisWong-061025.pdf|data]]." //Journal of the American Statistical Association//  103.482 (2008): 778-789.
-  - Barretina et al. "The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity." [[http://www.nature.com/nature/journal/v483/n7391/full/nature11003.html|Nature]] 483.7391 (2012): 603-607 [{{:barretina2012.pdf|pdf}} ]. \\  It provides CCLE, a valuable dataset produced by **Novartis**; mRNA expression data, responses of 24 compounds, and targeted sequencing on ~500 cell lines.+  - Barretina et al. "The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity." [[http://www.nature.com/nature/journal/v483/n7391/full/nature11003.html|Nature]] 483.7391 (2012): 603-607 [{{:barretina2012.pdf|pdf}}  ]. \\  It provides CCLE, a valuable dataset produced by **Novartis**; mRNA expression data, responses of 24 compounds, and targeted sequencing on ~500 cell lines.
   - Friedman, Nir, et al. "Using Bayesian networks to analyze expression data."//Journal of computational [[http://www.cs.huji.ac.il/~nir/Papers/FLNP1Full.pdf|biology]]//7.3-4 (2000): 601-620. \\ Has a good introduction to Bayesian networks and learning causal patterns for beginners.   - Friedman, Nir, et al. "Using Bayesian networks to analyze expression data."//Journal of computational [[http://www.cs.huji.ac.il/~nir/Papers/FLNP1Full.pdf|biology]]//7.3-4 (2000): 601-620. \\ Has a good introduction to Bayesian networks and learning causal patterns for beginners.
   - Al-Lazikani, Bissan, Udai Banerji, and Paul Workman. "Combinatorial drug therapy for cancer in the post-genomic era." //[[http://www.nature.com/nbt/journal/v30/n7/full/nbt.2284.html|Nature]] biotechnology//30.7 (2012): 679-692. \\ "Combinatorial targeted therapy", a good **survey**  including computational methods, successful stories like "identification of synergies between MET and EGFR inhibitors…"   - Al-Lazikani, Bissan, Udai Banerji, and Paul Workman. "Combinatorial drug therapy for cancer in the post-genomic era." //[[http://www.nature.com/nbt/journal/v30/n7/full/nbt.2284.html|Nature]] biotechnology//30.7 (2012): 679-692. \\ "Combinatorial targeted therapy", a good **survey**  including computational methods, successful stories like "identification of synergies between MET and EGFR inhibitors…"
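
Note on topological overlap: the Mouse Weight entry above says two genes have high topological overlap when both are strongly connected to the same group of genes. A minimal Python sketch of that idea follows. It assumes the standard unsigned formulation, TOM(i,j) = (sum over u of a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), on a symmetric adjacency matrix with entries in [0, 1]. The function name and the plain list-of-lists input are illustrative choices, not code from any package listed here.

```python
def topological_overlap(adj):
    """Unsigned topological overlap for a symmetric adjacency matrix.

    adj is a list of lists with entries in [0, 1]; the diagonal is ignored.
    """
    n = len(adj)
    # Connectivity of each gene: total connection strength to all other genes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        tom[i][i] = 1.0
        for j in range(i + 1, n):
            # Strength of the neighbors that genes i and j have in common.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


# Toy network: genes 0 and 1 are not linked directly but share neighbors 2 and 3.
toy_adjacency = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
toy_tom = topological_overlap(toy_adjacency)
```

In this toy network the unlinked pair (0, 1) gets a higher overlap than the directly linked pair (0, 2), because the measure rewards shared neighbors and not only the direct edge.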
Line 80: Line 88:
   - Bansal, Mukesh, et al. "How to infer gene networks from expression profiles."//Molecular systems biology//  3.1 (2007). \\ A relatively old survey which supports Banjo.   - Bansal, Mukesh, et al. "How to infer gene networks from expression profiles."//Molecular systems biology//  3.1 (2007). \\ A relatively old survey which supports Banjo.
   - Mourad, Raphaël, and Christine Sinoquet, eds. Probabilistic Graphical Models for Genetics, Genomics and Postgenomics. [[http://books.google.com/books?hl=en&lr=&id=2URhBAAAQBAJ&oi=fnd&pg=PP1&dq=Probabilistic Graphical Models for Genetics, Genomics, and Postgenomics&ots=8Yweyc5bKz&sig=xhaDE9f-JXHNckj-lgwvSrgSl3g#v=onepage&q=Probabilistic Graphical Models for Genetics, Genomics, and Postgenomics&f=false|Oxford]] University Press, 2014. \\  An excellent recent, relevant book ( e.g. pages 24,123,154,223).P155: For gene networks, maximum number of parents is commonly set to 3.   - Mourad, Raphaël, and Christine Sinoquet, eds. Probabilistic Graphical Models for Genetics, Genomics and Postgenomics. [[http://books.google.com/books?hl=en&lr=&id=2URhBAAAQBAJ&oi=fnd&pg=PP1&dq=Probabilistic Graphical Models for Genetics, Genomics, and Postgenomics&ots=8Yweyc5bKz&sig=xhaDE9f-JXHNckj-lgwvSrgSl3g#v=onepage&q=Probabilistic Graphical Models for Genetics, Genomics, and Postgenomics&f=false|Oxford]] University Press, 2014. \\  An excellent recent, relevant book ( e.g. pages 24,123,154,223).P155: For gene networks, maximum number of parents is commonly set to 3.
-  - de la Fuente, Alberto. "Gene Network Inference.", [[http://www.springer.com/new & forthcoming titles (default)/book/978-3-642-45160-7|Springer]], 2013 [{{:fuente2013.pdf|pdf}} ]. \\  Figure 1 explains why a network-based approach can be superior to black box machine learning techniques. Literature review on advantages of system approaches over studying single genes. Limited the maximum number of parents per gene to 5. [[http://www.mountsinai.org/profiles/jun-zhu|Zhu]] et al. explains [[http://icahn.mssm.edu/departments-and-institutes/genomics/about/software/rimbanet|RIMBANet]] developed by himself. "Bayesian networks offer the best performance."+  - de la Fuente, Alberto. "Gene Network Inference.", [[http://www.springer.com/new & forthcoming titles (default)/book/978-3-642-45160-7|Springer]], 2013 [{{:fuente2013.pdf|pdf}}  ]. \\  Figure 1 explains why a network-based approach can be superior to black box machine learning techniques. Literature review on advantages of system approaches over studying single genes. Limited the maximum number of parents per gene to 5. [[http://www.mountsinai.org/profiles/jun-zhu|Zhu]] et al. explains [[http://icahn.mssm.edu/departments-and-institutes/genomics/about/software/rimbanet|RIMBANet]] developed by himself. "Bayesian networks offer the best performance."
   - Bayesian Networks with R (bnlearn) and Hadoop, 2014, a good [[http://www.slideshare.net/ofermend/bayesian-networks-with-r-and-hadoop|talk]] by \\  Ofer Mendelevitch with introduction to BNs. Discusses large networks too.   - Bayesian Networks with R (bnlearn) and Hadoop, 2014, a good [[http://www.slideshare.net/ofermend/bayesian-networks-with-r-and-hadoop|talk]] by \\  Ofer Mendelevitch with introduction to BNs. Discusses large networks too.
   - Network Analysis Workshop, Systems Biology Analysis Methods for Genomic, 2013, [[http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/WORKSHOP/2013/|UCLA]]. Talk in Bayesian networks by Jun Zhu.   - Network Analysis Workshop, Systems Biology Analysis Methods for Genomic, 2013, [[http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/WORKSHOP/2013/|UCLA]]. Talk in Bayesian networks by Jun Zhu.
   - Vignes, Matthieu, et al. "Gene regulatory network reconstruction using bayesian networks, the dantzig selector, the lasso and their meta-analysis." PloS [[http://www.plosone.org/article/info:doi/10.1371/journal.pone.0029165#pone-0029165-g008|one]] 6.12 (2011): e29165. \\  A meta-analysis to combine these inference methods by computing a consensus ranking scheme, ranked 1st among 16 in a DREAM challenge, but its superiority was not confirmed in [[http://link.springer.com/chapter/10.1007/978-3-642-45161-4_2|Allouche]] 2014. Used greedy hill-climbing of Banjo.   - Vignes, Matthieu, et al. "Gene regulatory network reconstruction using bayesian networks, the dantzig selector, the lasso and their meta-analysis." PloS [[http://www.plosone.org/article/info:doi/10.1371/journal.pone.0029165#pone-0029165-g008|one]] 6.12 (2011): e29165. \\  A meta-analysis to combine these inference methods by computing a consensus ranking scheme, ranked 1st among 16 in a DREAM challenge, but its superiority was not confirmed in [[http://link.springer.com/chapter/10.1007/978-3-642-45161-4_2|Allouche]] 2014. Used greedy hill-climbing of Banjo.
-  - Nagarajan, Radhakrishnan, Marco Scutari, and Sophie Lèbre. Bayesian Networks in R. [[http://link.springer.com/book/10.1007/978-1-4614-6446-4|Springer]], 2013 [{{:bn-r.pdf|pdf}} ]. \\  A starter-to-advanced book by the author of bnlearn R package, defines a way of comparing networks in foreword section.+  - Nagarajan, Radhakrishnan, Marco Scutari, and Sophie Lèbre. Bayesian Networks in R. [[http://link.springer.com/book/10.1007/978-1-4614-6446-4|Springer]], 2013 [{{:bn-r.pdf|pdf}}  ]. \\  A starter-to-advanced book by the author of bnlearn R package, defines a way of comparing networks in foreword section.
   - Schadt, Eric E., et al. "An integrative genomics approach to infer causal associations between gene expression and disease." [[http://www.nature.com/ng/journal/v37/n7/full/ng1589.html|Nature]] genetics 37.7 (2005): 710-717. \\  The original Schadt paper introducing the idea of using genomic data to infer causality in BNs of expression (LCMS).   - Schadt, Eric E., et al. "An integrative genomics approach to infer causal associations between gene expression and disease." [[http://www.nature.com/ng/journal/v37/n7/full/ng1589.html|Nature]] genetics 37.7 (2005): 710-717. \\  The original Schadt paper introducing the idea of using genomic data to infer causality in BNs of expression (LCMS).
   - Jiang, Xia, et al. "Learning genetic epistasis using Bayesian network scoring criteria." BMC [[http://www.biomedcentral.com/1471-2105/12/89/|bioinformatics]] 12.1 (2011): 89. \\  On simulated data, Bayesian scoring (BDeu) outperforms minimum description scores such as AIC.   - Jiang, Xia, et al. "Learning genetic epistasis using Bayesian network scoring criteria." BMC [[http://www.biomedcentral.com/1471-2105/12/89/|bioinformatics]] 12.1 (2011): 89. \\  On simulated data, Bayesian scoring (BDeu) outperforms minimum description scores such as AIC.
Line 91: Line 99:
   - Yu, Jing, et al. "Using Bayesian network inference algorithms to recover molecular genetic regulatory networks." //3rd International [[http://ftp.cs.duke.edu/~amink/publications/manuscripts/hartemink02.icsb.pdf|Conference]] on Systems Biology//. 2002. \\ From Hartemink group, DBN. Very good explanation and comparison of different scores and search strategies. "With large amounts of data, the BIC is a good approximation to the full posterior (BDe) score and is faster to compute; however, it is known to over-penalize with small amounts of data." "The BDe score works better than the BIC score in recovering genetic regulatory pathways." "[Compared to simulated annealing and genetic algorithm,] greedy search is better as it can find the top graph in the least amount of time" (note that their network is small). 3-category discretization was optimal.   - Yu, Jing, et al. "Using Bayesian network inference algorithms to recover molecular genetic regulatory networks." //3rd International [[http://ftp.cs.duke.edu/~amink/publications/manuscripts/hartemink02.icsb.pdf|Conference]] on Systems Biology//. 2002. \\ From Hartemink group, DBN. Very good explanation and comparison of different scores and search strategies. "With large amounts of data, the BIC is a good approximation to the full posterior (BDe) score and is faster to compute; however, it is known to over-penalize with small amounts of data." "The BDe score works better than the BIC score in recovering genetic regulatory pathways." "[Compared to simulated annealing and genetic algorithm,] greedy search is better as it can find the top graph in the least amount of time" (note that their network is small). 3-category discretization was optimal.
   - Zhu, Jun, et al. "Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations." [[http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0030069|PLoS]] computational biology 3.4 (2007): e69. \\  The methodology used to learn BNs in Zhang et al. 2013 AD paper.   - Zhu, Jun, et al. "Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations." [[http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0030069|PLoS]] computational biology 3.4 (2007): e69. \\  The methodology used to learn BNs in Zhang et al. 2013 AD paper.
-  - Zhu, J., et al. "An integrative genomics approach to the reconstruction of gene networks in segregating populations." [[http://www.ncbi.nlm.nih.gov/pubmed/15237224|Cytogenetic]] and genome research 105.2-4 (2004): 363-374 [{{:zhu2004.pdf|pdf}} ]. \\  The methodology used to incorporate genetic data to better learn BNs (e.g. useful in Zhang et al. 2013 AD paper). Edges which appeared in more than 30% of 1000 graphs should be used to make the consensus graph.+  - Zhu, J., et al. "An integrative genomics approach to the reconstruction of gene networks in segregating populations." [[http://www.ncbi.nlm.nih.gov/pubmed/15237224|Cytogenetic]] and genome research 105.2-4 (2004): 363-374 [{{:zhu2004.pdf|pdf}}  ]. \\  The methodology used to incorporate genetic data to better learn BNs (e.g. useful in Zhang et al. 2013 AD paper). Edges which appeared in more than 30% of 1000 graphs should be used to make the consensus graph.
   - Steidl, Ulrich G., and Constantine S. Mitsiades. "Therapeutic and diagnostic target gene in acute myeloid leukemia." U.S. [[http://www.google.com/patents/US20140187604|Patent]] Application 14/113,405. \\  Useful for sanity check to see if the genes we identify are known to be important.   - Steidl, Ulrich G., and Constantine S. Mitsiades. "Therapeutic and diagnostic target gene in acute myeloid leukemia." U.S. [[http://www.google.com/patents/US20140187604|Patent]] Application 14/113,405. \\  Useful for sanity check to see if the genes we identify are known to be important.
   - Kommadath, Arun, et al. "Gene co-expression network analysis identifies porcine genes associated with variation in Salmonella shedding." //BMC [[http://www.biomedcentral.com/1471-2164/15/452|genomics]]//15.1 (2014): 452. \\ A recent study that used WGCNA on RNA-seq.   - Kommadath, Arun, et al. "Gene co-expression network analysis identifies porcine genes associated with variation in Salmonella shedding." //BMC [[http://www.biomedcentral.com/1471-2164/15/452|genomics]]//15.1 (2014): 452. \\ A recent study that used WGCNA on RNA-seq.
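
Note on consensus graphs: the Zhu 2004 entry above keeps only edges that appear in more than 30% of 1000 learned graphs. The Python sketch below shows that filtering step alone, under the assumption that each learned network is given as a collection of directed (parent, child) edges. The function name, the edge representation, and the `min_fraction` parameter are hypothetical and are not taken from RIMBANet or any other tool named on this page.

```python
from collections import Counter


def consensus_edges(graphs, min_fraction=0.3):
    """Return the edges present in more than min_fraction of the graphs.

    graphs is a sequence of edge collections; each edge is a (parent, child) tuple.
    """
    counts = Counter()
    for edges in graphs:
        # Count each edge at most once per graph, even if it is listed twice.
        counts.update(set(edges))
    threshold = min_fraction * len(graphs)
    return {edge for edge, count in counts.items() if count > threshold}


# Four toy networks standing in for the 1000 sampled structures.
toy_graphs = [
    [("A", "B"), ("B", "C")],
    [("A", "B")],
    [("A", "B"), ("C", "D")],
    [("A", "B")],
]
toy_consensus = consensus_edges(toy_graphs)
```

Here only the edge from A to B survives, since every other edge occurs in a single graph (25%, below the 30% cutoff). A real pipeline would also need to check that the resulting consensus graph is still acyclic, which this sketch does not do.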
Line 111: Line 119:
   - Gillis, Jesse, and Paul Pavlidis. "“Guilt by association” is the exception rather than the rule in gene networks." [[http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002444|PLoS]] computational biology 8.3 (2012): e1002444. \\  Discussed the difficulties of computational network analysis. "…Functional information within gene networks is typically concentrated in only a very few interactions whose properties cannot be reliably related to the rest of the network".   - Gillis, Jesse, and Paul Pavlidis. "“Guilt by association” is the exception rather than the rule in gene networks." [[http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002444|PLoS]] computational biology 8.3 (2012): e1002444. \\  Discussed the difficulties of computational network analysis. "…Functional information within gene networks is typically concentrated in only a very few interactions whose properties cannot be reliably related to the rest of the network".
   - Koski, Timo JT, and John Noble. "A review of bayesian networks and structure learning." [[http://www-users.mat.umk.pl/~wniem/SemMgr/BayesNets/Bayesnetsreview.pdf|Mathematica]] Applicanda 40.1 (2012): 51-103. \\  A review from mathematical view point including applications of algebraic geometry to Bayesian networks!   - Koski, Timo JT, and John Noble. "A review of bayesian networks and structure learning." [[http://www-users.mat.umk.pl/~wniem/SemMgr/BayesNets/Bayesnetsreview.pdf|Mathematica]] Applicanda 40.1 (2012): 51-103. \\  A review from mathematical view point including applications of algebraic geometry to Bayesian networks!
-  - Barabási, Albert-László, Natali Gulbahce, and Joseph Loscalzo. "Network medicine: a network-based approach to human disease. ({{:network_medicine_barabasi_1.pdf|pdf}} )" [[http://www.ncbi.nlm.nih.gov/pubmed/21164525|//Nature Reviews Genetics// ]]12.1 (2011): 56-68.+  - Barabási, Albert-László, Natali Gulbahce, and Joseph Loscalzo. "Network medicine: a network-based approach to human disease. ({{:network_medicine_barabasi_1.pdf|pdf}}  )" [[http://www.ncbi.nlm.nih.gov/pubmed/21164525|//Nature Reviews Genetics// ]]12.1 (2011): 56-68.
   - Kustra, Rafal, and Adam Zagdanski. "Incorporating gene ontology in clustering gene expression data." //Computer-Based Medical Systems, 2006. [[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1647629&tag=1|CBMS]] 2006. 19th IEEE International Symposium on//. IEEE, 2006.   - Kustra, Rafal, and Adam Zagdanski. "Incorporating gene ontology in clustering gene expression data." //Computer-Based Medical Systems, 2006. [[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1647629&tag=1|CBMS]] 2006. 19th IEEE International Symposium on//. IEEE, 2006.
   - Dotan-Cohen, Dikla, Simon Kasif, and Avraham A. Melkman. "Seeing the forest for the trees: using the gene ontology to restructure hierarchical clustering." [[http://www.ncbi.nlm.nih.gov/pubmed/19497934|//Bioinformatics//]] 25.14 (2009): 1789-1795. \\  Semi-supervised clustering.   - Dotan-Cohen, Dikla, Simon Kasif, and Avraham A. Melkman. "Seeing the forest for the trees: using the gene ontology to restructure hierarchical clustering." [[http://www.ncbi.nlm.nih.gov/pubmed/19497934|//Bioinformatics//]] 25.14 (2009): 1789-1795. \\  Semi-supervised clustering.
Line 132: Line 140:
   - Hill, Steven M., et al. "Inferring causal molecular networks: empirical assessment through a community-based effort." //[[http://www.nature.com/nmeth/journal/v13/n4/full/nmeth.3773.html?WT.ec_id=NMETH-201604&spMailingID=51038576&spUserID=MTIyMzczNjc4MDI2S0&spJobID=883886888&spReportId=ODgzODg2ODg4S0|Nature]] methods// (2016). \\ A DREAM challenge. Compared 2000 networks in 32 biological contexts.   - Hill, Steven M., et al. "Inferring causal molecular networks: empirical assessment through a community-based effort." //[[http://www.nature.com/nmeth/journal/v13/n4/full/nmeth.3773.html?WT.ec_id=NMETH-201604&spMailingID=51038576&spUserID=MTIyMzczNjc4MDI2S0&spJobID=883886888&spReportId=ODgzODg2ODg4S0|Nature]] methods// (2016). \\ A DREAM challenge. Compared 2000 networks in 32 biological contexts.
   - Manning, Cerys Sian. Heterogeneity in melanoma and the microenvironment. [[http://discovery.ucl.ac.uk/1381937/1/Thesiswithcorrectionsaccept.pdf|Diss]]. UCL (University College London), 2013. \\  A PhD thesis with a good introduction to melanoma.   - Manning, Cerys Sian. Heterogeneity in melanoma and the microenvironment. [[http://discovery.ucl.ac.uk/1381937/1/Thesiswithcorrectionsaccept.pdf|Diss]]. UCL (University College London), 2013. \\  A PhD thesis with a good introduction to melanoma.
-  - Shannan, Batool, et al. "Heterogeneity in Melanoma." Melanoma. Springer International Publishing, 2016. 1-15 [[[:melanoma_book-2015.pdf?media=melanoma_book-2015.pdf|pdf]]]. \\  A chapter of a comprehensive recent book on melanoma.+  - Shannan, Batool, et al. "Heterogeneity in Melanoma." Melanoma. Springer International Publishing, 2016. 1-15 [[:melanoma_book-2015.pdf?media=melanoma_book-2015.pdf|pdf]]]. \\  A chapter of a comprehensive recent book on melanoma.
   - Jiao, Yinming, Martin Widschwendter, and Andrew E. Teschendorff. "A systems-level integrative framework for genome-wide DNA methylation and gene expression data identifies differential gene expression modules under epigenetic control." Bioinformatics 30.16 (2014): 2360-2366. \\ [[https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btu316|FEM]] paper.   - Jiao, Yinming, Martin Widschwendter, and Andrew E. Teschendorff. "A systems-level integrative framework for genome-wide DNA methylation and gene expression data identifies differential gene expression modules under epigenetic control." Bioinformatics 30.16 (2014): 2360-2366. \\ [[https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btu316|FEM]] paper.
-  - Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma, [[http://www.cell.com/cell/abstract/S0092-8674(17)30639-6?innerTabgraphical_S0092867417306396|Cell]], 2017 [{{:ally-copmprehensive_and_integrative_genomic_char_of_hcc-cell-2017.pdf|pdf}} ]. \\  TCGA's HCC data and subtyping using DNA copy number, DNA methylation, mRNA expression, miRNA expression and RPPA (protein expression). Links to the MDACC dataset with 100 HCC samples.+  - Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma, [[http://www.cell.com/cell/abstract/S0092-8674(17)30639-6?innerTabgraphical_S0092867417306396|Cell]], 2017 [{{:ally-copmprehensive_and_integrative_genomic_char_of_hcc-cell-2017.pdf|pdf}}  ]. \\  TCGA's HCC data and subtyping using DNA copy number, DNA methylation, mRNA expression, miRNA expression and RPPA (protein expression). Links to the MDACC dataset with 100 HCC samples.
   - Guillamot, Maria, Luisa Cimmino, and Iannis Aifantis. "The impact of DNA methylation in hematopoietic malignancies." [[http://www.cell.com/trends/cancer/pdf/S2405-8033(15)00089-8.pdf|Trends in cancer]] 2.2 (2016): 70-83. \\  Reviews and references DNA methylation studies and datasets on AML. E.g., [[http://www.sciencedirect.com/science/article/pii/S1535610809004206|Figueroa]] et al. used DNA methylation for classification of 344 AML cases. [[http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002781|Akalin]] et al. related DNA methylation patterns with mutations in 5 AML cases. "The methylation status of specific genes can predict the future survival of AML patients, suggesting that DNA methylation is a biomarker for clinical outcome" see e.g., Figueroa et al, [[http://www.bloodjournal.org/content/bloodjournal/113/6/1315.full.pdf?sso-checked=true|Jiang]] 2009 (studied MDS to AML progression in 184 cases), and [[http://www.bloodjournal.org/content/115/3/636.long?sso-checked=true|Bullinger]] 2010 (analyzed 92 genomic regions in 182 patients).   - Guillamot, Maria, Luisa Cimmino, and Iannis Aifantis. "The impact of DNA methylation in hematopoietic malignancies." [[http://www.cell.com/trends/cancer/pdf/S2405-8033(15)00089-8.pdf|Trends in cancer]] 2.2 (2016): 70-83. \\  Reviews and references DNA methylation studies and datasets on AML. E.g., [[http://www.sciencedirect.com/science/article/pii/S1535610809004206|Figueroa]] et al. used DNA methylation for classification of 344 AML cases. [[http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002781|Akalin]] et al. related DNA methylation patterns with mutations in 5 AML cases. 
"The methylation status of specific genes can predict the future survival of AML patients, suggesting that DNA methylation is a biomarker for clinical outcome" see e.g., Figueroa et al, [[http://www.bloodjournal.org/content/bloodjournal/113/6/1315.full.pdf?sso-checked=true|Jiang]] 2009 (studied MDS to AML progression in 184 cases), and [[http://www.bloodjournal.org/content/115/3/636.long?sso-checked=true|Bullinger]] 2010 (analyzed 92 genomic regions in 182 patients).
 +  - John [[https://www.youtube.com/watch?v=Vyhq7GZFnes|Quackenbush's]] talk entitled: "Using Networks to Understand the Genotype-Phenotype Connection".
 +  - Saelens, Wouter, Robrecht Cannoodt, and Yvan Saeys. "A comprehensive evaluation of module detection methods for gene expression data." [[https://www.nature.com/articles/s41467-018-03424-4|Nature communications]] 9.1 (2018): 1090. \\  "Graph-based, representative-based, and hierarchical clustering all performed equally well, with the clustering method FLAME (Fuzzy clustering by Local Approximation of Memberships), one of the only clustering methods able to detect overlap, slightly outperforming other clustering methods" including WGCNA. Regulatory networks that had been inferred using other data, e.g., "binding motifs in active enhancers", were used as gold standard.
 +  - Choobdar, Sarvenaz, et al. "Assessment of network module identification across complex diseases." [[https://www.nature.com/articles/s41592-019-0509-5|Nature Methods]] 16.9 (2019): 843-852. \\  "The popular weighted gene co-expression network analysis (WGCNA) method did not perform competitively."
  
- \\ **Related software**+ 
 +===== Related software =====
  
   - Weighted Gene Co-expression Network Analysis ([[http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/|WGCNA]]) developed at UCLA. The page has links to some good introductory workshops.   - Weighted Gene Co-expression Network Analysis ([[http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/|WGCNA]]) developed at UCLA. The page has links to some good introductory workshops.
Line 154: Line 166:
   - See the [[:comparison_of_bayesian_network_learners|table]] for comparison of methods for learning BNs related to our gene network project.   - See the [[:comparison_of_bayesian_network_learners|table]] for comparison of methods for learning BNs related to our gene network project.
  
- \\  \\ [[:gene_networks_inference|Drafts]], [[:gene_networks_inference|Next steps]] + \\  \\ [[:drafts|Drafts]], [[:next_steps|Next steps]]