Differences

This shows you the differences between two versions of the page.

gene_networks_inference [2018/12/14 15:28] – [Related work] admingene_networks_inference [2020/05/27 03:00] (current) – [iNETgrate] admin
Line 1: Line 1:
 ====== Gene networks inference ====== ====== Gene networks inference ======
 +
 +
 ===== Objectives ===== ===== Objectives =====
 Biological processes in a cell often require intricate coordination between //multiple// genes and proteins. The goal of this project is to infer useful biological and clinical information from large networks of thousands of genes. We develop an integrative approach to analyze co-expression and DNA methylation patterns in a single model. The results will be useful in pinpointing the cause and mechanism of complex diseases such as cancer. Our findings can potentially open doors to new targets for novel treatment plans. \\ Biological processes in a cell often require intricate coordination between //multiple// genes and proteins. The goal of this project is to infer useful biological and clinical information from large networks of thousands of genes. We develop an integrative approach to analyze co-expression and DNA methylation patterns in a single model. The results will be useful in pinpointing the cause and mechanism of complex diseases such as cancer. Our findings can potentially open doors to new targets for novel treatment plans. \\
Line 11: Line 13:
 ===== Papers ===== ===== Papers =====
  
-  - A. Zainulabadeen et al., Underexpression of Specific Interferon Genes Is Associated with Poor Prognosis of Melanoma, [[http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0170025|PLoS One]] 2017, 12(1).\\  "Using our recently developed gene network model, we identified biological signatures that confidently predict the prognosis of melanoma. We showed that our predictive model assesses the risk more accurately than the traditional Clark staging method." +  - A. Zainulabadeen et al., Underexpression of Specific Interferon Genes Is Associated with Poor Prognosis of Melanoma, [[http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0170025|PLoS One]] 2017, 12(1). \\  "Using our recently developed gene network model, we identified biological signatures that confidently predict the prognosis of melanoma. We showed that our predictive model assesses the risk more accurately than the traditional Clark staging method." 
-  - Foroushani, Amir, et al. "Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia: an introduction to the Pigengene package and its applications." [[https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-017-0253-6|BMC medical genomics]] 10.1 (2017): 16.\\ \\  +  - Foroushani, Amir, et al. "Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia: an introduction to the Pigengene package and its applications." [[https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-017-0253-6|BMC medical genomics]] 10.1 (2017): 16.​​ 
-=====   =====+  - Agrahari, Rupesh, et al. "Applications of Bayesian network models in predicting types of hematological malignancies." [[https://www.nature.com/articles/s41598-018-24758-5|Scientific Reports]] 8.1 (2018): 6951.
  
  
 ===== Software ===== ===== Software =====
  
-**Pigengene:** This R package provides an efficient way to perform network analysis and to infer biological signatures from gene expression profiles. The signatures are independent from the underlying platform, e.g., it can infer the signatures using data from microarray and evaluate them in an independent RNA Seq dataset. It is approved by, and publicly available from, [[https://bioconductor.org/packages/Pigengene|Bioconductor]].+==== Pigengene ====
  
-**iNETgrate:** [[https://bitbucket.org/habilzare/genetwork/src/master/code/iNETgrate/|This]] R package is useful to integrate DNA methylation and gene expression data into //a single //network. This approach leads to identification of more robust gene modules compared to conventional coexpression networks. The package will be publicly available after review and approval by Bioconductor.+This R package provides an efficient way to perform network analysis and to infer biological signatures from gene expression profiles. The signatures are independent from the underlying platform, e.g., it can infer the signatures using data from microarray and evaluate them in an independent RNA Seq dataset. It is approved by, and publicly available from, [[https://bioconductor.org/packages/Pigengene|Bioconductor]]
 + 
 +==== iNETgrate ==== 
 + 
 +This R package is useful to integrate DNA methylation and gene expression data into //a single //network ([[https://bitbucket.org/habilzare/genetwork/src/master/code/Ghazal/iNETgrate/|code]]). This approach leads to identification of more robust gene modules compared to conventional coexpression networks. The package will be publicly available after review and approval by Bioconductor. A [[https://docs.google.com/document/d/17ZlnpNFm2QD58j4AI-GbyW_9TPXT_e5dsGYm_DPV2Fs/edit|checklist]] for package completion tasks.
  
  
Line 43: Line 49:
   - 200 AML cases from TCGA (LAML dataset). Available data types include gene expression , DNA-methylation, CNV, mutation, etc. TCGA data moved to [[https://gdc-portal.nci.nih.gov/|GDC]] but DNA-methylation is not there. Instead, it can be retrieved from GDC Legacy Archive or the original [[https://tcga-data.nci.nih.gov/docs/publications/laml_2012/|paper]].   - 200 AML cases from TCGA (LAML dataset). Available data types include gene expression , DNA-methylation, CNV, mutation, etc. TCGA data moved to [[https://gdc-portal.nci.nih.gov/|GDC]] but DNA-methylation is not there. Instead, it can be retrieved from GDC Legacy Archive or the original [[https://tcga-data.nci.nih.gov/docs/publications/laml_2012/|paper]].
   - German [[https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37642|AMLCG]] 1999 provides microarray data of 562 AML samples.   - German [[https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37642|AMLCG]] 1999 provides microarray data of 562 AML samples.
-  - Papaemmanuil, Elli, et al. "Genomic classification and prognosis in acute myeloid leukemia." [[http://www.nejm.org/doi/full/10.1056/NEJMoa1516192#t=article|NEJM]] 374.23 (2016): 2209-2221. \\  The mutations of 111 genes in over **1,500 AML**  cases are reported. The authors used this information to classify cases into groups and showed these groups have different prognosis. I.,e., [[https://www.mskcc.org/sites/default/files/node/2246/documents/discrete-cpe.pdf|concordance]] (probability estimates) improves from 64% using only the European LeukemiaNet criteria to 71%. Using the alternative allele frequency, they estimated the time of occurrence for the driver mutations. The data are available through the links in the corresponding [[http://www.nature.com/ng/journal/v49/n3/full/ng.3756.html|Nature]] paper [[[:ng.3756.pdf?media=ng.3756.pdf|pdf]]]. Information on downloading these data is contained in the readme file found in genetwork:~/proj/genetwork/data/AML/gerstung/readme.txt. In particular, we have access to [[https://www.ebi.ac.uk/ega/studies/EGAS00001000275|EGAS00001000275]] through [[https://ega-archive.org/|EGA]] Archives. See [[:habils_lab_notebook|Habil's]] note on 2017/09/05 for more detail. Any member of Oncinfo Lab who touches (analyzes or views) these data from Sanger Institute must read and abide to the [[:sanger_data_agreement_2017-08-09.pdf?media=sanger_data_agreement_2017-08-09.pdf|agreement]].+  - Papaemmanuil, Elli, et al. "Genomic classification and prognosis in acute myeloid leukemia." [[http://www.nejm.org/doi/full/10.1056/NEJMoa1516192#t=article|NEJM]] 374.23 (2016): 2209-2221. \\  The mutations of 111 genes in over **1,500 AML**  cases are reported. The authors used this information to classify cases into groups and showed these groups have different prognosis. 
I.e., [[https://www.mskcc.org/sites/default/files/node/2246/documents/discrete-cpe.pdf|concordance]] (probability estimates) improves from 64% using only the European LeukemiaNet criteria to 71%. Using the alternative allele frequency, they estimated the time of occurrence for the driver mutations. The data are available through the links in the corresponding [[http://www.nature.com/ng/journal/v49/n3/full/ng.3756.html|Nature]] paper [[:ng.3756.pdf?media=ng.3756.pdf|pdf]]]. Information on downloading these data is contained in the readme file found in genetwork:~/proj/genetwork/data/AML/gerstung/readme.txt. In particular, we have access to [[https://www.ebi.ac.uk/ega/studies/EGAS00001000275|EGAS00001000275]] through [[https://ega-archive.org/|EGA]] Archives. See [[:habils_lab_notebook|Habil's]] note on 2017/09/05 for more detail. Any member of Oncinfo Lab who touches (analyzes or views) these data from Sanger Institute must read and abide by the [[:sanger_data_agreement_2017-08-09.pdf?media=sanger_data_agreement_2017-08-09.pdf|agreement]].
   - RNA, DNA methylation, whole genome, etc. data of 960 (pediatric?) AML cases are available from [[https://ocg.cancer.gov/programs/target/acute-myeloid-leukemia|TARGET]] AML study.   - RNA, DNA methylation, whole genome, etc. data of 960 (pediatric?) AML cases are available from [[https://ocg.cancer.gov/programs/target/acute-myeloid-leukemia|TARGET]] AML study.
   - AML-NK gene expression data (RNA-Seq) from three datasets (TCGA, Leucegene, and PMP/BCCA). [[https://docs.google.com/a/princeton.edu/document/d/1tB75BDAoG6-ggkoKzxF_f8anTnaP0lOAZ4MG-wEWCyk/edit?usp=sharing|Full description]].   - AML-NK gene expression data (RNA-Seq) from three datasets (TCGA, Leucegene, and PMP/BCCA). [[https://docs.google.com/a/princeton.edu/document/d/1tB75BDAoG6-ggkoKzxF_f8anTnaP0lOAZ4MG-wEWCyk/edit?usp=sharing|Full description]].
Line 49: Line 55:
   - Genomic Data Commons ([[https://portal.gdc.cancer.gov/repository|GDC]]), which contains TCGA data and more.   - Genomic Data Commons ([[https://portal.gdc.cancer.gov/repository|GDC]]), which contains TCGA data and more.
   - [[https://amp.pharm.mssm.edu/archs4/|ARCHS4]], which was developed at the Icahn School of Medicine at Mount Sinai, and provides tools to download and analyze RNA-Seq data including single-cell gene expression.   - [[https://amp.pharm.mssm.edu/archs4/|ARCHS4]], which was developed at the Icahn School of Medicine at Mount Sinai, and provides tools to download and analyze RNA-Seq data including single-cell gene expression.
 +  - The [[https://www.nature.com/articles/s41586-018-0623-z#Sec38|BEAT]] AML dataset of ~300 cases including gene expression, survival, ELN17, etc.
 +  - [[https://www.leukemiaatlas.org/adultaml|Leukemia Protein Atlas]]: Expression of hundreds of proteins was measured in bone marrow and PB samples of ~200 AML cases. A good publicly available resource to validate findings based on gene expression assays.
 +  - ~40K [[https://www.cell.com/cell/fulltext/S0092-8674(19)30094-7?_returnURL=https://linkinghub.elsevier.com/retrieve/pii/S0092867419300947?showall=true|single cell]] RNA-Seq data from 40 bone marrow aspirates, including 16 AML patients and \\  5 healthy donors.
  
-=====   ===== 
  
 ===== Related work ===== ===== Related work =====
  
-  - Zhang, B. //et al.//  Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer's disease. //[[http://www.cell.com/abstract/S0092-8674(13)00387-5|Cell]]//**153**, 707–720 (2013). [{{:zhang2013.pdf|pdf}} ] \\  Methodology is based on [2] plus they train **Bayesian networks**  to infer causal structure based on RNA-seq. MCMC was used to optimize BIC score, and \\ 1/3 of 1000 network models were averaged.[[http://www.alzforum.org/webinars/can-network-analysis-identify-pathological-pathways-alzheimers|This]] webinar is a high level description of the methodology. +  - Zhang, B. //et al.//  Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer's disease. //[[http://www.cell.com/abstract/S0092-8674(13)00387-5|Cell]]//**153**, 707–720 (2013). [{{:zhang2013.pdf|pdf}}  ] \\  Methodology is based on [2] plus they train **Bayesian networks**  to infer causal structure based on RNA-seq. MCMC was used to optimize BIC score, and \\ 1/3 of 1000 network models were averaged.[[http://www.alzforum.org/webinars/can-network-analysis-identify-pathological-pathways-alzheimers|This]] webinar is a high level description of the methodology. 
-  - Emilsson, V. //et al.//  Genetics of gene expression and its effect on disease.//[[http://www.nature.com/nature/journal/v452/n7186/full/nature06758.html|Nature]]//**452**, 423–428 (2008). [[:http:file_view_emilsson2008.pdf_521380262_emilsson2008.pdf|pdf]], {{:emilsson2008-supp.pdf|Supp}} ] (based on their references [29-31], uses [[http://www.sciencemag.org/content/297/5586/1551.full.pdf|this]] topological overlap measure) +  - Emilsson, V. //et al.//  Genetics of gene expression and its effect on disease.//[[http://www.nature.com/nature/journal/v452/n7186/full/nature06758.html|Nature]]//**452**, 423–428 (2008). [[:http:file_view_emilsson2008.pdf_521380262_emilsson2008.pdf|pdf]], {{:emilsson2008-supp.pdf|Supp}}  ] (based on their references [29-31], uses [[http://www.sciencemag.org/content/297/5586/1551.full.pdf|this]] topological overlap measure) 
-  - Identifying Gene Regulatory Networks from Gene Expression Data, [[http://web.cs.ucdavis.edu/~filkov/papers/chapter.pdf|Handbook]] of Computational Molecular Biology (2005) [{{:filkov_chapter27.pdf|pdf}} ]. \\  A good but old book chapter.+  - Identifying Gene Regulatory Networks from Gene Expression Data, [[http://web.cs.ucdavis.edu/~filkov/papers/chapter.pdf|Handbook]] of Computational Molecular Biology (2005) [{{:filkov_chapter27.pdf|pdf}}  ]. \\  A good but old book chapter.
   - [[http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.0020130|Integrating Genetic and Network Analysis to Characterize Genes Related to Mouse Weight]] (Identified modules in networks, "A pair of genes is said to have high topological overlap if they are both strongly connected to the same group of genes.", [[http://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/MouseWeight/|Software]])   - [[http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.0020130|Integrating Genetic and Network Analysis to Characterize Genes Related to Mouse Weight]] (Identified modules in networks, "A pair of genes is said to have high topological overlap if they are both strongly connected to the same group of genes.", [[http://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/MouseWeight/|Software]])
   - Gene correlation network analysis [Wikipedia [[http://en.wikipedia.org/wiki/Weighted_correlation_network_analysis|page]]]   - Gene correlation network analysis [Wikipedia [[http://en.wikipedia.org/wiki/Weighted_correlation_network_analysis|page]]]
-  - Friedman, Nir, et al. "Using Bayesian networks to analyze expression data." Journal of computational biology 7.3-4 (2000): 601-620.[{{:friedman2000.pdf|pdf}} ] \\  A relatively old but highly cited (~2700) original paper.+  - Friedman, Nir, et al. "Using Bayesian networks to analyze expression data." Journal of computational biology 7.3-4 (2000): 601-620.[{{:friedman2000.pdf|pdf}}  ] \\  A relatively old but highly cited (~2700) original paper.
   - Ruan, Jianhua, Angela K. Dean, and Weixiong Zhang. "A general co-expression network-based approach to gene expression analysis: comparison and applications." //[[http://www.biomedcentral.com/1752-0509/4/8|BMC]] systems biology//4.1 (2010): 8. (Dr. Jianhua Ruan from San Antonio)   - Ruan, Jianhua, Angela K. Dean, and Weixiong Zhang. "A general co-expression network-based approach to gene expression analysis: comparison and applications." //[[http://www.biomedcentral.com/1752-0509/4/8|BMC]] systems biology//4.1 (2010): 8. (Dr. Jianhua Ruan from San Antonio)
   - Nagrecha, Saurabh, Pawan J. Lingras, and Nitesh V. Chawla. "Comparison of gene co-expression networks and bayesian networks." //Intelligent Information and Database [[https://www3.nd.edu/~dial/papers/ACIIDS2013.pdf|Systems]]// . Springer Berlin Heidelberg, 2013. 507-516. \\ Simple description and some relatively old literature review. \\ "Bayesian networks emerge as a more informative tool to determine the causal structure."   - Nagrecha, Saurabh, Pawan J. Lingras, and Nitesh V. Chawla. "Comparison of gene co-expression networks and bayesian networks." //Intelligent Information and Database [[https://www3.nd.edu/~dial/papers/ACIIDS2013.pdf|Systems]]// . Springer Berlin Heidelberg, 2013. 507-516. \\ Simple description and some relatively old literature review. \\ "Bayesian networks emerge as a more informative tool to determine the causal structure."
-  - Systems Biology: The inference of networks from high dimensional genomics data, lecture by Yeung 2011 [{{:yeun2011-systemsbiology.ppt|ppt}} ] (a good introduction to application of Bayesian networks in co-expression network)+  - Systems Biology: The inference of networks from high dimensional genomics data, lecture by Yeung 2011 [{{:yeun2011-systemsbiology.ppt|ppt}}  ] (a good introduction to application of Bayesian networks in co-expression network)
   - Hong, Shengjun, et al. "Canonical correlation analysis for RNA-seq co-expression networks." //Nucleic acids [[http://www.ncbi.nlm.nih.gov/pubmed/23460206|research]]//41.8 (2013): e95-e95. (improved co-expression analysis for RNA-seq)   - Hong, Shengjun, et al. "Canonical correlation analysis for RNA-seq co-expression networks." //Nucleic acids [[http://www.ncbi.nlm.nih.gov/pubmed/23460206|research]]//41.8 (2013): e95-e95. (improved co-expression analysis for RNA-seq)
   - Li, Bingshan, et al. "[[http://www.nature.com/jid/journal/v134/n7/full/jid201428a.html|Transcriptome Analysis of Psoriasis in a Large Case–Control Sample: RNA-Seq Provides Insights into Disease Mechanisms.]]" //Journal of Investigative Dermatology//  (2014). (used co-expression network analysis on RNA-seq from 42 samples to study gene regulatory circuits in psoriasis)   - Li, Bingshan, et al. "[[http://www.nature.com/jid/journal/v134/n7/full/jid201428a.html|Transcriptome Analysis of Psoriasis in a Large Case–Control Sample: RNA-Seq Provides Insights into Disease Mechanisms.]]" //Journal of Investigative Dermatology//  (2014). (used co-expression network analysis on RNA-seq from 42 samples to study gene regulatory circuits in psoriasis)
   - [[http://web.stanford.edu/group/wonglab/doc/RNA-seq-talk-JSM2010.pdf|Analysis]] of RNA‐Seq Data, an introduction by Wong, 2010.   - [[http://web.stanford.edu/group/wonglab/doc/RNA-seq-talk-JSM2010.pdf|Analysis]] of RNA‐Seq Data, an introduction by Wong, 2010.
   - Ellis, Byron, and Wing Hung Wong. "Learning **causal**  Bayesian network structures from experimental [[http://web.stanford.edu/group/wonglab/doc/EllisWong-061025.pdf|data]]." //Journal of the American Statistical Association//  103.482 (2008): 778-789.   - Ellis, Byron, and Wing Hung Wong. "Learning **causal**  Bayesian network structures from experimental [[http://web.stanford.edu/group/wonglab/doc/EllisWong-061025.pdf|data]]." //Journal of the American Statistical Association//  103.482 (2008): 778-789.
-  - Barretina et al. "The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity." [[http://www.nature.com/nature/journal/v483/n7391/full/nature11003.html|Nature]] 483.7391 (2012): 603-607 [{{:barretina2012.pdf|pdf}} ]. \\  It provides CCLE, a valuable dataset produced by **Novartis**; mRNA expression data, responses of 24 compounds, and targeted sequencing on ~500 cell lines.+  - Barretina et al. "The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity." [[http://www.nature.com/nature/journal/v483/n7391/full/nature11003.html|Nature]] 483.7391 (2012): 603-607 [{{:barretina2012.pdf|pdf}}  ]. \\  It provides CCLE, a valuable dataset produced by **Novartis**; mRNA expression data, responses of 24 compounds, and targeted sequencing on ~500 cell lines.
   - Friedman, Nir, et al. "Using Bayesian networks to analyze expression data."//Journal of computational [[http://www.cs.huji.ac.il/~nir/Papers/FLNP1Full.pdf|biology]]//7.3-4 (2000): 601-620. \\ Has a good introduction to Bayesian networks and learning causal patterns for beginners.   - Friedman, Nir, et al. "Using Bayesian networks to analyze expression data."//Journal of computational [[http://www.cs.huji.ac.il/~nir/Papers/FLNP1Full.pdf|biology]]//7.3-4 (2000): 601-620. \\ Has a good introduction to Bayesian networks and learning causal patterns for beginners.
   - Al-Lazikani, Bissan, Udai Banerji, and Paul Workman. "Combinatorial drug therapy for cancer in the post-genomic era." //[[http://www.nature.com/nbt/journal/v30/n7/full/nbt.2284.html|Nature]] biotechnology//30.7 (2012): 679-692. \\ "Combinatorial targeted therapy", a good **survey**  including computational methods, successful stories like "identification of synergies between MET and EGFR inhibitors…"   - Al-Lazikani, Bissan, Udai Banerji, and Paul Workman. "Combinatorial drug therapy for cancer in the post-genomic era." //[[http://www.nature.com/nbt/journal/v30/n7/full/nbt.2284.html|Nature]] biotechnology//30.7 (2012): 679-692. \\ "Combinatorial targeted therapy", a good **survey**  including computational methods, successful stories like "identification of synergies between MET and EGFR inhibitors…"
Line 80: Line 88:
   - Bansal, Mukesh, et al. "How to infer gene networks from expression profiles."//Molecular systems biology//  3.1 (2007). \\ A relatively old survey which supports Banjo.   - Bansal, Mukesh, et al. "How to infer gene networks from expression profiles."//Molecular systems biology//  3.1 (2007). \\ A relatively old survey which supports Banjo.
   - Mourad, Raphaël, and Christine Sinoquet, eds. Probabilistic Graphical Models for Genetics, Genomics and Postgenomics. [[http://books.google.com/books?hl=en&lr=&id=2URhBAAAQBAJ&oi=fnd&pg=PP1&dq=Probabilistic Graphical Models for Genetics, Genomics, and Postgenomics&ots=8Yweyc5bKz&sig=xhaDE9f-JXHNckj-lgwvSrgSl3g#v=onepage&q=Probabilistic Graphical Models for Genetics, Genomics, and Postgenomics&f=false|Oxford]] University Press, 2014. \\  An excellent recent, relevant book ( e.g. pages 24,123,154,223).P155: For gene networks, maximum number of parents is commonly set to 3.   - Mourad, Raphaël, and Christine Sinoquet, eds. Probabilistic Graphical Models for Genetics, Genomics and Postgenomics. [[http://books.google.com/books?hl=en&lr=&id=2URhBAAAQBAJ&oi=fnd&pg=PP1&dq=Probabilistic Graphical Models for Genetics, Genomics, and Postgenomics&ots=8Yweyc5bKz&sig=xhaDE9f-JXHNckj-lgwvSrgSl3g#v=onepage&q=Probabilistic Graphical Models for Genetics, Genomics, and Postgenomics&f=false|Oxford]] University Press, 2014. \\  An excellent recent, relevant book ( e.g. pages 24,123,154,223).P155: For gene networks, maximum number of parents is commonly set to 3.
-  - de la Fuente, Alberto. "Gene Network Inference.", [[http://www.springer.com/new & forthcoming titles (default)/book/978-3-642-45160-7|Springer]], 2013 [{{:fuente2013.pdf|pdf}} ]. \\  Figure 1 explains why a network-based approach can be superior to black box machine learning techniques. Literature review on advantages of system approaches over studying single genes. Limited the maximum number of parents per gene to 5. [[http://www.mountsinai.org/profiles/jun-zhu|Zhu]] et al. explains [[http://icahn.mssm.edu/departments-and-institutes/genomics/about/software/rimbanet|RIMBANet]] developed by himself. "Bayesian networks offer the best performance."+  - de la Fuente, Alberto. "Gene Network Inference.", [[http://www.springer.com/new & forthcoming titles (default)/book/978-3-642-45160-7|Springer]], 2013 [{{:fuente2013.pdf|pdf}}  ]. \\  Figure 1 explains why a network-based approach can be superior to black box machine learning techniques. Literature review on advantages of system approaches over studying single genes. Limited the maximum number of parents per gene to 5. [[http://www.mountsinai.org/profiles/jun-zhu|Zhu]] et al. explains [[http://icahn.mssm.edu/departments-and-institutes/genomics/about/software/rimbanet|RIMBANet]] developed by himself. "Bayesian networks offer the best performance."
   - Bayesian Networks with R (bnlearn) and Hadoop, 2014, a good [[http://www.slideshare.net/ofermend/bayesian-networks-with-r-and-hadoop|talk]] by \\  Ofer Mendelevitch with introduction to BNs. Discusses large networks too.   - Bayesian Networks with R (bnlearn) and Hadoop, 2014, a good [[http://www.slideshare.net/ofermend/bayesian-networks-with-r-and-hadoop|talk]] by \\  Ofer Mendelevitch with introduction to BNs. Discusses large networks too.
   - Network Analysis Workshop, Systems Biology Analysis Methods for Genomic, 2013, [[http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/WORKSHOP/2013/|UCLA]]. Talk on Bayesian networks by Jun Zhu.   - Network Analysis Workshop, Systems Biology Analysis Methods for Genomic, 2013, [[http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/WORKSHOP/2013/|UCLA]]. Talk on Bayesian networks by Jun Zhu.
   - Vignes, Matthieu, et al. "Gene regulatory network reconstruction using bayesian networks, the dantzig selector, the lasso and their meta-analysis." PloS [[http://www.plosone.org/article/info:doi/10.1371/journal.pone.0029165#pone-0029165-g008|one]] 6.12 (2011): e29165. \\  A meta-analysis to combine these inference methods by computing a consensus ranking scheme, ranked 1st among 16 in a DREAM challenge, but its superiority was not confirmed in [[http://link.springer.com/chapter/10.1007/978-3-642-45161-4_2|Allouche]] 2014. Used greedy hill-climbing of Banjo.   - Vignes, Matthieu, et al. "Gene regulatory network reconstruction using bayesian networks, the dantzig selector, the lasso and their meta-analysis." PloS [[http://www.plosone.org/article/info:doi/10.1371/journal.pone.0029165#pone-0029165-g008|one]] 6.12 (2011): e29165. \\  A meta-analysis to combine these inference methods by computing a consensus ranking scheme, ranked 1st among 16 in a DREAM challenge, but its superiority was not confirmed in [[http://link.springer.com/chapter/10.1007/978-3-642-45161-4_2|Allouche]] 2014. Used greedy hill-climbing of Banjo.
-  - Nagarajan, Radhakrishnan, Marco Scutari, and Sophie Lèbre. Bayesian Networks in R. [[http://link.springer.com/book/10.1007/978-1-4614-6446-4|Springer]], 2013 [{{:bn-r.pdf|pdf}} ]. \\  A starter-to-advanced book by the author of bnlearn R package, defines a way of comparing networks in foreword section.+  - Nagarajan, Radhakrishnan, Marco Scutari, and Sophie Lèbre. Bayesian Networks in R. [[http://link.springer.com/book/10.1007/978-1-4614-6446-4|Springer]], 2013 [{{:bn-r.pdf|pdf}}  ]. \\  A starter-to-advanced book by the author of bnlearn R package, defines a way of comparing networks in foreword section.
   - Schadt, Eric E., et al. "An integrative genomics approach to infer causal associations between gene expression and disease." [[http://www.nature.com/ng/journal/v37/n7/full/ng1589.html|Nature]] genetics 37.7 (2005): 710-717. \\  The original Schadt paper introducing the idea of using genomic data to infer causality in BNs of expression (LCMS).   - Schadt, Eric E., et al. "An integrative genomics approach to infer causal associations between gene expression and disease." [[http://www.nature.com/ng/journal/v37/n7/full/ng1589.html|Nature]] genetics 37.7 (2005): 710-717. \\  The original Schadt paper introducing the idea of using genomic data to infer causality in BNs of expression (LCMS).
   - Jiang, Xia, et al. "Learning genetic epistasis using Bayesian network scoring criteria." BMC [[http://www.biomedcentral.com/1471-2105/12/89/|bioinformatics]] 12.1 (2011): 89. \\  On simulated data, Bayesian scoring (BDeu) outperforms minimum description scores such as AIC.   - Jiang, Xia, et al. "Learning genetic epistasis using Bayesian network scoring criteria." BMC [[http://www.biomedcentral.com/1471-2105/12/89/|bioinformatics]] 12.1 (2011): 89. \\  On simulated data, Bayesian scoring (BDeu) outperforms minimum description scores such as AIC.
Line 91: Line 99:
   - Yu, Jing, et al. "Using Bayesian network inference algorithms to recover molecular genetic regulatory networks." //3rd International [[http://ftp.cs.duke.edu/~amink/publications/manuscripts/hartemink02.icsb.pdf|Conference]] on Systems Biology//. 2002. \\ From Hartemink group, DBN. Very good explanation and comparison of different scores and search strategies. "With large amounts of data, the BIC is a good approximation to the full posterior (BDe) score and is faster to compute; however, it is known to over-penalize with small amounts of data." "The BDe score works better than the BIC score in recovering genetic regulatory pathways." "[Compared to simulated annealing and genetic algorithm,] greedy search is better as it can find the top graph in the least amount of time" (note that their network is small). 3-category discretization was optimal.   - Yu, Jing, et al. "Using Bayesian network inference algorithms to recover molecular genetic regulatory networks." //3rd International [[http://ftp.cs.duke.edu/~amink/publications/manuscripts/hartemink02.icsb.pdf|Conference]] on Systems Biology//. 2002. \\ From Hartemink group, DBN. Very good explanation and comparison of different scores and search strategies. "With large amounts of data, the BIC is a good approximation to the full posterior (BDe) score and is faster to compute; however, it is known to over-penalize with small amounts of data." "The BDe score works better than the BIC score in recovering genetic regulatory pathways." "[Compared to simulated annealing and genetic algorithm,] greedy search is better as it can find the top graph in the least amount of time" (note that their network is small). 3-category discretization was optimal.
   - Zhu, Jun, et al. "Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations." [[http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0030069|PLoS]] computational biology 3.4 (2007): e69. \\  The methodology used to learn BNs in Zhang et al. 2013 AD paper.   - Zhu, Jun, et al. "Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations." [[http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0030069|PLoS]] computational biology 3.4 (2007): e69. \\  The methodology used to learn BNs in Zhang et al. 2013 AD paper.
-  - Zhu, J., et al. "An integrative genomics approach to the reconstruction of gene networks in segregating populations." [[http://www.ncbi.nlm.nih.gov/pubmed/15237224|Cytogenetic]] and genome research 105.2-4 (2004): 363-374 [{{:zhu2004.pdf|pdf}} ]. \\  The methodology used to incorporate genetic data to better learn BNs (e.g. useful in Zhang et al. 2013 AD paper). Edges which appeared in more than 30% of 1000 graphs should be used to make the consensus graph.+  - Zhu, J., et al. "An integrative genomics approach to the reconstruction of gene networks in segregating populations." [[http://www.ncbi.nlm.nih.gov/pubmed/15237224|Cytogenetic]] and genome research 105.2-4 (2004): 363-374 [{{:zhu2004.pdf|pdf}}  ]. \\  The methodology used to incorporate genetic data to better learn BNs (e.g. useful in Zhang et al. 2013 AD paper). Edges which appeared in more than 30% of 1000 graphs should be used to make the consensus graph.
   - Steidl, Ulrich G., and Constantine S. Mitsiades. "Therapeutic and diagnostic target gene in acute myeloid leukemia." U.S. [[http://www.google.com/patents/US20140187604|Patent]] Application 14/113,405. \\  Useful for sanity check to see if the genes we identify are known to be important.   - Steidl, Ulrich G., and Constantine S. Mitsiades. "Therapeutic and diagnostic target gene in acute myeloid leukemia." U.S. [[http://www.google.com/patents/US20140187604|Patent]] Application 14/113,405. \\  Useful for sanity check to see if the genes we identify are known to be important.
   - Kommadath, Arun, et al. "Gene co-expression network analysis identifies porcine genes associated with variation in Salmonella shedding." //BMC [[http://www.biomedcentral.com/1471-2164/15/452|genomics]]//15.1 (2014): 452. \\ A recent study that used WGCNA on RNA-seq.   - Kommadath, Arun, et al. "Gene co-expression network analysis identifies porcine genes associated with variation in Salmonella shedding." //BMC [[http://www.biomedcentral.com/1471-2164/15/452|genomics]]//15.1 (2014): 452. \\ A recent study that used WGCNA on RNA-seq.
Line 111: Line 119:
   - Gillis, Jesse, and Paul Pavlidis. "“Guilt by association” is the exception rather than the rule in gene networks." [[http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002444|PLoS]] computational biology 8.3 (2012): e1002444. \\  Discussed the difficulties of computational network analysis. "…Functional information within gene networks is typically concentrated in only a very few interactions whose properties cannot be reliably related to the rest of the network".   - Gillis, Jesse, and Paul Pavlidis. "“Guilt by association” is the exception rather than the rule in gene networks." [[http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002444|PLoS]] computational biology 8.3 (2012): e1002444. \\  Discussed the difficulties of computational network analysis. "…Functional information within gene networks is typically concentrated in only a very few interactions whose properties cannot be reliably related to the rest of the network".
   - Koski, Timo JT, and John Noble. "A review of bayesian networks and structure learning." [[http://www-users.mat.umk.pl/~wniem/SemMgr/BayesNets/Bayesnetsreview.pdf|Mathematica]] Applicanda 40.1 (2012): 51-103. \\  A review from a mathematical viewpoint, including applications of algebraic geometry to Bayesian networks!   - Koski, Timo JT, and John Noble. "A review of bayesian networks and structure learning." [[http://www-users.mat.umk.pl/~wniem/SemMgr/BayesNets/Bayesnetsreview.pdf|Mathematica]] Applicanda 40.1 (2012): 51-103. \\  A review from a mathematical viewpoint, including applications of algebraic geometry to Bayesian networks!
-  - Barabási, Albert-László, Natali Gulbahce, and Joseph Loscalzo. "Network medicine: a network-based approach to human disease. ({{:network_medicine_barabasi_1.pdf|pdf}} )" [[http://www.ncbi.nlm.nih.gov/pubmed/21164525|//Nature Reviews Genetics// ]]12.1 (2011): 56-68.+  - Barabási, Albert-László, Natali Gulbahce, and Joseph Loscalzo. "Network medicine: a network-based approach to human disease. ({{:network_medicine_barabasi_1.pdf|pdf}}  )" [[http://www.ncbi.nlm.nih.gov/pubmed/21164525|//Nature Reviews Genetics// ]]12.1 (2011): 56-68.
   - Kustra, Rafal, and Adam Zagdanski. "Incorporating gene ontology in clustering gene expression data." //Computer-Based Medical Systems, 2006. [[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1647629&tag=1|CBMS]] 2006. 19th IEEE International Symposium on//. IEEE, 2006.   - Kustra, Rafal, and Adam Zagdanski. "Incorporating gene ontology in clustering gene expression data." //Computer-Based Medical Systems, 2006. [[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1647629&tag=1|CBMS]] 2006. 19th IEEE International Symposium on//. IEEE, 2006.
   - Dotan-Cohen, Dikla, Simon Kasif, and Avraham A. Melkman. "Seeing the forest for the trees: using the gene ontology to restructure hierarchical clustering."[[http://www.ncbi.nlm.nih.gov/pubmed/19497934|//Bioinformatics//]] 25.14 (2009): 1789-1795. \\  Semi-supervised clustering.   - Dotan-Cohen, Dikla, Simon Kasif, and Avraham A. Melkman. "Seeing the forest for the trees: using the gene ontology to restructure hierarchical clustering."[[http://www.ncbi.nlm.nih.gov/pubmed/19497934|//Bioinformatics//]] 25.14 (2009): 1789-1795. \\  Semi-supervised clustering.
Line 132: Line 140:
   - Hill, Steven M., et al. "Inferring causal molecular networks: empirical assessment through a community-based effort." //[[http://www.nature.com/nmeth/journal/v13/n4/full/nmeth.3773.html?WT.ec_id=NMETH-201604&spMailingID=51038576&spUserID=MTIyMzczNjc4MDI2S0&spJobID=883886888&spReportId=ODgzODg2ODg4S0|Nature]] methods// (2016). \\ A DREAM challenge. Compared 2000 networks in 32 biological contexts.   - Hill, Steven M., et al. "Inferring causal molecular networks: empirical assessment through a community-based effort." //[[http://www.nature.com/nmeth/journal/v13/n4/full/nmeth.3773.html?WT.ec_id=NMETH-201604&spMailingID=51038576&spUserID=MTIyMzczNjc4MDI2S0&spJobID=883886888&spReportId=ODgzODg2ODg4S0|Nature]] methods// (2016). \\ A DREAM challenge. Compared 2000 networks in 32 biological contexts.
   - Manning, Cerys Sian. Heterogeneity in melanoma and the microenvironment. [[http://discovery.ucl.ac.uk/1381937/1/Thesiswithcorrectionsaccept.pdf|Diss]]. UCL (University College London), 2013. \\  A PhD thesis with a good introduction to melanoma.   - Manning, Cerys Sian. Heterogeneity in melanoma and the microenvironment. [[http://discovery.ucl.ac.uk/1381937/1/Thesiswithcorrectionsaccept.pdf|Diss]]. UCL (University College London), 2013. \\  A PhD thesis with a good introduction to melanoma.
-  - Shannan, Batool, et al. "Heterogeneity in Melanoma." Melanoma. Springer International Publishing, 2016. 1-15 [[[:melanoma_book-2015.pdf?media=melanoma_book-2015.pdf|pdf]]]. \\  A chapter of a comprehensive recent book on melanoma.+  - Shannan, Batool, et al. "Heterogeneity in Melanoma." Melanoma. Springer International Publishing, 2016. 1-15 [[:melanoma_book-2015.pdf?media=melanoma_book-2015.pdf|pdf]]]. \\  A chapter of a comprehensive recent book on melanoma.
   - Jiao, Yinming, Martin Widschwendter, and Andrew E. Teschendorff. "A systems-level integrative framework for genome-wide DNA methylation and gene expression data identifies differential gene expression modules under epigenetic control." Bioinformatics 30.16 (2014): 2360-2366. \\ [[https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btu316|FEM]] paper.   - Jiao, Yinming, Martin Widschwendter, and Andrew E. Teschendorff. "A systems-level integrative framework for genome-wide DNA methylation and gene expression data identifies differential gene expression modules under epigenetic control." Bioinformatics 30.16 (2014): 2360-2366. \\ [[https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btu316|FEM]] paper.
-  - Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma, [[http://www.cell.com/cell/abstract/S0092-8674(17)30639-6?innerTabgraphical_S0092867417306396|Cell]], 2017 [{{:ally-copmprehensive_and_integrative_genomic_char_of_hcc-cell-2017.pdf|pdf}} ]. \\  TCGA's HCC data and subtyping using DNA copy number, DNA methylation, mRNA expression, miRNA expression and RPPA (protein expression). Links to the MDACC dataset with 100 HCC samples.+  - Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma, [[http://www.cell.com/cell/abstract/S0092-8674(17)30639-6?innerTabgraphical_S0092867417306396|Cell]], 2017 [{{:ally-copmprehensive_and_integrative_genomic_char_of_hcc-cell-2017.pdf|pdf}}  ]. \\  TCGA's HCC data and subtyping using DNA copy number, DNA methylation, mRNA expression, miRNA expression and RPPA (protein expression). Links to the MDACC dataset with 100 HCC samples.
   - Guillamot, Maria, Luisa Cimmino, and Iannis Aifantis. "The impact of DNA methylation in hematopoietic malignancies." [[http://www.cell.com/trends/cancer/pdf/S2405-8033(15)00089-8.pdf|Trends in cancer]] 2.2 (2016): 70-83. \\  Reviews and references DNA methylation studies and datasets on AML. E.g., [[http://www.sciencedirect.com/science/article/pii/S1535610809004206|Figueroa]] et al. used DNA methylation for classification of 344 AML cases. [[http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002781|Akalin]] et al. related DNA methylation patterns with mutations in 5 AML cases. "The methylation status of specific genes can predict the future survival of AML patients, suggesting that DNA methylation is a biomarker for clinical outcome" see e.g., Figueroa et al, [[http://www.bloodjournal.org/content/bloodjournal/113/6/1315.full.pdf?sso-checked=true|Jiang]] 2009 (studied MDS to AML progression in 184 cases), and [[http://www.bloodjournal.org/content/115/3/636.long?sso-checked=true|Bullinger]] 2010 (analyzed 92 genomic regions in 182 patients).   - Guillamot, Maria, Luisa Cimmino, and Iannis Aifantis. "The impact of DNA methylation in hematopoietic malignancies." [[http://www.cell.com/trends/cancer/pdf/S2405-8033(15)00089-8.pdf|Trends in cancer]] 2.2 (2016): 70-83. \\  Reviews and references DNA methylation studies and datasets on AML. E.g., [[http://www.sciencedirect.com/science/article/pii/S1535610809004206|Figueroa]] et al. used DNA methylation for classification of 344 AML cases. [[http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002781|Akalin]] et al. related DNA methylation patterns with mutations in 5 AML cases. 
"The methylation status of specific genes can predict the future survival of AML patients, suggesting that DNA methylation is a biomarker for clinical outcome" see e.g., Figueroa et al, [[http://www.bloodjournal.org/content/bloodjournal/113/6/1315.full.pdf?sso-checked=true|Jiang]] 2009 (studied MDS to AML progression in 184 cases), and [[http://www.bloodjournal.org/content/115/3/636.long?sso-checked=true|Bullinger]] 2010 (analyzed 92 genomic regions in 182 patients).
 +  - John [[https://www.youtube.com/watch?v=Vyhq7GZFnes|Quackenbush's]] talk entitled: "Using Networks to Understand the Genotype-Phenotype Connection".
 +  - Saelens, Wouter, Robrecht Cannoodt, and Yvan Saeys. "A comprehensive evaluation of module detection methods for gene expression data." [[https://www.nature.com/articles/s41467-018-03424-4|Nature communications]] 9.1 (2018): 1090. \\  "Graph-based, representative-based, and hierarchical clustering all performed equally well, with the clustering method FLAME (Fuzzy clustering by Local Approximation of Memberships), one of the only clustering methods able to detect overlap, slightly outperforming other clustering methods" including WGCNA. Regulatory networks that had been inferred using other data, e.g., "binding motifs in active enhancers", were used as the gold standard.
 +  - Choobdar, Sarvenaz, et al. "Assessment of network module identification across complex diseases." [[https://www.nature.com/articles/s41592-019-0509-5|Nature Methods]] 16.9 (2019): 843-852. \\  "The popular weighted gene co-expression network analysis (WGCNA) method did not perform competitively."
  
- \\ **Related software**+ 
 +===== Related software =====
  
   - Weighted Gene Co-expression Network Analysis ([[http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/|WGCNA]]) developed at UCLA. The page has links to some good introductory workshops.   - Weighted Gene Co-expression Network Analysis ([[http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/|WGCNA]]) developed at UCLA. The page has links to some good introductory workshops.
Line 154: Line 166:
   - See the [[:comparison_of_bayesian_network_learners|table]] for comparison of methods for learning BNs related to our gene network project.   - See the [[:comparison_of_bayesian_network_learners|table]] for comparison of methods for learning BNs related to our gene network project.
  
- \\  \\ [[:gene_networks_inference|Drafts]], [[:gene_networks_inference|Next steps]] + \\  \\ [[:drafts|Drafts]], [[:next_steps|Next steps]]
- +
- +
-===== Data ===== +
- +
-  - mRNA expression and mutation data from [[novartis_data|Novartis]].\\  Broad-Novartis Cancer Cell Line Encyclopedia ([[http://www.broadinstitute.org/ccle/home|CCLE]], [[http://www.nature.com/nature/journal/v483/n7391/full/nature11003.html|Barretina]] et al.). About 500 cell lines from different human cancers. The goal here is to predict drug responses, particularly in synergistic settings. Specifically, BROWSE > DATA >\\  a. mRNA expression > gene-centric RMA-normalized mRNA expression data > the gctx [[http://www.broadinstitute.org/ccle/data/browseData?conversationPropagation=begin|file]]. ([[|How to]] read a gctx file?) \\  b. Pharmacological profiling Drug data > Pharmacologic profiles for 24 anticancer drugs across 504 CCLE lines.\\ c. [[/file/view/CCLE_Chris_clustering.csv/535042602/CCLE_Chris_clustering.csv|Clustering]] and gene [[/file/view/CCLE_Chris_GO.csv/535042614/CCLE_Chris_GO.csv|ontology]] analysis done by Dr. Chris [[https://www.linkedin.com/profile/view?id=145098421&authType=NAME_SEARCH&authToken=xR3c&locale=en_US&srchid=1029939271418481101510&srchindex=1&srchtotal=1&trk=vsrp_people_res_name&trkInfo=VSRPsearchId%3A1029939271418481101510%2CVSRPtargetId%3A145098421%2CVSRPcmpt%3Aprimary|Gaiteri]]. +
-  - RNA-seq data from about a hundred [[alys_data|AML]] and MDS cases are available from the Karsan lab. The goal here is to identify the general underlying mechanisms of the disease, and to compare them with the relapse factors. More specific questions are a) Which pathways differ between AML and MDS? b) Are there pathways which can define AML subtypes, which are expected to exist due to differences in prognosis? c) What are the molecular mechanisms of [[http://dsas9a9gxtv2e.cloudfront.net/content/haematol/95/10/1623/F1.large.jpg|transformation]] of some MDS cases to AML? +
-  - "[[http://www.ncbi.nlm.nih.gov/geo/|GEO]], a public functional genomics data repository" ([[https://www.biostars.org/p/97370/|source]]). Includes over 4,000 leukemia subjects in the MILES series, a particular microarray study that contains around 400 AML and 300 MDS cases. +
-  - NCI-60 cell lines ([[http://www.cbioportal.org/public-portal/study.do?cancer_study_id=cellline_nci60|cBio]] portal, [[http://dtp.nci.nih.gov/branches/btb/ivclsp.html|DTB]]). +
-  - Genentech data set [[http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.3080.html|published]] in 2014 (Klijn et al.), RNA-seq for 675 cell lines including 15 AMLs, and response to 5 drugs. +
-  - Sanger data set [[http://www.nature.com/nature/journal/v483/n7391/full/nature11005.html|published]] in 2012 (Garnett et al.), similar to CCLE data with 639 cell lines and 130 compounds. +
-  - GlaxoSmithKline data set [[http://cancerres.aacrjournals.org/content/70/9/3677.long|published]] in 2010 (Greshock et al.), similar to CCLE data with 311 cell lines and 19 compounds. +
-  - Nucleic Acids Research online Molecular Biology Database [[http://nar.oxfordjournals.org/content/41/D1/D1.abstract?ijkey=a782763acb573716f2620e420a35a6d3fbaa3cf5&keytype2=tf_ipsecsha|Collection]].\\  lists 1512 miscellaneous online databases. +
-  - RNA-seq data of over 100 Xiphophorus fish treated with light under different conditions such as dosage and wavelength. 20-30 controls are also available from Walter [[http://mbrg.chemistry.txstate.edu/|Lab]] [[[https://docs.google.com/spreadsheets/d/1oG2doOyKcfCidYciPgO6Q50nwtUmflU-Ll8-iRkr_6U/edit#gid=0|table]]]. +
-  - [[http://www.reuters.com/article/2015/09/22/us-astrazeneca-cancer-idUSKCN0RM0MG20150922|AstraZeneca's]] crowd sourcing initiative as part of the DREAM Challenge. ~10,000 tested combinations measuring the ability of drugs to destroy cancer cell lines, and the corresponding genomic information. +
-  - Breast Cancer datasets: We will examine the generalizability of the method that we developed for haematological malignancies (AML/MDS) by examining its performance on several breast cancer datasets: 209 ER+ samples from Wang et al's [[http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse2034|dataset ]](GEO). ([[http://www.nature.com/ncomms/journal/v1/n4/full/ncomms1033.html|Paper]]), 201 ER+ samples from Miller et al's [[http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE3494&platform=GPL96|dataset]] (GEO), as well as expression data from [[https://www.ebi.ac.uk/ega/studies/EGAS00000000083|METABRIC]] study ( ~2000 samples, hosted by EGA) ([[http://www.nature.com/nature/journal/v486/n7403/full/nature10983.html|paper]]). +
-  - Microarray expression profiles of 1005 colorectal cancer patients from 13 independent cohorts ([[http://www.cancer-systemsbiology.org/Papers/JAMA-2015.pdf|paper]]). +
-  - Gene expression [[https://docs.google.com/spreadsheets/d/1oG2doOyKcfCidYciPgO6Q50nwtUmflU-Ll8-iRkr_6U/edit#gid=0|data]] of fish exposed to light (Walter Lab). +
-  - 16 pairs of tumor-normal samples from fish with [[/file/view/count_table_extra_32_samples_cpm.csv/578966565/count_table_extra_32_samples_cpm.csv|melanoma]] (Walter Lab). +
-  - 499 prostate adenocarcinoma ([[https://tcga-data.nci.nih.gov/tcga/tcgaCancerDetails.jsp?diseaseType=PRAD&diseaseName=Prostate%20adenocarcinoma|TCGA]], Provisional) samples. Low risk cases are "Disease Free" for at least 5 years and the "Recurred" ones are high risk. The relevant clinical data are shown in "Disease Free (Months)" and "Disease Free Status" columns in [[http://www.cbioportal.org/study.do?cancer_study_id=prad_tcga#|cBioPortal]], respectively. +
-  - 470 skin [[https://tcga-data.nci.nih.gov/tcga/tcgaCancerDetails.jsp?diseaseType=SKCM&diseaseName=Skin%20Cutaneous%20Melanoma|cutaneous]] melanoma samples from TCGA. The clinical data for survival analysis are shown in "Disease Free (Months)", "Disease Free Status", and "Days to Last Followup" columns. +
-  - 200 AML cases from TCGA (LAML dataset). Available data types include gene expression , DNA-methylation, CNV, mutation, etc. TCGA data moved to [[https://gdc-portal.nci.nih.gov/|GDC]] but DNA-methylation is not there. Instead, it can be retrieved from GDC Legacy [[https://gdc-portal.nci.nih.gov/legacy-archive/search/f?filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:%5B%22TCGA-LAML%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:%5B%22DNA%20methylation%22%5D%7D%7D%5D%7D&pagination=%7B%22files%22:%7B%22from%22:0,%22size%22:20,%22sort%22:%22cases.project.project_id:asc%22%7D%7D|Archive]] or the original [[https://tcga-data.nci.nih.gov/docs/publications/laml_2012/|paper]]. +
-  - German [[https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37642|AMLCG]] 1999 provides microarray data of 562 AML samples. +
-  - Papaemmanuil, Elli, et al. "Genomic classification and prognosis in acute myeloid leukemia." [[http://www.nejm.org/doi/full/10.1056/NEJMoa1516192#t=article|NEJM]] 374.23 (2016): 2209-2221.\\  The mutations of 111 genes in over **1,500 AML** cases are reported. The authors used this information to classify cases into groups and showed these groups have different prognoses; i.e., [[https://www.mskcc.org/sites/default/files/node/2246/documents/discrete-cpe.pdf|concordance]] (probability estimates) improves from 64% using only the European LeukemiaNet criteria to 71%. Using the alternative allele frequency, they estimated the time of occurrence for the driver mutations. The data are available through the links in the corresponding [[http://www.nature.com/ng/journal/v49/n3/full/ng.3756.html|Nature]] paper [{{ :ng.3756.pdf|pdf}}]. Information on downloading these data is contained in the readme file found in genetwork:~/proj/genetwork/data/AML/gerstung/readme.txt. In particular, we have access to [[https://www.ebi.ac.uk/ega/studies/EGAS00001000275|EGAS00001000275]] through [[https://ega-archive.org/|EGA]] Archives. See [[habils_lab_notebook|Habil's]] note on 2017/09/05 for more detail. Any member of Oncinfo Lab who touches (analyzes or views) these data from Sanger Institute must read and abide by the {{ :sanger_data_agreement_2017-08-09.pdf|agreement}}. +
-  - RNA, DNA methylation, whole genome, etc. data of 960 (pediatric?) AML cases are available from [[https://ocg.cancer.gov/programs/target/acute-myeloid-leukemia|TARGET]] AML study. +
-  - AML-NK gene expression data (RNA-Seq) from three datasets (TCGA, Leucegene, and PMP/BCCA). [[https://docs.google.com/a/princeton.edu/document/d/1tB75BDAoG6-ggkoKzxF_f8anTnaP0lOAZ4MG-wEWCyk/edit?usp=sharing|Full description]]. +
-  - [[https://docs.google.com/document/d/1Q6tuMDw4fweRQNttmiG3NEZiBoe2JcRj7M1Qg-_rCgk/edit|List]] of available AML datasets with DNA methylation or gene expression data. +
-  - Genomic Data Commons ([[https://portal.gdc.cancer.gov/repository|GDC]]), which contains TCGA data and more. +
-  - [[https://amp.pharm.mssm.edu/archs4/|ARCHS4]], which was developed at the Icahn School of Medicine at Mount Sinai, and provides tools to download and analyze RNA-Seq data including single-cell gene expression. +
-=====   ===== +
-=====   =====+