gene_networks_inference [2020/05/27 03:00] (current) – [iNETgrate] admin
====== Gene networks inference ======

===== Objectives =====
Biological processes in a cell often require intricate coordination between //multiple// genes and proteins. The goal of this project is to infer useful biological and clinical information from large networks of thousands of genes. We develop an integrative approach that analyzes co-expression and DNA methylation patterns in a single model. The results will help pinpoint the causes and mechanisms of complex diseases such as cancer, and our findings may reveal new targets for novel treatments. \\
===== Software =====
  
==== Pigengene ====
  
This R package provides an efficient way to perform network analysis and to infer biological signatures from gene expression profiles. The signatures are independent of the underlying platform; e.g., signatures can be inferred from microarray data and then evaluated on an independent RNA-Seq dataset. It is approved by, and publicly available from, [[https://bioconductor.org/packages/Pigengene|Bioconductor]].
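Network-derived signatures are often summarized per module by a module eigengene: the first principal component of the module's standardized expression matrix, oriented to agree with the module's average expression (the convention WGCNA uses). The sketch below computes one with plain power iteration. It assumes genes are rows and samples are columns; the function names `standardize` and `eigengene` are hypothetical, and this illustrates the idea rather than reproducing Pigengene's implementation.

```python
import math
import random


def standardize(row):
    """Center a gene's values and scale them to unit sample variance."""
    n = len(row)
    mean = sum(row) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (n - 1))
    return [(x - mean) / sd for x in row]


def eigengene(module, iterations=200, seed=0):
    """First principal component (over samples) of a module's expression.

    module: list of genes, each a list of expression values over the same samples.
    Returns a unit-length vector with one value per sample.
    """
    z = [standardize(gene) for gene in module]
    n = len(z[0])
    # Sample-by-sample cross-product matrix C = Z^T Z; its leading
    # eigenvector is the first right singular vector of Z.
    c = [[sum(g[i] * g[j] for g in z) for j in range(n)] for i in range(n)]
    # Random start: the all-ones vector would be mapped to zero because
    # every standardized gene sums to zero across samples.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the eigengene to correlate positively with the module's
    # average standardized expression.
    avg = [sum(g[i] for g in z) / len(z) for i in range(n)]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-x for x in v]
    return v


if __name__ == "__main__":
    # Three perfectly co-expressed genes measured in four samples.
    print(eigengene([[1, 2, 3, 4], [2, 4, 6, 8], [10, 11, 12, 13]]))
```

For a perfectly co-expressed module the eigengene is just the shared standardized pattern, here approximately [-0.671, -0.224, 0.224, 0.671]. An anti-correlated member flips its own sign in Z but not in Z^T Z, which is why the explicit orientation step is needed.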

==== iNETgrate ====

This R package integrates DNA methylation and gene expression data into //a single// network ([[https://bitbucket.org/habilzare/genetwork/src/master/code/Ghazal/iNETgrate/|code]]). This approach identifies more robust gene modules than conventional co-expression networks do. The package will be made publicly available after review and approval by Bioconductor. Remaining package-completion tasks are tracked in this [[https://docs.google.com/document/d/17ZlnpNFm2QD58j4AI-GbyW_9TPXT_e5dsGYm_DPV2Fs/edit|checklist]].
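One simple way to picture "a single network" built from two data types is a weighted average of two gene-gene similarity matrices, one from expression and one from DNA methylation. The sketch below assumes each gene has one expression profile and one summarized methylation profile over the same samples, uses |Pearson r| as the similarity, and mixes the two with a hypothetical weight `alpha`. It only illustrates the integration idea; iNETgrate's actual procedure may differ.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity(profiles):
    """|r| for every unordered gene pair, keyed by a sorted (gene, gene) tuple."""
    return {
        (g, h): abs(pearson(profiles[g], profiles[h]))
        for g, h in combinations(sorted(profiles), 2)
    }


def integrated_network(expr, meth, alpha=0.5):
    """Hypothetical integration: alpha * expression similarity
    + (1 - alpha) * methylation similarity, for every gene pair."""
    if set(expr) != set(meth):
        raise ValueError("expression and methylation must cover the same genes")
    s_expr, s_meth = similarity(expr), similarity(meth)
    return {pair: alpha * s_expr[pair] + (1 - alpha) * s_meth[pair] for pair in s_expr}


if __name__ == "__main__":
    expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8]}   # perfectly co-expressed
    meth = {"A": [1, 2, 3, 4], "B": [1, -1, -1, 1]}  # uncorrelated methylation
    print(integrated_network(expr, meth))
```

Here the edge A-B gets weight 0.5: full support from expression, none from methylation. Setting `alpha=1.0` recovers a plain co-expression network.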
  
  
  - 200 AML cases from TCGA (LAML dataset). Available data types include gene expression, DNA methylation, CNV, mutation, etc. TCGA data moved to [[https://gdc-portal.nci.nih.gov/|GDC]], but the DNA methylation data are not there; they can instead be retrieved from the GDC Legacy Archive or the original [[https://tcga-data.nci.nih.gov/docs/publications/laml_2012/|paper]].
  - The German [[https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37642|AMLCG]] 1999 study provides microarray data for 562 AML samples.
  - Papaemmanuil, Elli, et al. "Genomic classification and prognosis in acute myeloid leukemia." [[http://www.nejm.org/doi/full/10.1056/NEJMoa1516192#t=article|NEJM]] 374.23 (2016): 2209-2221. \\  The mutations of 111 genes in over **1,500 AML** cases are reported. The authors used this information to classify cases into groups and showed that these groups have different prognoses. I.e., [[https://www.mskcc.org/sites/default/files/node/2246/documents/discrete-cpe.pdf|concordance]] (probability estimates) improves from 64%, using only the European LeukemiaNet criteria, to 71%. Using the alternative allele frequency, they estimated the time of occurrence of the driver mutations. The data are available through the links in the corresponding [[http://www.nature.com/ng/journal/v49/n3/full/ng.3756.html|Nature]] paper [[[:ng.3756.pdf?media=ng.3756.pdf|pdf]]]. Information on downloading these data is in the readme file at genetwork:~/proj/genetwork/data/AML/gerstung/readme.txt. In particular, we have access to [[https://www.ebi.ac.uk/ega/studies/EGAS00001000275|EGAS00001000275]] through the [[https://ega-archive.org/|EGA]] archive. See [[:habils_lab_notebook|Habil's]] note on 2017/09/05 for more detail. Any member of Oncinfo Lab who touches (analyzes or views) these data from the Sanger Institute must read and abide by the [[:sanger_data_agreement_2017-08-09.pdf?media=sanger_data_agreement_2017-08-09.pdf|agreement]].
  - RNA, DNA methylation, whole-genome, and other data for 960 (pediatric?) AML cases are available from the [[https://ocg.cancer.gov/programs/target/acute-myeloid-leukemia|TARGET]] AML study.
  - AML-NK gene expression data (RNA-Seq) from three datasets (TCGA, Leucegene, and PMP/BCCA). [[https://docs.google.com/a/princeton.edu/document/d/1tB75BDAoG6-ggkoKzxF_f8anTnaP0lOAZ4MG-wEWCyk/edit?usp=sharing|Full description]].
  - Genomic Data Commons ([[https://portal.gdc.cancer.gov/repository|GDC]]), which contains TCGA data and more.
  - [[https://amp.pharm.mssm.edu/archs4/|ARCHS4]], developed at the Icahn School of Medicine at Mount Sinai, provides tools to download and analyze RNA-Seq data, including single-cell gene expression.
  - The [[https://www.nature.com/articles/s41586-018-0623-z#Sec38|BEAT]] AML dataset of ~300 cases, including gene expression, survival, ELN17, etc.
  - [[https://www.leukemiaatlas.org/adultaml|Leukemia Protein Atlas]]: The expression of hundreds of proteins was measured in bone marrow and peripheral blood samples of ~200 AML cases. A good publicly available resource for validating findings based on gene expression assays.
  - ~40K [[https://www.cell.com/cell/fulltext/S0092-8674(19)30094-7?_returnURL=https://linkinghub.elsevier.com/retrieve/pii/S0092867419300947?showall=true|single-cell]] RNA-Seq profiles from 40 bone marrow aspirates, including 16 AML patients and 5 healthy donors.
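The concordance figures quoted in the Papaemmanuil et al. entry above measure how well a risk score orders patients by survival: 0.5 is no better than chance and 1.0 is a perfect ordering. The linked document concerns a model-based concordance probability estimate; the sketch below instead computes Harrell's C-index, a simpler, related nonparametric statistic. It assumes right-censored data in which a higher risk score means shorter predicted survival, and for simplicity it skips pairs with tied follow-up times.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    times:  follow-up time for each patient
    events: 1 if the event (e.g., death) was observed, 0 if censored
    risk:   predicted risk score; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            # Pair (i, j) is comparable if i had the event before j's follow-up ended.
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Patient 1 is censored at t=2, so only pairs anchored by patients 0 and 2 count.
    print(harrell_c([1, 2, 3], [1, 0, 1], [3, 1, 2]))
```

On this scale, the reported improvement is from 0.64 to 0.71.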
  
===== Related work =====
  - Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma, [[http://www.cell.com/cell/abstract/S0092-8674(17)30639-6?innerTabgraphical_S0092867417306396|Cell]], 2017 [{{:ally-copmprehensive_and_integrative_genomic_char_of_hcc-cell-2017.pdf|pdf}}  ]. \\  TCGA's HCC data and subtyping using DNA copy number, DNA methylation, mRNA expression, miRNA expression and RPPA (protein expression). Links to the MDACC dataset with 100 HCC samples.
  - Guillamot, Maria, Luisa Cimmino, and Iannis Aifantis. "The impact of DNA methylation in hematopoietic malignancies." [[http://www.cell.com/trends/cancer/pdf/S2405-8033(15)00089-8.pdf|Trends in Cancer]] 2.2 (2016): 70-83. \\  Reviews and references DNA methylation studies and datasets on AML. E.g., [[http://www.sciencedirect.com/science/article/pii/S1535610809004206|Figueroa]] et al. used DNA methylation to classify 344 AML cases. [[http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002781|Akalin]] et al. related DNA methylation patterns to mutations in 5 AML cases. "The methylation status of specific genes can predict the future survival of AML patients, suggesting that DNA methylation is a biomarker for clinical outcome"; see, e.g., Figueroa et al., [[http://www.bloodjournal.org/content/bloodjournal/113/6/1315.full.pdf?sso-checked=true|Jiang]] 2009 (studied MDS-to-AML progression in 184 cases), and [[http://www.bloodjournal.org/content/115/3/636.long?sso-checked=true|Bullinger]] 2010 (analyzed 92 genomic regions in 182 patients).
  - John [[https://www.youtube.com/watch?v=Vyhq7GZFnes|Quackenbush's]] talk "Using Networks to Understand the Genotype-Phenotype Connection".
  - Saelens, Wouter, Robrecht Cannoodt, and Yvan Saeys. "A comprehensive evaluation of module detection methods for gene expression data." [[https://www.nature.com/articles/s41467-018-03424-4|Nature Communications]] 9.1 (2018): 1090. \\  "Graph-based, representative-based, and hierarchical clustering all performed equally well, with the clustering method FLAME (Fuzzy clustering by Local Approximation of Memberships), one of the only clustering methods able to detect overlap, slightly outperforming other clustering methods," including WGCNA. Regulatory networks that had been inferred from other data, e.g., "binding motifs in active enhancers", were used as the gold standard.
  - Choobdar, Sarvenaz, et al. "Assessment of network module identification across complex diseases." [[https://www.nature.com/articles/s41592-019-0509-5|Nature Methods]] 16.9 (2019): 843-852. \\  "The popular weighted gene co-expression network analysis (WGCNA) method did not perform competitively."
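Several of the entries above benchmark module detection methods. As a point of reference, the sketch below shows the hierarchical-clustering family in its simplest form: average-linkage agglomerative clustering of genes on the dissimilarity 1 - |r|, stopping when the closest pair of clusters is farther apart than a cut height. The cut height of 0.3 and the function names are assumptions chosen for illustration; WGCNA and the benchmarked methods typically use more elaborate similarity measures and tree-cutting rules.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def hierarchical_modules(profiles, cut_height=0.3):
    """Average-linkage agglomerative clustering of genes on d = 1 - |r|.

    profiles: dict mapping gene name -> expression values over the same samples.
    Merging stops once the closest pair of clusters is farther apart than cut_height.
    """
    genes = sorted(profiles)
    dist = {}
    for g, h in combinations(genes, 2):
        d = 1.0 - abs(pearson(profiles[g], profiles[h]))
        dist[(g, h)] = dist[(h, g)] = d

    def linkage(a, b):
        # Average linkage: mean pairwise distance between members of two clusters.
        return sum(dist[(g, h)] for g in a for h in b) / (len(a) * len(b))

    clusters = [[g] for g in genes]
    while len(clusters) > 1:
        best_d, best_i, best_j = None, None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best_d is None or d < best_d:
                    best_d, best_i, best_j = d, i, j
        if best_d > cut_height:
            break
        clusters[best_i] = clusters[best_i] + clusters[best_j]
        del clusters[best_j]  # best_j > best_i, so best_i's position is unaffected
    return clusters


if __name__ == "__main__":
    profiles = {
        "A": [1, 2, 3, 4], "B": [2, 4, 6, 8],      # one co-expressed pair
        "X": [1, -1, -1, 1], "Y": [2, -2, -2, 2],  # another, uncorrelated with A and B
    }
    print(hierarchical_modules(profiles))
```

Raising the cut height merges more genes into fewer, looser modules; with a cut height of 1.0 all four genes above end up in one cluster.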
  
===== Related software =====
  
  - Weighted Gene Co-expression Network Analysis ([[http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/|WGCNA]]), developed at UCLA. The page has links to some good introductory workshops.
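WGCNA's unsigned network construction rests on two formulas: the soft-thresholded adjacency a_ij = |cor(x_i, x_j)|^β, and the topological overlap TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = Σ_u a_iu a_uj counts shared neighbors and k_i = Σ_u a_iu is the connectivity of gene i. The sketch below implements both in plain Python for small examples. It assumes an unsigned network with a zero diagonal and uses β = 6, a value often chosen for unsigned networks; it is an illustration, not a substitute for the WGCNA package.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_adjacency(profiles, beta=6):
    """Unsigned soft-threshold adjacency a_ij = |r_ij| ** beta with a zero diagonal."""
    genes = sorted(profiles)
    n = len(genes)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = abs(pearson(profiles[genes[i]], profiles[genes[j]])) ** beta
            a[i][j] = a[j][i] = value
    return genes, a


def topological_overlap(a):
    """Topological overlap matrix for a symmetric adjacency with zero diagonal."""
    n = len(a)
    k = [sum(row) for row in a]  # connectivity of each gene
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Shared neighbours; u == i or u == j adds 0 because the diagonal is 0.
            shared = sum(a[i][u] * a[u][j] for u in range(n))
            tom[i][j] = (shared + a[i][j]) / (min(k[i], k[j]) + 1.0 - a[i][j])
    return tom


if __name__ == "__main__":
    # A path graph 0 - 1 - 2: genes 0 and 2 are not linked but share neighbour 1.
    path = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    print(topological_overlap(path))
```

In the path example, genes 0 and 2 have no direct edge yet receive a topological overlap of 0.5 through their shared neighbor; this is what makes TOM more robust to noisy individual correlations than the raw adjacency.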