Differences

This shows you the differences between two versions of the page.

hepatocellular_carcinoma [2019/05/20 04:16] – [Project 1:] admin
hepatocellular_carcinoma [2020/12/16 00:03] (current) – [Project 3:] admin
Line 1: Line 1:
 ====== Hepatocellular carcinoma ======
 +
 ===== Objectives =====
-\\ Hepatocellular carcinoma (HCC) is the most common type of liver cancer and the third leading cause of cancer deaths in the world. The goal of project one is to identify the genes that are associated with regressing tumors (regardless of type of treatment) vs. those that are growing in the C3HeB/FeJ mouse model. Normal liver is used as a control. The results will be useful in identifying new therapeutic targets and potential drug combinations which could lead to more efficient treatments. Project two is related to knockdown of a specific protein that results in HCC in a different mouse model. The goal in the second project is to identify the genes and pathways that correlate with this protein.\\ \\
-===== Data ===== 
  
-  - RNAseq from liver of 9 treated and 4 control samples ([[christis_data|Christi's data]]).
 +\\
-  - The closest reference genome to our mouse strain is [[http://www.csbio.unc.edu/CCstatus/index.py?run=Pseudo|C3H/HeJ]]. We can use fasta and MOD files from build 37 (mm9), which is more [[https://www.biostars.org/p/81602/|annotated]] than build 38 (mm10).
 +Hepatocellular carcinoma (HCC) is the most common type of liver cancer and the third leading cause of cancer deaths in the world. [[https://www.merckmanuals.com/home/liver-and-gallbladder-disorders/fibrosis-and-cirrhosis-of-the-liver/cirrhosis-of-the-liver#v28485447|Cirrhosis]] of the liver is a major risk and contributing factor for HCC. The goal of project one is to identify the genes that are associated with regressing tumors (regardless of type of treatment) vs. those that are growing in the C3HeB/FeJ mouse model. Normal liver is used as a control. The results will be useful in identifying new therapeutic targets and potential drug combinations which could lead to more efficient treatments. Project two is related to knockdown of a specific protein that results in HCC in a different mouse model. The goal in the second project is to identify the genes and pathways that correlate with this protein.
-  - Alternatively, we can map to the mouse reference transcriptome ([[|NCBI37]]/mm9, rna.fa), and simplify the analysis at the expense of losing up to 7% of reads.
 +
-  - Ron Walter's lab ran their pipeline to filter the fastq data. These files are stored in a folder called Filtered_fastq_files. From Will Boswell: "PE stands for paired end reads. For example, you have a 500bp fragment and your target sequence size is 125bp. The fragment will be sequenced 125 bases from one end and 125 bases from the other end, and Illumina refers to this as paired end reads. SE stands for single end reads, which in our case is generated during our filtering process. If you look at the pre-filtered reads, you’ll see only PE1 and PE2 for each sample. During filtration, if one of the PE’s have low quality, it is tossed out leaving the other PE, and since it no longer has a mate pair, it’s kept as a single end sequence. Also, there are several files in the post-filtered directories that are considered intermediate files in the filtering process that we don’t need; these are process files used by the filtering script. The only files you should be concerned with are the _pe1.r.fastq, _pe2.r.fastq, _se.r.fastq, and _PE.filter.stats (gives you the number of reads mapped to the genome for each PE and SE)." A summary of the analysis can be found {{ :mouse_hcc_liver_sequencing_summary.docx|here}}.
 +
-  - Sequencing was completed by Beckman Coulter using [[http://www.illumina.com/products/truseq_rna_library_prep_kit_v2.html|TruSeq RNA Library Preparation Kit v2]], which is an unstranded protocol.
 +===== Sources of Human HCC RNA-seq Data =====
-  - Jielei provided TruSeq {{ :illumina_stranded_rnaseq_mapping.pdf|Stranded}} RNA-Seq data from 8 mice in August 2017 (see ~/proj/hcc/data/TPT1/readme.txt), which was analyzed using TruSeq Stranded RNA-Seq.\\ \\
 +
-====== **Sources of Human HCC RNA-seq Data** ======
 +  - [[http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE25599|GSE25599]] - 10 match-paired HBV-related Chinese HCC and non-cancerous adjacent tissues. Identified 1,378 significantly DE genes.
 +  - [[http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE33294|GSE33294]] - Chinese HBV-related hepatocellular carcinoma, paired tumor and non-cancerous adjacent tissues from 3 patients.
 +  - [[http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59259|GSE59259]] - Alcohol-related HCC 8 paired samples of liver and HCC
 +  - [[https://trace.ddbj.nig.ac.jp/DRASearch/submission?acc=SRA074279|SRA074279]] - 9 Chinese patients: paired HCC and adjacent non-cancerous tissues; [[http://www.sciencedirect.com/science/article/pii/S0888754314002341#bb0080|Publication]]
 +  - [[http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE55759|GSE55759]] - Paired HCC and non-cancerous adjacent tissue; median of 7019 DE genes per set, 93 DE genes shared by 6/8 patients
 +
 +|Sex/Age|Viral Infection|Tumor Differentiation|No. of Tumors|Vascular Invasion|TNM* Stage|
 +|M/62|HBV(-)HCV(-)|Well|1|No|II| 
 +|F/29|HBV(+)HCV(-)|Well|1|No|II| 
 +|M/56|HBV(+)HCV(-)|Moderately| |No|II| 
 +|M/55|HBV(+)HCV(-)|Moderately|1|No|II| 
 +|F/39|HBV(+)HCV(-)|Moderately|1|No|II| 
 +|M/44|HBV(+)HCV(-)|Moderately|1|Yes|III| 
 +|M/47|HBV(+)HCV(-)|Moderately|1|No|II| 
 +|M/48|HBV(+)HCV(-)|Poorly|1|Yes|III| 
 + 
 +\\
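The "shared by 6/8 patients" figure quoted for GSE55759 is a recurrence count across per-patient DE gene lists. The following is a minimal sketch of that kind of tally, assuming each patient's DE genes are available as a plain list. The function name `genes_shared_by` and the toy gene labels are hypothetical and are not taken from the GSE55759 analysis.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def genes_shared_by(de_sets: Dict[str, Iterable[str]], min_patients: int) -> Set[str]:
    """Return genes that are DE in at least `min_patients` of the given patients."""
    counts: Counter = Counter()
    for genes in de_sets.values():
        # set() ensures a gene is counted once per patient, even if listed twice.
        counts.update(set(genes))
    return {gene for gene, n in counts.items() if n >= min_patients}


# Toy input with made-up gene labels (not real data).
toy = {
    "patient1": ["geneA", "geneB", "geneC"],
    "patient2": ["geneA", "geneB"],
    "patient3": ["geneA", "geneD", "geneA"],
}
shared = genes_shared_by(toy, min_patients=2)
print(sorted(shared))  # ['geneA', 'geneB']
```

For an 8-patient set such as the one tabulated above, the same call with `min_patients=6` would give the "6/8" style count.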
  
-  - [[http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE25599|GSE25599]] - 10 match-paired HBV-related Chinese HCC and non-cancerous adjacent tissues. Identified 1,378 significantly DE genes. 
-  - [[http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE33294|GSE33294]] - Chinese HBV-related hepatocellular carcinoma, paired tumor and non-cancerous adjacent tissues from 3 patients. 
-  - [[http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59259|GSE59259]] - Alcohol-related HCC 8 paired samples of liver and HCC 
-  - [[https://trace.ddbj.nig.ac.jp/DRASearch/submission?acc=SRA074279|SRA074279]] - 9 Chinese patients: paired HCC and adjacent non-cancerous tissues; [[http://www.sciencedirect.com/science/article/pii/S0888754314002341#bb0080|Publication]] 
-  - [[http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE55759|GSE55759]] - Paired HCC and non-cancerous adjacent tissue; median of 7019 DE genes per set, 93 DE genes shared by 6/8 patients 
  
-| Sex/Age\\ | Viral Infection\\ | Tumor Differentiation\\ | No. of Tumors\\ | Vascular Invasion\\ | TNM* Stage\\ | 
-| M/62\\ | HBV(-)HCV(-)\\ | Well\\ | 1\\ | No\\ | II\\ | 
-| F/29\\ | HBV(+)HCV(-)\\ | Well\\ | 1\\ | No\\ | II\\ | 
-| M/56\\ | HBV(+)HCV(-)\\ | Moderately\\ | \\ | No\\ | II\\ | 
-| M/55\\ | HBV(+)HCV(-)\\ | Moderately\\ | 1\\ | No\\ | II\\ | 
-| F/39\\ | HBV(+)HCV(-)\\ | Moderately\\ | 1\\ | No\\ | II\\ | 
-| M/44\\ | HBV(+)HCV(-)\\ | Moderately\\ | 1\\ | Yes\\ | III\\ | 
-| M/47\\ | HBV(+)HCV(-)\\ | Moderately\\ | 1\\ | No\\ | II\\ | 
-| M/48\\ | HBV(+)HCV(-)\\ | Poorly\\ | 1\\ | Yes\\ | III\\ | 
 ===== Collaborators =====
-Drs. [[http://uthscsa.edu/csb/faculty/walter.asp|Christi Walter]] and [[http://uthscsa.edu/csa/faculty/Dong.asp|Lily Dong]] from the Department of Structural and Cellular Biology at UT Health Science Center in San Antonio.\\
 +
 +Drs. [[http://uthscsa.edu/csb/faculty/walter.asp|Christi Walter]] and [[http://uthscsa.edu/csa/faculty/Dong.asp|Lily Dong]] from the Department of Structural and Cellular Biology at UT Health Science Center in San Antonio. 
 =====   =====
 +
 ===== Analysis =====
 +
 ==== Project 1: ====
  
Line 40: Line 45:
   - The comparison of Will's DE analysis and ours: Jessica's lab notebook, 2016/2/4.
   - PCA: Jessica's lab notebook, 2016/5/31.
-  - Habil inferred the eigengene in TCGA data on [[http://oncinfo.org/habils_lab_notebook#section20190514|2019-05-14]].
 +  - Habil inferred the eigengene in TCGA data on [[http://oncinfo.org/habils_lab_notebook#section20190514|2019/05/14]].
  
 ==== Publication: ====
-Jessica Zavadil, Maryanne Herzig, Kim Hildreth, Amir Foroushani, William Boswell, Ronald Walter, Robert Reddick, Hugh White, Habil Zare. C3HeB/FeJ Mice Mimic Gene Expression and Pathobiological Features of Human Hepatocellular Carcinoma, //Molecular Carcinogenesis//, In press.
 +
-{{:wiki:public:jessica_compare.png?direct&400|}}
 +Jessica Zavadil, Maryanne Herzig, Kim Hildreth, Amir Foroushani, William Boswell, Ronald Walter, Robert Reddick, Hugh White, Habil Zare. "C3HeB/FeJ Mice mimic many aspects of gene expression and pathobiological features of human hepatocellular carcinoma." [[https://onlinelibrary.wiley.com/doi/abs/10.1002/mc.22929|Molecular carcinogenesis]] 58.3 (2019): 309-320.
 + 
 +{{:wiki:public:jessica_compare.png?direct&400}} 
 ==== Project 2: ====
  
-  - Habil cleaned and mapped Lily's data on 2017/08/10 ([[habils_lab_notebook|lano]]).
 +  - Habil cleaned and mapped Lily's data on 2017/08/10 ([[:habils_lab_notebook|lano]]).
-  - Hanie did DE analysis on mapped transcripts on 2017/08/14 ([[hanies_lab_notebook|lano]]).
 +  - Hanie did DE analysis on mapped transcripts on 2017/08/14 ([[:hanies_lab_notebook|lano]]).
-\\
 +
 +==== Project 3: ==== 
 + 
 +**Proteome and the APEX1 interactome:** Data-independent acquisition mass spectrometry (DIA-MS) was performed on 3 biological replicates of 2 HCC cell lines (SNU398 and Huh7) and an immortalized hepatocyte line (THLE2). We also have 1 biological replicate of a primary hepatocyte organoid derived from a patient (UTHSS-28T).
 + 
 +**Our goal** is to understand 1) whether the APE1 interactome differs between HCC cell lines and non-tumor cells, 2) whether it differs between two HCC cell lines with {{:apex1_analysis_of_dia-ms_dc_139-12-1-20.pptx|overexpressed}} APEX1 (SNU398 vs. Huh7), and 3) how it compares with the interactome described in [[https://www.nature.com/articles/s41598-019-56981-z|Ayyildiz 2020]]. Christi sent these data to Habil on 2020-12-02 in an email entitled "M2021-026 {{:hcc-in-vitro-proteome-scaffold-dia-2020-12-02.7z|Scaffold}} DIA and Excel files".
 + 
 ===== Related work =====
  
-  - Hoenerhoff, Mark J., et al. "Global Gene Profiling of Spontaneous Hepatocellular Carcinoma in B6C3F1 Mice Similarities in the Molecular Landscape with Human Liver Cancer." //[[http://www.ncbi.nlm.nih.gov/pubmed/21571946|Toxicologic]] pathology// 39.4 (2011): 678-699. \\  Microarray analysis of tumors from B6C3F1 mice (first generation of C57BL/6J and C3HeB/FeJ strains).
 +  - Hoenerhoff, Mark J., et al. "Global Gene Profiling of Spontaneous Hepatocellular Carcinoma in B6C3F1 Mice Similarities in the Molecular Landscape with Human Liver Cancer." //[[http://www.ncbi.nlm.nih.gov/pubmed/21571946|Toxicologic]] pathology// 39.4 (2011): 678-699. \\ Microarray analysis of tumors from B6C3F1 mice (first generation of C57BL/6J and C3HeB/FeJ strains).
-  - Keane, Thomas M., et al. "Mouse genomic variation and its effect on phenotypes and gene regulation." [[http://www.nature.com/nature/journal/v477/n7364/full/nature10413.html|Nature]] 477.7364 (2011): 289-294.\\  Compared the standard reference genomes of mouse (C57BL/6J) with other strains.
 +  - Keane, Thomas M., et al. "Mouse genomic variation and its effect on phenotypes and gene regulation." [[http://www.nature.com/nature/journal/v477/n7364/full/nature10413.html|Nature]] 477.7364 (2011): 289-294. \\  Compared the standard reference genomes of mouse (C57BL/6J) with other strains.
-  - Munger, Steven C., et al. "RNA-Seq alignment to individualized genomes improves transcript abundance estimates in multiparent populations." //[[http://www.genetics.org/content/198/1/59.full#T2|Genetics]]// 198.1 (2014): 59-73.\\  Proposed a method for strain-specific alignment and compared with mapping RNAseq data from a strain to the reference genome. Observed >10% change in expression in about 2,000 genes.
 +  - Munger, Steven C., et al. "RNA-Seq alignment to individualized genomes improves transcript abundance estimates in multiparent populations." //[[http://www.genetics.org/content/198/1/59.full#T2|Genetics]]// 198.1 (2014): 59-73. \\ Proposed a method for strain-specific alignment and compared with mapping RNAseq data from a strain to the reference genome. Observed >10% change in expression in about 2,000 genes.
-  - Huang, Shunping, et al. "Transforming genomes using **MOD** files with applications." //Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics//. [[http://web.cs.ucla.edu/~weiwang/paper/ACMBCB13_1.pdf|ACM]], 2013.\\  Figure 4 shows that if we map to the reference genome, we may lose no more than 7% of reads.
 +  - Huang, Shunping, et al. "Transforming genomes using **MOD** files with applications." //Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics//. [[http://web.cs.ucla.edu/~weiwang/paper/ACMBCB13_1.pdf|ACM]], 2013. \\  Figure 4 shows that if we map to the reference genome, we may lose no more than 7% of reads.
-  - Hart, Steven N., et al. "Calculating sample size estimates for RNA sequencing data." //Journal of Computational [[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3842884/|Biology]]// 20.12 (2013): 970-978. Wu, Hao, Chi Wang, and Zhijin Wu. "PROPER: comprehensive power evaluation for differential expression using RNA-seq." //[[http://bioinformatics.oxfordjournals.org/content/31/2/233.short|Bioinformatics]]// 31.2 (2015): 233-241. From [[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3842884/figure/f2/|Fig2]] and [[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3842884/figure/f2/|Fig3]] of Hart et al., and [[http://bioinformatics.oxfordjournals.org.libproxy.txstate.edu/content/31/2/233/F5.expansion.html|Fig5]] of Wu et al., it seems that at least 5-7 samples are needed for each condition.
 +  - Hart, Steven N., et al. "Calculating sample size estimates for RNA sequencing data." //Journal of Computational [[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3842884/|Biology]]// 20.12 (2013): 970-978. Wu, Hao, Chi Wang, and Zhijin Wu. "PROPER: comprehensive power evaluation for differential expression using RNA-seq." //[[http://bioinformatics.oxfordjournals.org/content/31/2/233.short|Bioinformatics]]// 31.2 (2015): 233-241. From [[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3842884/figure/f2/|Fig2]] and [[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3842884/figure/f2/|Fig3]] of Hart et al., and [[http://bioinformatics.oxfordjournals.org.libproxy.txstate.edu/content/31/2/233/F5.expansion.html|Fig5]] of Wu et al., it seems that at least 5-7 samples are needed for each condition.
-  - Ching, Travers, Sijia Huang, and Lana X. Garmire. "Power analysis and sample size estimation for RNA-Seq differential expression." //[[http://rnajournal.cshlp.org/content/early/2014/09/22/rna.046011.114|rna]]// 20.11 (2014): 1684-1696.
 +  - Ching, Travers, Sijia Huang, and Lana X. Garmire. "Power analysis and sample size estimation for RNA-Seq differential expression." //[[http://rnajournal.cshlp.org/content/early/2014/09/22/rna.046011.114|rna]]// 20.11 (2014): 1684-1696.
-  - Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma, [[http://www.cell.com/cell/abstract/S0092-8674(17)30639-6?innerTabgraphical_S0092867417306396|Cell]], 2017 [{{ :ally-copmprehensive_and_integrative_genomic_char_of_hcc-cell-2017.pdf|pdf}}]. TCGA's HCC data and subtyping using DNA copy number, DNA methylation, mRNA expression, miRNA expression and RPPA (protein expression). Links to the MDACC dataset with 100 HCC samples.
 +  - Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma, [[http://www.cell.com/cell/abstract/S0092-8674(17)30639-6?innerTabgraphical_S0092867417306396|Cell]], 2017 [{{:ally-copmprehensive_and_integrative_genomic_char_of_hcc-cell-2017.pdf|pdf}}]. TCGA's HCC data and subtyping using DNA copy number, DNA methylation, mRNA expression, miRNA expression and RPPA (protein expression). Links to the MDACC dataset with 100 HCC samples.
-\\ **Related software**\\
 +  - Subramaniam, Somasundaram, Robin K. Kelley, and Alan P. Venook. "A review of hepatocellular carcinoma (HCC) staging systems." [[http://cco.amegroups.com/article/view/2528/3943|Chinese clinical oncology]] 2.4 (2013).
 +  - Alexandrov, Ludmil B., et al. "The repertoire of mutational signatures in human cancer." [[https://www.nature.com/articles/s41586-020-1943-3#Sec17|Nature]] 578.7793 (2020): 94-101. \\  Analyzed WGS and WXS data of thousands of tumors available from TCGA and PCAWG consortia. 
 +  - Dr. Sukeshi Arora's {{:sukeshi_arora_hcc_update_3.18.20.pptx|slides}}  presented in the HCC meeting on 2020-04-18, which summarizes statistics on the prognosis, the current clinical practice, and response to different treatments. 
 + 
 +===== Related software =====
  
   - [[https://ccb.jhu.edu/software/tophat/index.shtml|TopHat]], useful for aligning RNAseq data to a genome.
Line 67: Line 85:
   - [[http://homer.salk.edu/homer/basicTutorial/mapping.html|Homer's]] quick tutorial on mapping NGS data using several tools including bowtie2, bwa, TopHat, etc. with command line examples.
   - [[http://gqinnovationcenter.com/documents/bioinformatics/RNAseq_Cuba_OMICS_2013.pdf|Lefebvre's]] quick tutorial on RNA-Seq data analysis.
-  - Schiffthaler's ~1 hour video on RNA Seq data [[https://www.youtube.com/watch?v=1rNEkWSxB5s|preprocessing]] including FastQC, sortmerna to exclude rRNA, trimmomatic to trim the adaptors and low quality bps, STAR to map reads to the genome, samtools to index the bam file, IGV to visualize the reads on the genome, and HTSeq to count the number of reads mapped to each gene (coverage). These are all steps we need to do before differential analysis using, say DESeq2. [[http://www.epigenesys.eu/images/stories/protocols/pdf/20150303161357_p67.pdf|This]] is a textual version explaining the same steps.\\ \\
 +  - Schiffthaler's ~1 hour video on RNA Seq data [[https://www.youtube.com/watch?v=1rNEkWSxB5s|preprocessing]] including FastQC, sortmerna to exclude rRNA, trimmomatic to trim the adaptors and low quality bps, STAR to map reads to the genome, samtools to index the bam file, IGV to visualize the reads on the genome, and HTSeq to count the number of reads mapped to each gene (coverage). These are all steps we need to do before differential analysis using, say DESeq2. [[http://www.epigenesys.eu/images/stories/protocols/pdf/20150303161357_p67.pdf|This]] is a textual version explaining the same steps.
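The preprocessing order listed in the last item (QC, trimming, mapping, indexing, counting) can be sketched as a list of command lines. This is only an illustration and not the pipeline used in this project. The function `preprocessing_commands`, the file names, and the trimming thresholds are hypothetical, and the flags are written from memory of each tool's documentation, so they should be checked against the installed versions. The sortmerna and IGV steps are left out, `trimmomatic` is assumed to be available as a wrapper command, and `-s no` assumes an unstranded library.

```python
import shlex
from typing import List, Tuple

Command = List[str]


def preprocessing_commands(sample: str, fq1: str, fq2: str, star_index: str,
                           gtf: str, adapters: str,
                           threads: int = 4) -> List[Tuple[str, Command]]:
    """Build (step name, argv) pairs for one paired-end sample; nothing is executed."""
    # Trimmomatic writes paired (P) and unpaired (U) outputs for each mate.
    p1, u1 = f"{sample}_1P.fastq", f"{sample}_1U.fastq"
    p2, u2 = f"{sample}_2P.fastq", f"{sample}_2U.fastq"
    # STAR appends this suffix to --outFileNamePrefix for sorted BAM output.
    bam = f"{sample}.Aligned.sortedByCoord.out.bam"
    return [
        # The "qc" output directory is assumed to exist already.
        ("fastqc", ["fastqc", "-o", "qc", fq1, fq2]),
        ("trimmomatic", ["trimmomatic", "PE", "-threads", str(threads),
                         fq1, fq2, p1, u1, p2, u2,
                         f"ILLUMINACLIP:{adapters}:2:30:10",
                         "SLIDINGWINDOW:4:15", "MINLEN:36"]),
        ("star", ["STAR", "--runThreadN", str(threads),
                  "--genomeDir", star_index,
                  "--readFilesIn", p1, p2,
                  "--outSAMtype", "BAM", "SortedByCoordinate",
                  "--outFileNamePrefix", f"{sample}."]),
        ("samtools_index", ["samtools", "index", bam]),
        # htseq-count prints the per-gene counts to standard output.
        ("htseq_count", ["htseq-count", "-f", "bam", "-r", "pos",
                         "-s", "no", bam, gtf]),
    ]


if __name__ == "__main__":
    steps = preprocessing_commands("sample1", "sample1_R1.fastq", "sample1_R2.fastq",
                                   "star_index", "genes.gtf", "adapters.fa")
    for name, argv in steps:
        print(f"# {name}")
        print(shlex.join(argv))
```

The commands are only printed, so they can be reviewed or pasted into a job script. The resulting count tables are the input for a differential analysis tool such as DESeq2.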
-[[|Drafts]], [[|Next steps]]
 +
 +[[:hepatocellular_carcinoma|Drafts]], [[:hepatocellular_carcinoma|Next steps]] 
 + 
 +\\ 
 +