hepatocellular_carcinoma [2019/10/18 16:19] – [Data] admin | hepatocellular_carcinoma [2020/12/07 22:25] – [Data] admin |
---|
| |
| |
===== Data ===== | ===== Related software ===== |
| |
- RNAseq from liver of 9 treated and 4 control samples ([[:christis_data|Christi's data]]). | - [[https://ccb.jhu.edu/software/tophat/index.shtml|TopHat]], useful for aligning RNAseq data to a genome. |
- The closest reference genome to our mouse strain is [[http://www.csbio.unc.edu/CCstatus/index.py?run=Pseudo|C3H/HeJ]]. We can use fasta and MOD files from build 37 (mm9), which is better [[https://www.biostars.org/p/81602/|annotated]] than build 38 (mm10). | - [[http://www.nature.com.libproxy.uthscsa.edu/nbt/journal/v33/n3/full/nbt.3122.html|StringTie]], which reconstructs transcriptomes from RNAseq data (2015). |
- Alternatively, we can map to the mouse reference transcriptome ([[:hepatocellular_carcinoma|NCBI37]]/mm9, rna.fa), which simplifies the analysis at the expense of losing up to 7% of reads. | - John Garbe has tutorials ([[https://www.msi.umn.edu/sites/default/files/RNA-Seq Module 1.pdf|1]], [[https://www.msi.umn.edu/sites/default/files/RNA_seq_Lecture2_2014_v2.pdf|2]], and [[https://www.yumpu.com/en/document/view/6745921/rna-seq-module-3|3]]) on design and analysis of RNAseq. |
- Ron Walter's lab ran their pipeline to filter the fastq data. The filtered files are stored in a folder called Filtered_fastq_files. From Will Boswell: "PE stands for paired end reads. For example, you have a 500bp fragment and your target sequence size is 125bp. The fragment will be sequenced 125 bases from one end and 125 bases from the other end, and Illumina refers to this as paired end reads. SE stands for single end reads, which in our case is generated during our filtering process. If you look at the pre-filtered reads, you’ll see only PE1 and PE2 for each sample. During filtration, if one of the PE’s have low quality, it is tossed out leaving the other PE, and since it no longer has a mate pair, it’s kept as a single end sequence. Also, there are several files in the post-filtered directories that are considered intermediate files in the filtering process that we don’t need; these are process files used by the filtering script. The only files you should be concerned with are the _pe1.r.fastq, _pe2.r.fastq, _se.r.fastq, and _PE.filter.stats (gives you the number of reads mapped to the genome for each PE and SE)." A summary of the analysis can be found {{:mouse_hcc_liver_sequencing_summary.docx|here}}. | - [[http://homer.salk.edu/homer/basicTutorial/mapping.html|Homer's]] quick tutorial on mapping NGS data using several tools, including bowtie2, bwa, and TopHat, with command line examples. |
- Sequencing was completed by Beckman Coulter using the [[http://www.illumina.com/products/truseq_rna_library_prep_kit_v2.html|TruSeq RNA Library Preparation Kit v2]], which is an unstranded protocol. | - [[http://gqinnovationcenter.com/documents/bioinformatics/RNAseq_Cuba_OMICS_2013.pdf|Lefebvre's]] quick tutorial on RNA-Seq data analysis. |
- Jielei provided TruSeq {{:illumina_stranded_rnaseq_mapping.pdf|Stranded}} RNA-Seq data from 8 mice in August 2017 (See ~/proj/hcc/data/TPT1/readme.txt), which was analyzed using TruSeq Stranded RNA-Seq. | - Schiffthaler's ~1 hour video on RNA-Seq data [[https://www.youtube.com/watch?v=1rNEkWSxB5s|preprocessing]], including FastQC, sortmerna to exclude rRNA, trimmomatic to trim the adaptors and low-quality bases, STAR to map reads to the genome, samtools to index the bam file, IGV to visualize the reads on the genome, and HTSeq to count the number of reads mapped to each gene (coverage). These are all steps we need to do before differential analysis using, say, DESeq2. [[http://www.epigenesys.eu/images/stories/protocols/pdf/20150303161357_p67.pdf|This]] is a textual version explaining the same steps. |
- Gao, Qiang, et al. "Integrated Proteogenomic Characterization of HBV-Related Hepatocellular Carcinoma." //[[https://www.sciencedirect.com/science/article/pii/S0092867419310037|Cell]]// 179.2 (2019): 561-577. \\ "The data of WES, transcriptome sequencing, proteome, and phosphoproteome are available in [[https://www.biosino.org/node|NODE]] (accession # [[https://www.biosino.org/node/experiment/detail/OEX001697|OEP000321]])." | - **Proteome:** Data-independent acquisition mass spectrometry (DIA-MS) done on 3 HCC cell lines and an immortalized hepatocyte line, each with 3 biological replicates. Our goal is to understand whether the APE1 interactome is 1) different in HCC cell lines vs non-tumor cells, 2) different between HCC cell lines with overexpressed APEX1 (SNU398 vs Huh7), and 3) how it compares to that described in [[https://www.nature.com/articles/s41598-019-56981-z|Ayyildiz 2020]]. |
| [[:hepatocellular_carcinoma|Drafts]], [[:hepatocellular_carcinoma|Next steps]] |
| |
| |
- Ching, Travers, Sijia Huang, and Lana X. Garmire. "Power analysis and sample size estimation for RNA-Seq differential expression." //[[http://rnajournal.cshlp.org/content/early/2014/09/22/rna.046011.114|RNA]]// 20.11 (2014): 1684-1696. | - Ching, Travers, Sijia Huang, and Lana X. Garmire. "Power analysis and sample size estimation for RNA-Seq differential expression." //[[http://rnajournal.cshlp.org/content/early/2014/09/22/rna.046011.114|RNA]]// 20.11 (2014): 1684-1696. |
- Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma, [[http://www.cell.com/cell/abstract/S0092-8674(17)30639-6?innerTabgraphical_S0092867417306396|Cell]], 2017 [{{:ally-copmprehensive_and_integrative_genomic_char_of_hcc-cell-2017.pdf|pdf}} ]. TCGA's HCC data and subtyping using DNA copy number, DNA methylation, mRNA expression, miRNA expression and RPPA (protein expression). Links to the MDACC dataset with 100 HCC samples. | - Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma, [[http://www.cell.com/cell/abstract/S0092-8674(17)30639-6?innerTabgraphical_S0092867417306396|Cell]], 2017 [{{:ally-copmprehensive_and_integrative_genomic_char_of_hcc-cell-2017.pdf|pdf}} ]. TCGA's HCC data and subtyping using DNA copy number, DNA methylation, mRNA expression, miRNA expression and RPPA (protein expression). Links to the MDACC dataset with 100 HCC samples. |
- Subramaniam, Somasundaram, Robin K. Kelley, and Alan P. Venook. "A review of hepatocellular carcinoma (HCC) staging systems." [[http://cco.amegroups.com/article/view/2528/3943|Chinese clinical oncology]] 2.4 (2013). | - Subramaniam, Somasundaram, Robin K. Kelley, and Alan P. Venook. "A review of hepatocellular carcinoma (HCC) staging systems." [[http://cco.amegroups.com/article/view/2528/3943|Chinese clinical oncology]] 2.4 (2013). |
| - Alexandrov, Ludmil B., et al. "The repertoire of mutational signatures in human cancer." [[https://www.nature.com/articles/s41586-020-1943-3#Sec17|Nature]] 578.7793 (2020): 94-101. \\ Analyzed WGS and WXS data of thousands of tumors available from TCGA and PCAWG consortia. |
---- | - Dr. Sukeshi Arora's {{:sukeshi_arora_hcc_update_3.18.20.pptx|slides}} presented at the HCC meeting on 2020-04-18, which summarize statistics on prognosis, current clinical practice, and response to different treatments. |
| |
| |
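
Will Boswell's note in the Data list names three filtered read files per sample (_pe1.r.fastq, _pe2.r.fastq, _se.r.fastq). Before mapping, it may help to check that every sample still has both paired-end files. The following is only a minimal sketch under that naming assumption; the function names and the sample names in the usage note are hypothetical and not part of the lab's pipeline.

```python
from collections import defaultdict

# Suffixes taken from the description of the filtered output.
# Anything else (intermediate files, _PE.filter.stats) is ignored.
SUFFIXES = {
    "_pe1.r.fastq": "pe1",
    "_pe2.r.fastq": "pe2",
    "_se.r.fastq": "se",
}


def group_filtered_fastq(filenames):
    """Group filtered FASTQ file names by sample prefix (hypothetical helper)."""
    samples = defaultdict(dict)
    for name in filenames:
        for suffix, role in SUFFIXES.items():
            if name.endswith(suffix):
                # The sample name is whatever precedes the known suffix.
                samples[name[: -len(suffix)]][role] = name
                break
    return dict(samples)


def missing_mates(samples):
    """Return sorted sample names that lack a pe1 or pe2 file."""
    return sorted(
        sample
        for sample, files in samples.items()
        if not {"pe1", "pe2"} <= set(files)
    )
```

For example, passing the names from `os.listdir` on the Filtered_fastq_files folder to `group_filtered_fastq` gives one entry per sample, and `missing_mates` lists any sample that cannot be mapped as paired-end.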