hepatocellular_carcinoma (revision 2019/10/18 19:19 by admin; previous revision 2019/10/07 18:45 by admin)
===== Data =====
  
  - RNA-seq from the liver of 9 treated and 4 control samples ([[:christis_data|Christi's data]]).
  - The closest reference genome to our mouse strain is [[http://www.csbio.unc.edu/CCstatus/index.py?run=Pseudo|C3H/HeJ]]. We can use FASTA and MOD files from build 37 (mm9), which is [[https://www.biostars.org/p/81602/|better annotated]] than build 38 (mm10).
  - Alternatively, we can map to the mouse reference transcriptome ([[:hepatocellular_carcinoma|NCBI37]]/mm9, rna.fa) and simplify the analysis at the expense of losing up to 7% of reads.
  - Ron Walter's lab ran their pipeline to filter the FASTQ data. These files are stored in a folder called Filtered_fastq_files. From Will Boswell: "PE stands for paired end reads. For example, you have a 500bp fragment and your target sequence size is 125bp. The fragment will be sequenced 125 bases from one end and 125 bases from the other end, and Illumina refers to this as paired end reads. SE stands for single end reads, which in our case is generated during our filtering process. If you look at the pre-filtered reads, you’ll see only PE1 and PE2 for each sample. During filtration, if one of the PE’s have low quality, it is tossed out leaving the other PE, and since it no longer has a mate pair, it’s kept as a single end sequence. Also, there are several files in the post-filtered directories that are considered intermediate files in the filtering process that we don’t need; these are process files used by the filtering script. The only files you should be concerned with are the _pe1.r.fastq, _pe2.r.fastq, _se.r.fastq, and _PE.filter.stats (gives you the number of reads mapped to the genome for each PE and SE)." A summary of the analysis can be found {{:mouse_hcc_liver_sequencing_summary.docx|here}}.
  - Sequencing was completed by Beckman Coulter using the [[http://www.illumina.com/products/truseq_rna_library_prep_kit_v2.html|TruSeq RNA Library Preparation Kit v2]], which is an unstranded protocol.
  - Jielei provided TruSeq {{:illumina_stranded_rnaseq_mapping.pdf|Stranded}} RNA-Seq data from 8 mice in August 2017 (see ~/proj/hcc/data/TPT1/readme.txt), which was analyzed using TruSeq Stranded RNA-Seq.
  - Gao, Qiang, et al. "Integrated Proteogenomic Characterization of HBV-Related Hepatocellular Carcinoma." //[[https://www.sciencedirect.com/science/article/pii/S0092867419310037|Cell]]// 179.2 (2019): 561-577. \\ "The data of WES, transcriptome sequencing, proteome, and phosphoproteome are available in [[https://www.biosino.org/node|NODE]] (accession # [[https://www.biosino.org/node/experiment/detail/OEX001697|OEP000321]])."
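The PE/SE behaviour Will Boswell describes above can be illustrated with a toy filter. This is a minimal sketch, not the Walter lab's pipeline: the ''Read'' class, ''split_pairs'' and the mean-quality cutoff ''min_mean_q'' are hypothetical, Phred+33 quality encoding is assumed, and both mate lists are assumed to be in the same order (as in synchronized paired FASTQ files). It only shows the sorting rule: both mates pass, the pair stays in PE1/PE2; one mate passes, the orphan goes to SE; neither passes, the pair is dropped.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Read:
    name: str
    seq: str
    qual: str  # quality string, Phred+33 encoding assumed


def mean_phred(read: Read, offset: int = 33) -> float:
    """Mean Phred score of a read (Phred+33 encoding assumed)."""
    if not read.qual:
        return 0.0
    return sum(ord(c) - offset for c in read.qual) / len(read.qual)


def split_pairs(
    mates1: List[Read], mates2: List[Read], min_mean_q: float = 20.0
) -> Tuple[List[Read], List[Read], List[Read]]:
    """Sort read pairs into PE1, PE2 and SE bins.

    Both mates pass -> kept as a pair (pe1, pe2).
    One mate passes -> the survivor has no mate and goes to se.
    Neither passes  -> the pair is dropped.
    min_mean_q is a hypothetical cutoff, not the real filter criterion.
    """
    pe1: List[Read] = []
    pe2: List[Read] = []
    se: List[Read] = []
    # zip assumes mate i in mates1 pairs with mate i in mates2
    for r1, r2 in zip(mates1, mates2):
        ok1 = mean_phred(r1) >= min_mean_q
        ok2 = mean_phred(r2) >= min_mean_q
        if ok1 and ok2:
            pe1.append(r1)
            pe2.append(r2)
        elif ok1:
            se.append(r1)
        elif ok2:
            se.append(r2)
    return pe1, pe2, se


if __name__ == "__main__":
    good = "I" * 10  # 'I' = Phred 40
    bad = "#" * 10   # '#' = Phred 2
    m1 = [Read("a/1", "ACGTACGTAC", good),
          Read("b/1", "ACGTACGTAC", bad),
          Read("c/1", "ACGTACGTAC", bad)]
    m2 = [Read("a/2", "TTTTGGGGCC", good),
          Read("b/2", "TTTTGGGGCC", good),
          Read("c/2", "TTTTGGGGCC", bad)]
    pe1, pe2, se = split_pairs(m1, m2)
    print(len(pe1), len(pe2), [r.name for r in se])  # 1 1 ['b/2']
```

Pair ''a'' stays paired, pair ''b'' loses its low-quality first mate so ''b/2'' becomes a single end read, and pair ''c'' is discarded. This is why the post-filter folders contain an _se.r.fastq file in addition to the two PE files.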

====== **Sources of Human HCC RNA-seq Data** ======