Objectives

To identify the role of stress in the dynamics of the genome (how?).

Data
  1. a) Whole RNA-Seq (RPKM) of 13 Arabidopsis samples treated with different bacterial stresses (design). Each sample has 3 biological replicates. maSigPro identified 9 gene clusters, including cluster 3 (fast immune response), cluster 6 (slow response), cluster 2 (touch response), and cluster 9 (roughly the "opposite" pattern in mutants).
    b) RASL-Seq data of the same samples.
  2. RASL-Seq of mutants. We analyzed 195 samples covering 5 mutant groups, 3 treatment groups, 3 biological replicates, and 5 time points (see sample information).
    a) Expression of three immune response genes, PR1, PR2, and PR5, was measured using qPCR.
    b) Expression of 277 genes was measured using RASL-Seq in two runs. The first sheet corresponds to the May 2017 run (14 million reads in total) and the second sheet to the June 2017 run (23 million reads in total). 103 of these genes were selected from clusters 2, 3, 6, and 9, which were determined using the above RNA-Seq data.
  3. Human RNA-Seq data from the Mayo Clinic, used to test the hypotheses that Bess formulated based on Drosophila data. Specifically, the raw FASTQ files from the MayoRNAseq dataset for NCI, PSP, AD, and PA (Synapse ID: syn5550404; cerebellum and temporal cortex).
  4. CSV file with RASL-Seq data (both RASL-Seq rounds combined), normalized using 8 housekeeping genes including Tip41.
  5. PPT file containing additional cluster data that compares normalization using only Tip41 vs. 8 housekeeping genes (including Tip41).
  6. Additional genes. Samples are identical to the previous set (195). This is the unnormalized raw count file. Note that this file has 75 genes instead of 66 (as in the normalized file) because it includes 9 housekeeping genes. Genes from cluster #3 (C3) and cluster #4 (C4) were added in this round. Exclude sample "CN-III" (Col-0 Naive, biological replicate III) from the analysis because it is an outlier.
  7. Genes and samples to remove, and the criteria for selecting the genes.
  8. This dataset contains the normalized expression levels of all genes analyzed as of 23 Oct 2017.
  9. The RNA-Seq data including TE, and the RASL-Seq data of cohort 1 (normalized using the housekeeping genes). Yogi sent these data on 2017-11-22 so that we can compute, for each gene, the correlation between the two experiments in cohort 1.
  10. CSV file containing the correlation analysis with the Pearson, Spearman, and Kendall methods, together with the p-value used to assess how reliable each correlation is.
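
The notes above say that raw RASL-Seq counts were normalized using housekeeping genes, but they do not state the formula. The following is a minimal sketch of one common approach, assuming each sample is divided by the geometric mean of its housekeeping counts. The function names and the gene names HK1 and HK2 are hypothetical and are not taken from the project code.

```python
import math


def housekeeping_factors(counts, housekeeping):
    """Return one scaling factor per sample.

    counts: dict mapping gene name -> list of raw counts (one per sample).
    housekeeping: list of gene names used as the reference set.
    ASSUMPTION: the factor is the geometric mean of (count + 1) over the
    housekeeping genes; the pseudocount of 1 avoids log(0).
    """
    n_samples = len(next(iter(counts.values())))
    factors = []
    for j in range(n_samples):
        logs = [math.log(counts[g][j] + 1) for g in housekeeping]
        factors.append(math.exp(sum(logs) / len(logs)))
    return factors


def normalize(counts, housekeeping):
    """Divide every non-housekeeping gene by the per-sample factor."""
    factors = housekeeping_factors(counts, housekeeping)
    return {
        gene: [c / f for c, f in zip(row, factors)]
        for gene, row in counts.items()
        if gene not in housekeeping
    }


# Toy example with two samples; the second sample is sequenced about twice as deep.
raw = {
    "HK1": [99, 199],   # hypothetical housekeeping gene
    "HK2": [99, 199],   # hypothetical housekeeping gene
    "PR1": [50, 100],
}
normalized = normalize(raw, ["HK1", "HK2"])
```

In this toy input the housekeeping genes double between the two samples, so the doubling of PR1 is attributed to sequencing depth and the two normalized PR1 values come out equal. Using several housekeeping genes rather than Tip41 alone makes the factor less sensitive to noise in any single gene.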

Collaborators

Dr. Hong-Gu Kang from the Biology Department at Texas State University.

Analysis

  1. Replicate plots of maSigPro clusters using the PlotGroups() function (Figure 4).
  2. Compute eigengenes for each of the clusters identified by maSigPro. See the 2017-02-20 entry in Bryan's lab notebook.
  3. RASL-Seq analysis. See the 2017-04-14 note in Habil's lab notebook.
  4. Our plan for analyzing the second round of RASL-Seq data (gdoc).
  5. Comparison between qPCR and RASL-Seq on PR1 (Spearman correlation=0.98), PR2 (0.96), and PR5 (0.95). See the qpcr.R script for details.
  6. In the RASL-Seq data, the genes in each cluster are a subset of the corresponding cluster in the RNA-Seq data. We used this same subset in the RNA-Seq data, which produced eigengene patterns similar to the original RNA-Seq eigengene patterns. See the rnaseq.R script for details.
  7. Eigengenes of wild-type samples measured by RASL-Seq follow a pattern similar to those measured by RNA-Seq.
  8. Scatter plots show that the eigengenes of modules 2, 3, 6, and 9 correlate in wild-type samples of cohort 1 when RASL-Seq data are compared to RNA-Seq data.
  9. Comparison of RASL-Seq mutants D1, D2, D3, and D4 with wild-type RASL-Seq.
  10. The heatmap of cohort 2 shows clearer clustering by time interval than by mutant type. The heatmap of eigengene expression shows a similar pattern.
  11. Slides summarizing analysis points 5-10
  12. Modified the RASL-Seq vs. qPCR plots: switched to a logarithmic scale, removed the text, and changed the time colors (PR1, PR2, PR5). Modified the RASL-Seq vs. RNA-Seq eigengene plots: removed the text and changed the time colors, keeping the normal (non-logarithmic) scale (ME2, ME3, ME6, ME9).
  13. Excel table containing the differential expression analysis comparing RASL-Seq wild type vs. RASL-Seq mutant type.
  14. Plan for averaging replicates and building a hierarchical analysis for the cluster.
  15. We determine how each cluster of genes changes in each mutant using this metric.
  16. Hierarchical cluster analysis and Re-Cluster analysis slides.
  17. Slides that contain the Basal, Ratio, and Peak analyses, with the corresponding Excel tables that contain the data.
  18. Plan for completing DE analysis with added genes from September.
  19. Plan for completing correlation between RNA-Seq vs RASL-Seq, and modification of current heatmaps to express significant p-value more clearly. To compute the correlations, instead of averaging the replicates, it is better to estimate "multivariate correlation" (e.g., using CORREP::cor.balance). It is VERY important to standardize the data. Plan for correlation analysis.
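
The correlation steps above (qPCR vs. RASL-Seq, and RNA-Seq vs. RASL-Seq) can be illustrated with a small sketch. It standardizes a profile and computes a Spearman correlation using only the standard library. It is NOT an implementation of CORREP::cor.balance; it assumes replicates have already been reduced to one value per condition, and all function names and input values are hypothetical.

```python
import math


def zscore(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression of one gene across five time points.
qpcr = [1.0, 2.5, 8.0, 30.0, 12.0]
rasl = [10.0, 40.0, 90.0, 900.0, 200.0]
rho = spearman(zscore(qpcr), zscore(rasl))
```

For a single pair of profiles, standardization leaves Pearson and Spearman coefficients unchanged, since both are invariant to shifting and positive rescaling. It matters once replicates or genes are pooled or modeled jointly, which is the setting the plan above refers to.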

Related work

  1. Bordiya Y and Kang HG. (2016) Genome-wide analysis of chromatin accessibility in Arabidopsis infected with Pseudomonas syringae. Methods in Molecular Biology, in press.

Related software and resources
  1. Our code is in the Evolution repository.
  2. maSigPro
  3. Pigengene


Drafts, Next steps