Differences

This shows you the differences between two versions of the page.

r_test [2018/10/24 00:23]
admin
r_test [2019/05/22 07:50] (current)
admin [Question 2:]
Line 2: Line 2:
  
 **An R test** \\ **An R test** \\
-This simple question is designed to test the applicants on their expertise in R, self learning, working under pressure, presenting their results, and explaining their work.\\ +These simple questions are designed to test applicants on their expertise in R, self-learning, working under pressure, presenting their results, and explaining their work. 
-- Install [[https://bioconductor.org/packages/release/bioc/html/limma.html|limma]] package from Bioconductor.+ 
 +==== Question 1 ==== 
 + 
 +  - Install the [[https://bioconductor.org/packages/release/bioc/html/limma.html|limma]] package from Bioconductor.
   - Have a look at the [[https://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf|user guide]]. Specifically, read the "Quick start" section.   - Have a look at the [[https://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf|user guide]]. Specifically, read the "Quick start" section.
   - Install the [[http://bioconductor.org/packages/release/bioc/html/GEOquery.html|GEOquery]] data package from Bioconductor.   - Install the [[http://bioconductor.org/packages/release/bioc/html/GEOquery.html|GEOquery]] data package from Bioconductor.
   - Write a script that downloads the GSE59259 dataset and computes the list of differentially expressed genes with an adjusted p-value below 0.01. The first lines of your script should look like this:   - Write a script that downloads the GSE59259 dataset and computes the list of differentially expressed genes with an adjusted p-value below 0.01. The first lines of your script should look like this:
-  - library(limma) +  -
- +
-library(GEOquery)+
  
 <code> <code>
-  gset <- getGEO("GSE59259", GSEMatrix =TRUE)+ library(limma) 
 + library(GEOquery) 
 + gset <- getGEO("GSE59259", GSEMatrix =TRUE)
   type <- c(rep("N",8),rep("H",8))   type <- c(rep("N",8),rep("H",8))
   des <- ????   des <- ????
Line 18: Line 21:
   fit <- lmFit(Data, des) ## !   fit <- lmFit(Data, des) ## !
   ...   ...
-- The rest of the script follows the instruction in Section 3.2 (Sample limma Session) with appropriate modifications. 
-- Your script should save the list of differentially expressed (DE) genes and in csv format in a file named "de.csv". The output file should have 2 columns: gene names and the corresponding adjusted [[https://discover.nci.nih.gov/microarrayAnalysis/Statistical.Tests.jsp|p-value]] (see the "adj.P.Val" column of the top table). 
-- Use pheatmap function to plot the expression of the top 5 DE genes. 
-- **Deliverables**: Report the number of DE genes, de.csv file, your script, heatmap.png, and the number of hours it took you to do the test. Additionally, write a short description of what you did in 5-10 sentences. The description should indicate your proficiency in English writing and your ability to communicate with a biologist who has little or no background in programming. 
-- You need to be able to orally explain all parts of your script including the above lines. Be prepared to explain the input, output, and process done by each function you use. 
-- If you are not familiar with gene expression and have little idea what limma does, try to learn the needed concept from the web, e.g., [[https://en.wikipedia.org/wiki/Gene_expression|Wikipedia]]. You can use information from tutorials, books, papers, experts in the fields, your friends, etc., however, make sure you can explain your work thoroughly. 
 </code> </code>
 +
 +- The rest of the script follows the instructions in Section 3.2 (Sample limma Session) with appropriate modifications. \\ - Your script should save the list of differentially expressed (DE) genes in CSV format in a file named "de.csv". The output file should have 2 columns: gene names and the corresponding adjusted [[https://discover.nci.nih.gov/microarrayAnalysis/Statistical.Tests.jsp|p-value]] (see the "adj.P.Val" column of the top table). \\
 +- Use the pheatmap function to plot the expression of the top 5 DE genes.
 +
 + \\ - **Deliverables**: Report the number of DE genes, the de.csv file, your script, heatmap.png, and the number of hours it took you to do the test. Additionally, write a short description of what you did in 5-10 sentences. The description should indicate your proficiency in English writing and your ability to communicate with a biologist who has little or no background in programming. \\ - You need to be able to orally explain all parts of your script including the above lines. Be prepared to explain the input, output, and process done by each function you use. \\ - If you are not familiar with gene expression and have little idea what limma does, try to learn the needed concepts from the web, e.g., [[https://en.wikipedia.org/wiki/Gene_expression|Wikipedia]]. You can use information from tutorials, books, papers, experts in the field, your friends, etc.; however, make sure you can explain your work thoroughly.
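
The output format asked for in Question 1 (two columns, filtered at an adjusted p-value of 0.01) can be sketched in base R as below. This is only an illustration under stated assumptions: the helper name ''save_de_genes'' is hypothetical, the input is assumed to resemble limma's top table with gene names as row names and an ''adj.P.Val'' column, and the toy values are made up and are not results from GSE59259.

```r
# Hypothetical helper: keep genes below an adjusted p-value cutoff and
# write them to a two-column CSV (gene name, adjusted p-value).
# `tt` is ASSUMED to look like a limma top table: one row per gene,
# gene names as row names, and an "adj.P.Val" column.
save_de_genes <- function(tt, file = "de.csv", cutoff = 0.01) {
  de <- tt[tt$adj.P.Val < cutoff, , drop = FALSE]   # filter by cutoff
  de <- de[order(de$adj.P.Val), , drop = FALSE]     # most significant first
  out <- data.frame(gene = rownames(de), adj.P.Val = de$adj.P.Val,
                    stringsAsFactors = FALSE)
  write.csv(out, file = file, row.names = FALSE)
  invisible(out)
}

# Toy table with made-up values, for illustration only (not real results).
toy_tt <- data.frame(
  logFC     = c(2.1, -1.7, 0.3, 1.2),
  adj.P.Val = c(0.0004, 0.008, 0.45, 0.02),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

de <- save_de_genes(toy_tt, file = file.path(tempdir(), "de.csv"))
nrow(de)            # number of DE genes in the toy table
head(de$gene, 5)    # candidates for the top-5 heatmap
```

Sorting before writing means the first five rows of the file are also the genes one would pass on to the heatmap step.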
  
 ---- ----
 +
 +==== Question 2: ====
 +
 +Using the [[https://bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html|maftools]] and [[http://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html|TCGAbiolinks]] packages, determine the 3 most frequently mutated genes in liver cancer. Which of these 3 mutations is most predictive of survival? To answer this question, write a function that takes a gene name as input and saves KM (Kaplan-Meier) plots in PNG format. Add the p-value as a legend in the plot. Deliverables are similar to those for Question 1.
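
The shape of the function requested in Question 2 might look like the sketch below. It deliberately does not use maftools or TCGAbiolinks; it swaps in the ''survival'' package and a toy clinical table so that it stays self-contained. The function name ''km_plot_for_gene'', the column names ''time'' and ''status'', the per-gene logical mutation column, and the gene label ''GENE1'' are all assumptions made for illustration.

```r
library(survival)  # recommended package, normally installed with R

# Hypothetical helper. `clin` is ASSUMED to be a data frame with one row per
# patient: `time` (follow-up), `status` (1 = death, 0 = censored), and one
# logical column per gene that is TRUE when the gene is mutated.
km_plot_for_gene <- function(gene, clin, out_dir = ".") {
  grp <- factor(ifelse(clin[[gene]], "mutant", "wild type"),
                levels = c("wild type", "mutant"))
  dat <- data.frame(time = clin$time, status = clin$status, grp = grp)

  fit <- survfit(Surv(time, status) ~ grp, data = dat)   # KM curves per group
  lr  <- survdiff(Surv(time, status) ~ grp, data = dat)  # log-rank test
  p   <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)

  out_file <- file.path(out_dir, paste0("km_", gene, ".png"))
  png(out_file, width = 800, height = 600)
  plot(fit, col = c("black", "red"), xlab = "Time",
       ylab = "Survival probability", main = gene)
  legend("topright",
         legend = c(levels(grp), paste("log-rank p =", signif(p, 3))),
         col = c("black", "red", NA), lty = c(1, 1, NA))
  dev.off()
  invisible(list(file = out_file, p.value = p))
}

# Toy clinical table with made-up values (not TCGA data).
clin <- data.frame(
  time   = c(1:20, seq(5, 100, by = 5)),
  status = rep(c(1, 1, 0, 1), 10),
  GENE1  = rep(c(TRUE, FALSE), each = 20)
)
res <- km_plot_for_gene("GENE1", clin, out_dir = tempdir())
```

Returning the p-value alongside the file path makes it easy to call the function once per gene and compare the three results.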