Differences

This shows you the differences between two versions of the page.

r_test [2019/07/12 03:19]
admin [Question 2:]
r_test [2020/01/30 17:36] (current)
admin [Question 1]
Line 10: Line 10:
   - Install [[http://bioconductor.org/packages/release/bioc/html/GEOquery.html|GEOquery]] data package from Bioconductor.
   - Write a script that downloads GSE59259 dataset and computes the list of differentially expressed genes with adjusted p-value better than 0.01. The first lines of your script look like these:
-  - 
  
 <code>
Line 19: Line 18:
   des <- ????
   Data <- log(exprs(gset[[1]])+0.001)
-  fit <- ImFit(Data, des) ## !
+  fit <- lmFit(Data, des) ## !
   ...
 </code>
Line 26: Line 25:
 - Use pheatmap function to plot the expression of the top 5 DE genes.
  
- \\ - **Deliverables**: Report the number of DE genes, de.csv file, your script, heatmap.png, and the number of hours it took you to do the test. Additionally, write a short description of what you did in 5-10 sentences. The description should indicate your proficiency in English writing and your ability to communicate with a biologist who has little or no background in programming. \\ - You need to be able to orally explain all parts of your script including the above lines. Be prepared to explain the input, output, and process done by each function you use. \\ - If you are not familiar with gene expression and have little idea what limma does, try to learn the needed concept from the web, e.g., [[https://en.wikipedia.org/wiki/Gene_expression|Wikipedia]]. You can use information from tutorials, books, papers, experts in the fields, your friends, etc., however, make sure you can explain your work thoroughly.
+ \\ - **Deliverables**: Report the number of DE genes, de.csv file, your script, heatmap.png, and the approximate number of hours it took you to do the test. Additionally, write a short description of what you did in 5-10 sentences. The description should indicate your proficiency in English writing. To test your ability to communicate with a biologist __who has little or no background in programming__, write the summary of your results in a paragraph titled "conclusion". \\ - You need to be able to orally explain all parts of your script including the above lines. Be prepared to explain the input, output, and process done by each function you use. \\ - If you are not familiar with gene expression and have little idea what limma does, try to learn the needed concept from the web, e.g., [[https://en.wikipedia.org/wiki/Gene_expression|Wikipedia]]. You can use information from tutorials, books, papers, experts in the fields, your friends, etc., however, make sure you can explain your work thoroughly. \\
+- Follow the suggested file formats on the members' [[https://oncinfo.org/for_members|page]].
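
To illustrate what the "adjusted p-value better than 0.01" criterion in the script task means, here is a minimal base-R sketch on simulated data. It is not the intended solution: it uses per-gene Welch t-tests with Benjamini-Hochberg adjustment as a simple stand-in for limma's moderated statistics, and the names `expr`, `group`, and the planted effect in the first 20 genes are assumptions made up for this example.

```r
set.seed(1)

n.genes <- 1000
n.per.group <- 10
group <- rep(c("a", "b"), each = n.per.group)  # hypothetical sample labels

# Simulated log-expression matrix: genes in rows, samples in columns.
expr <- matrix(rnorm(n.genes * 2 * n.per.group), nrow = n.genes,
               dimnames = list(paste0("gene", seq_len(n.genes)), NULL))

# Plant a strong shift in the first 20 genes for group "b".
expr[1:20, group == "b"] <- expr[1:20, group == "b"] + 6

# One raw p-value per gene (Welch t-test, NOT limma's moderated t).
p.raw <- apply(expr, 1, function(x) {
  t.test(x[group == "a"], x[group == "b"])$p.value
})

# Benjamini-Hochberg adjustment controls the false discovery rate
# across all genes tested.
p.adj <- p.adjust(p.raw, method = "BH")

# Genes called differentially expressed at adjusted p < 0.01.
de <- rownames(expr)[p.adj < 0.01]
length(de)
```

The adjusted p-value of a gene is never smaller than its raw p-value, so filtering on the adjusted value is the stricter of the two and is the one to report as the number of DE genes.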
  
 ----
+
 
 ==== Question 2: ====
Line 35: Line 36:
  
 \\
-**B)** Let's define the //impact// of a set of genes to be the p-value of a log-rank test with the null hypothesis that when all of these genes are mutated together, the survival does not change. Write a function ''most.impact()'' that takes as input two ''k1'' and ''n1'' integers, and in the list of ''n1'' most mutated genes, finds the names of the ''k1'' genes with the best impact. Your function should return the names of the best ''k1'' genes, and also their impact. Run your function for ''k1=3'', and ''n1=3'', ''10'', and ''100''. What the biological interpretation of your results?
+**B)** Let's define the //impact// of a set of genes to be the p-value of a log-rank test with the null hypothesis that when all of these genes are mutated together, the survival does not change. Write a function ''most.impact()'' that takes as input two ''k1'' and ''n1'' integers, and in the list of ''n1'' most mutated genes, finds the names of the ''k1'' genes with the best impact. Your function should return the names of the best ''k1'' genes (i.e., the set of genes with the best log-rank p-value), and also their impact. Run your function for ''k1=3'', and ''n1=3'', ''10'', and ''100''. What is the biological interpretation of your results?
 
 __Hint:__ Use the ''utils::c?m?n()'' function, where you need to guess the question marks.
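
The //impact// definition in part B can be sketched for a single, fixed gene set as follows. This is only an illustration under stated assumptions, not the requested ''most.impact()'' function: the helper name `impact`, the logical patients-by-genes matrix `mut`, the vectors `time` and `status`, and the simulated data are all hypothetical. It relies on `survdiff()` and `Surv()` from the survival package, and the search over gene sets is deliberately left out.

```r
library(survival)

# Log-rank p-value comparing patients in whom ALL genes of the set are
# mutated against everyone else. Returns NA when one group is empty.
impact <- function(genes, mut, time, status) {
  all.mutated <- rowSums(mut[, genes, drop = FALSE]) == length(genes)
  if (length(unique(all.mutated)) < 2) {
    return(NA_real_)  # the log-rank test needs two non-empty groups
  }
  grp <- factor(all.mutated, levels = c(FALSE, TRUE))
  fit <- survdiff(Surv(time, status) ~ grp)
  pchisq(fit$chisq, df = 1, lower.tail = FALSE)
}

# Simulated cohort: survival is much shorter when geneA and geneB are
# both mutated; geneC has no effect.
set.seed(42)
n <- 200
mut <- matrix(runif(n * 3) < 0.6, nrow = n,
              dimnames = list(NULL, c("geneA", "geneB", "geneC")))
both <- mut[, "geneA"] & mut[, "geneB"]
time <- rexp(n, rate = ifelse(both, 0.5, 0.05))
status <- rep(1, n)  # every event observed, no censoring

p.ab <- impact(c("geneA", "geneB"), mut, time, status)
p.ab
```

A small p-value here says that survival differs between patients carrying all mutations in the set and the remaining patients; it does not by itself say in which direction, so the survival curves still need to be inspected before drawing a biological conclusion.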