Differences

This shows you the differences between two versions of the page.

for_members [2023/07/13 18:24] – [General guidelines for conducting research in the Oncinfo Lab] admin
for_members [2024/04/30 22:14] (current) – [General guidelines for conducting research in the Oncinfo Lab] admin
Line 7: Line 7:
   - Be __passionate__  about science, otherwise a good programmer should easily find other jobs with a multiple-fold salary.
   - __Work hard__, otherwise even a genius will not get anywhere if they do not move.
-  - Be __talented__. Nobody knows everything that is needed to do multidisciplinary research. You should be able to learn many things that you were not thought in courses. You often need to find novel solutions for small and big challenges that you face because you are the first person who is working on your specific study.
+  - Be __talented__. Nobody knows everything that is needed to do multidisciplinary research. You should be able to learn many things that you were not taught in courses. You often need to find novel solutions for small and big challenges that you face because you are the first person who is working on your specific study.
   - Be __knowledgeable__  because we are not interested in reinventing the wheel. \\ The above items are ordered based on importance. The most critical one is **discipline**.
  
Line 16: Line 16:
   - Pass the online training **courses**  required by the University, e.g., conflict of interest, safety, etc.
   - Ask for help [[https://codingkilledthecat.wordpress.com/2012/06/26/how-to-ask-for-programming-help/|professionally]], e.g., include the exact text of the error message.
-  - All experiments and analysis are done on **Unix**. That is a __real__  Unix system like Linux, OS X, etc., NOT a virtual machine. Start with a [[http://www.ee.surrey.ac.uk/Teaching/Unix/|tutorial]] for beginners or the [[http://nebc.nerc.ac.uk/downloads/courses/Bio-Linux/bl8_latest.pdf|introduction]] to Bio-Linux.
+  - All experiments and analysis are done on **Unix**. That is a __real__  Unix system like Linux, OS X, etc., NOT a virtual machine. Start with a [[https://web.archive.org/web/20230815043906/http://www.ee.surrey.ac.uk/Teaching/Unix/|tutorial]] for beginners.
   - **[[http://www.r-project.org/|R]]** is primarily used for statistical analysis and other scripting purposes in Oncinfo Lab. [[https://www.coursera.org/course/rprog|This]] is a good online course on R which takes about 1 month to complete. A couple of days should be enough to read the [[http://cran.r-project.org/doc/manuals/R-intro.pdf|CRAN's]] good guide for starters to get the basic ideas, cover the [[http://www.r-tutor.com/r-introduction|introduction]] section from R-Tutorial, or learn R preliminaries [[https://rpubs.com/alinemati/learning_R_step_by_step|step by step]] when you have a good [[https://cran.r-project.org/doc/contrib/Short-refcard.pdf|cheat]] sheet. Avoid using explicit [[https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r|loops]] and ''apply()''  in R when possible. [[https://www.datacamp.com/|DataCamp]] facilitates reading about R and running examples at the same time using a browser. Those who know R to some extent can use these books: Computational [[https://compgenomr.github.io/book/|Genomics]] with R, Bioinformatics with R {{:bioinformatics-r-cookbook.pdf|Cookbook}}, [[http://adv-r.had.co.nz/|Advanced]] R ([[https://adv-r.hadley.nz/|2nd]] Edition) by Hadley Wickham, [[https://booksoncode.com/articles/best-r-books-for-beginners|etc]]. to gradually learn more as they proceed in a project. The next step after learning R is to learn [[http://www.nature.com/nmeth/journal/v12/n2/full/nmeth.3252.html|Bioconductor]]. You can install most packages that we use in the lab by: \\ ''source("~/proj/alzheimer/code/utilities/makeOncinfoUt.R"); \\  OncinfoUt::call.libraries()''
   - Using **[[https://en.wikipedia.org/wiki/Emacs|Emacs]]** as a powerful, general-purpose text editor is [[https://robertamezquita.github.io/post/2017-04-07-my-emacs-setup/|encouraged]] ([[http://www2.lib.uchicago.edu/keith/tcl-course/emacs-tutorial.html|tutorial]]). In a terminal, you can start it by typing ''emacs'', even in an SSH session. On Ubuntu you can simply install Emacs using Software Center, or by the Synaptic Package Manager, or by the following command: ''sudo apt-get install emacs''. On OS X, you can install [[https://emacsformacosx.com/|Emacs]] For Mac OS X, which is better than Aquamacs. A less recommended option is [[https://vigou3.gitlab.io/emacs-modified-macos/|Emacs Modified for macOS]], which supports [[https://ess.r-project.org/|ESS]] and [[https://www.gnu.org/software/auctex/|AUCTeX]]. You can customize your Emacs by editing the .emacs file. Feel free to copy some, but not all, commands from Habil's .emacs file for [[https://www.dropbox.com/s/pdt6fbho57k421d/emacs_UTosx2018|macOS]]. You can add these commands and automatically install the following packages by installing the [[https://www.dropbox.com/s/d21azt9hg65sv3f/oncinfo.el?dl=0|oncinfo.el]] Emacs package, which is not tested well yet. As of 2019, Habil's favorite packages include: tabbar, tabbar-ruler, rainbow-delimiters, idle-highlight-in-visible-buffers-mode, auto-highlight-symbol, auto-complete-auctex, auto-complete, ess, and yaml. For guidelines on installing these packages and other Emacs customizations, see the notes on the [[:how_to|How to]] page.
Line 23: Line 23:
   - All members should know about the **central [[https://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology|dogma]] of biology**, which is almost enough biological knowledge to start the majority of projects. Familiarity with some basic concepts such as [[https://en.wikipedia.org/wiki/Exon|exon]], intron, [[https://www.youtube.com/watch?v=CZeN-IgjYCo|sequencing]], etc. is helpful. Watch [[https://www.dnalc.org/resources/3d/|animations]] from DNA Learning Center.
   - Any file or data on this wiki that has **restricted permissions**, such as some paper pdfs or drafts, should be kept confidential, and NOT be shared with nonmembers unless authorized by the PI.
-  - For future reference, please add the link to your presentations and drafts on the **[[https://oncinfo.org/drafts|drafts]] **page. At a minimum, please include: the author, the date, the audience or collaborator, and the subject.
+  - For future reference, please add the link to your presentations and drafts on the **[[https://oncinfo.org/drafts|drafts]]**page. At a minimum, please include: the author, the date, the audience or collaborator, and the subject.
   - All members should read and follow **[[http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424|Bill's]] guidelines**, and organize their files and folders accordingly, at least to some extent. Start by making a "~/proj" directory in your home folder that will eventually contain a subfolder for each project you are working on. Major subfolders must have a readme file, for example, to describe where the data is coming from. Abide by the "[[https://bitbucket.org/habilzare/template/src/master/|Rules]] for developing pipelines". In particular, your code folder must include a runall.R script that sources other scripts. Avoid sourcing scripts in other scripts except for the runall, functions, and libraries because then following and debugging the pipeline would be difficult. At the top of each individual script, please briefly explain what it does, and sign it by writing <your name>, <YYYY-MM-DD>.
-  - Your code and documents should be stored in a **Bitbucket**  repository like [[https://bitbucket.org/habilzare/alzheimer/src/master/|https://bitbucket.org/habilzare/alzheimer/src/master/]]. Sign up for an [[https://bitbucket.org/account/signup/|account]] and add your photo. Do NOT sign in using your Google account. Only then, send your username to Habil. If you are new to Bitbucket, spend an hour on the [[https://confluence.atlassian.com/bitbucket/tutorial-learn-bitbucket-with-git-759857287.html|tutorial]]. Taking [[https://guides.co/g/bitbucket-101/11146|Bitbucket 101]] is NOT needed for beginners. You can [[https://confluence.atlassian.com/bitbucket/use-the-ssh-protocol-with-bitbucket-cloud-221449711.html|avoid]] having to manually type a password each time you pull by using ssh. To add a key, click on settings at the top right corner of the Bitbucket page, SSH keys, Add key. Use ssh to clone a repository, NOT https. Do NOT mess with others' git folders on the cluster. You should //only//  clone, pull, and push in your own home or work directory. Do NOT skip this step. Before changing anything in a repository, read and abide by the conventions described in the main readme file.
+  - Your code and documents should be stored in a **Bitbucket**  repository like [[https://bitbucket.org/habilzare/alzheimer/src/master/|https://bitbucket.org/habilzare/alzheimer/src/master/]]. Sign up for an [[https://bitbucket.org/account/signup/|account]] and add your photo. Do NOT sign in using your Google account. Only then, send your username and the corresponding email to Habil. If you are new to Bitbucket, spend an hour on the [[https://confluence.atlassian.com/bitbucket/tutorial-learn-bitbucket-with-git-759857287.html|tutorial]]. Taking [[https://guides.co/g/bitbucket-101/11146|Bitbucket 101]] is NOT needed for beginners. You can [[https://confluence.atlassian.com/bitbucket/use-the-ssh-protocol-with-bitbucket-cloud-221449711.html|avoid]] having to manually type a password each time you pull by using ssh. To add a key, click on settings at the top right corner of the Bitbucket page, SSH keys, Add key. Use ssh to clone a repository, NOT https. Do NOT mess with others' git folders on the cluster. You should //only//  clone, pull, and push in your own home or work directory. Do NOT skip this step. Before changing anything in a repository, read and abide by the conventions described in the main readme file.
   - Do NOT use **spaces in file or folder names**. Do NOT include binary files such as png, pdf, RData, etc. in a Bitbucket repository unless on an exceptional basis. Instead, use, [[https://explainshell.com/explain?cmd=rsync+-avz|e.g.,]] ''rsync -avz -e ssh <usrname>@ls6.tacc.utexas.edu''  or ''scp'' to transfer files between the cluster and your computer, and document the exact paths in a readme file in the corresponding folder. Add the readme file to the repository.
-  - If you want to use **TACC**  resources, you first [[https://portal.tacc.utexas.edu/account-request|create]] an account, and then ask Habil to add you to a project. Source ''/work/03270/zare/Install/oncinfo_settings''  in your .bashrc or other bash scripts so that you do not need to install the software that we often need and is already installed by other lab members. We usually use Lonestar and Maverick for computing, and we archive large data on Ranch based on [[https://docs.google.com/document/d/17VkB7_HQUq7yeSr906Qlh7q8TlFX5nvP5a1F9csXBGY/edit|this]] protocol. A simple test for running a job on the Lonestar cluster is the following. Look at their user [[https://portal.tacc.utexas.edu/user-guides/lonestar5|guide]] and [[https://srcc.stanford.edu/sge-slurm-conversion|this]] table of commands for more details. \\ ''$ ssh <username>@ls6.tacc.utexas.edu \\  $ cd ~zare \\  login1.ls6(1099)$ cat ./test.sh'' \\ ''login1.ls6(1099)$ sbatch ./test.sh'' \\ You can monitor your jobs using ''squeue -u <usrname>''. The output will be saved in the ''tests''  subfolder. If there are multiple files in this folder, look at the newest one. \\ The above command will submit the job to the development queue. If you want to submit a job to the normal queue, you can do the following: \\ ''login1.ls6(1099)$ sbatch -p normal -n 1 -t 2 ./test.sh'' \\ Before submitting many jobs, estimate the time and memory by submitting a single job or preferably, running your code on toy data. If you think running a complete job takes more than 10 minutes on the cluster, before submitting more similar jobs, let Habil check your code to make sure we do not miss any easy [[https://bitbucket.org/habilzare/alzheimer/commits/250cacf92ef6b2886973dc609c229032b42b2234|parallelization]]. Familiarize yourself with [[https://docs.google.com/presentation/d/1TFjB16XmLk2Xo4qHJgGe1VfnIUonw9cIJmU1xCZxyL0/edit#slide=id.g242153ecf22_0_60|Docker]]. Use ''habilzare/oncinfo:oncinfo-<version>''  and //only//  modify ''habilzare/oncinfo:oncinfo-dev-<version>''.
+  - If you want to use **TACC**  resources, you first [[https://portal.tacc.utexas.edu/account-request|create]] an account, and then ask Habil to add you to a project. Source ''/work/03270/zare/Install/oncinfo_settings''  in your .bashrc or other bash scripts so that you do not need to install the software that we often need and is already installed by other lab members. We usually use Lonestar and Maverick for computing, and we archive large data on Ranch based on [[https://docs.google.com/document/d/17VkB7_HQUq7yeSr906Qlh7q8TlFX5nvP5a1F9csXBGY/edit|this]] protocol. A simple test for running a job on the Lonestar cluster is the following. Look at their user [[https://portal.tacc.utexas.edu/user-guides/lonestar5|guide]] and [[https://srcc.stanford.edu/sge-slurm-conversion|this]] table of commands for more details. \\ ''$ ssh <username>@ls6.tacc.utexas.edu \\  $ cd ~zare \\  login1.ls6(1099)$ cat ./test.mpi'' \\ ''login1.ls6(1099)$ sbatch ./test.mpi'' \\ You can monitor your jobs using ''squeue -u <usrname>''. The output will be saved in the ''~/temp''  subfolder. If there are multiple files in this folder, look at the newest one. \\ The above command will submit the job to the development queue. If you want to submit a job to the normal queue, you can do the following: \\ ''login1.ls6(1099)$ sbatch -p normal -n 1 -t 2 ./test.mpi'' \\ [[https://tacc-cloud.readthedocs.io/projects/agave/en/latest/index.html|Tapis]] is an //advanced//  optional platform for submitting jobs from your local computer, not recommended for beginners. Before submitting many jobs, estimate the time and memory by submitting a single job or preferably, running your code on toy data. If you think running a complete job takes more than 10 minutes on the cluster, before submitting more similar jobs, let Habil check your code to make sure we do not miss any easy [[https://bitbucket.org/habilzare/alzheimer/commits/250cacf92ef6b2886973dc609c229032b42b2234|parallelization]]. Familiarize yourself with [[https://docs.google.com/presentation/d/1TFjB16XmLk2Xo4qHJgGe1VfnIUonw9cIJmU1xCZxyL0/edit#slide=id.g242153ecf22_0_60|Docker]], which you can use to test your code locally before running it on the cluster. [[https://bitbucket.org/habilzare/genetwork/src/master/code/docker/readme.txt|Use]] ''habilzare/oncinfo:oncinfo-<version>''  and modify //only// ''habilzare/oncinfo:oncinfo-dev-<version>''.
-  - Everyone should have a photo and their updated CV in pdf format on their personal page. {{:cv_template.zip|This}}  is an optional LaTeX template. The permission of any lab notebook (lano) should be set to "hidden" and it is important that they be updated EVERY day. [[https://www.dokuwiki.org/dokuwiki|DokuWiki]] provides us with two edit modes: ckg and DW. Use the one that is more convenient for you. Write your posts in reverse chronological order so that the newest post comes at the top. For facilitating future reference, avoid sending data as attachments. Instead, upload files to your lano and link to them where needed.
+  - Everyone should have a photo and their updated CV in pdf format on their personal page. {{:wiki:public:cv_template.zip|This}}  is an optional LaTeX template. The permission of any lab notebook (lano) should be set to "hidden" and it is important that they be updated EVERY day. [[https://www.dokuwiki.org/dokuwiki|DokuWiki]] provides us with two edit modes: ckg and DW. Use the one that is more convenient for you. Write your posts in reverse chronological order so that the newest post comes at the top. For facilitating future reference, avoid sending data as attachments. Instead, upload files to your lano and link to them where needed.
   - You can install the **Google Scholar [[https://chrome.google.com/webstore/detail/google-scholar-button/ldipcbpaocekfooobnbcddclnhejkcpn?hl=en|Button]]** add-on for an easier way of searching Google Scholar. You select the paper title and then click on the little blue icon on the top right corner. For any paper which you want to cite on the lab wiki, find it on Google Scholar, click on "More>Cite" and copy the MLA format. Also, use [[https://gsuite.google.com/marketplace/app/paperpile/894076725911|Paperpile]] for easy citation in Google Docs, and Math [[https://gsuite.google.com/marketplace/app/math_equations/825973477142|Equations]] for writing and manipulating equations on Google presentations.
   - Create a Nature [[https://idp.nature.com/register/natureuser?redirect_uri=https://www.nature.com/my-account/alerts|account]] for yourself. To get a monthly list of published papers in Nature Methods, subscribe to the corresponding **alert**. This can help you get a sense of where the field is going. You can also create an [[https://scholar.google.com/intl/en/scholar/help.html#alerts|alert]] on Google Scholar to get regular updates on what is being published on the specific topic of your study.
-  - **Code style**  in Oncinfo lab: We follow Hadley Wickham’s R Style [[http://adv-r.had.co.nz/Style.html|Guide]] unless another convention is mentioned below. The goal is to include as much code as possible on 1 page so that it is easier to skim while keeping the overall structure such as proper indentation. Use [[https://contributions.bioconductor.org/r-code.html?q=inde#indentation|2 spaces]] (not tabs) for indentation. Like writing English texts, organize each script in small paragraphs and avoid extra white lines. Give each paragraph a short title in a comment. \\  When writing R code, use "''x <- 5''" for assigning a value to a variable. Do NOT use "''x = 5''" or "''x<-5''". We use **[[https://en.wikipedia.org/wiki/Camel_case|camelCase]]** in R, so do NOT use underscore, '_', and dot, '.', in variable or function names. E.g., instead of "''inverse_of''", use "''inverseOf''" as a variable name or function name so that you can select it by 1 click. Almost all functions must return a list so that extending them will be easy. Use "''##''" for comments NOT a single "''#''". Boolean flags and variable names start with an [[https://en.wikipedia.org/wiki/Auxiliary_verb|auxiliary]] verb like ''do'', ''is'', or ''has'', e.g., ''doPlot'', ''isSingle'', ''hasMeta'', etc. Write the name of the loaded object in a comment in front of ''load()''. \\ Avoid long lines of code. Most lines should be < 90 characters, and all lines must be < 100 characters. Thus, do NOT include space when using ''=''  in function calls. Good example: ''average <- mean(feet[ ,"real"]/12+inches, na.rm=TRUE) ## Spaces only around "<-" and after ","''. The space in "''[ ,''" is OK, which refers to all rows. It is better to place a space before the parenthesis after "''if (''", "''for (''", and the like, but do NOT use space between a function name and ''"("''  e.g., write ''plot(Data)''. When the line is long, it usually means you need to extract some of it and define a new variable right above that line. \\ Data structures in R can be ordered from simple to complex as follows: number, vector, matrix, and list. Always use the simplest possible data structure, e.g., do not use a list when you can use a matrix and do not use a matrix with one column when you can use a named vector. The reason is that R has more tools for simpler data structures, e.g., ''sum''  and ''paste''  work on vectors, but not lists. Use "''$''" to __access__  elements of a list like ''A1 <- list1$matrixA'', but NOT to access a column of a data frame. To __add or modify__  an element of a list use double square brackets not "''$''" like {{:wiki:public:screen_shot_2021-05-25_at_9.00.51_pm.png?linkonly|this}} . Do not use pipes in R (i.e., ''%>%'') due to their [[https://www.datacamp.com/community/tutorials/pipe-r-tutorial?irclickid=yFW3tDVC0xyLW0sz410s1Tw1UkB33TWNUTXcSg0&irgwc=1&=|drawbacks]].
+  - **Code style**  in Oncinfo lab: We follow Hadley Wickham’s R Style [[http://adv-r.had.co.nz/Style.html|Guide]] unless another convention is mentioned below. The goal is to include as much code as possible on 1 page so that it is easier to skim while keeping the overall structure such as proper indentation. Use [[https://contributions.bioconductor.org/r-code.html?q=inde#indentation|2 spaces]] (not tabs) for indentation. Like writing English texts, organize each script in small paragraphs and avoid extra white lines. Give each paragraph a short title in a comment. \\  When writing R code, use "''x <- 5''" for assigning a value to a variable. Do NOT use "''x = 5''" or "''x<-5''". We use **[[https://en.wikipedia.org/wiki/Camel_case|camelCase]]** in R, so do NOT use underscore, '_', and dot, '.', in variable or function names. E.g., instead of "''inverse_of''", use "''inverseOf''" as a variable name or function name so that you can select it by 1 click. Almost all functions must return a list so that extending them will be easy. Use "''##''" for comments NOT a single "''#''". Boolean flags and variable names start with an [[https://en.wikipedia.org/wiki/Auxiliary_verb|auxiliary]] verb like ''do'', ''is'', or ''has'', e.g., ''doPlot'', ''isSingle'', ''hasMeta'', etc. Write the name of the loaded object in a comment in front of ''load()''. \\ Avoid long lines of code. Most lines should be < 90 characters, and all lines must be < 100 characters. Thus, do NOT include space when using ''=''  in function calls. Good example: ''average <- mean(feet[ ,"real"]/12+inches, na.rm=TRUE) ## Spaces only around "<-" and after ","''. The space in "''[ ,''" is OK, which refers to all rows. It is better to place a space before the parenthesis after "''if (''", "''for (''", and the like, but do NOT use space between a function name and ''"("''  e.g., write ''plot(Data)''. Write the FULL name of arguments when calling a function and do NOT rely on their order, which may change in the future. \\ When the line is long, it usually means you need to extract some of it and define a new variable right above that line. \\ Data structures in R can be ordered from simple to complex as follows: number, vector, matrix, and list. Always use the simplest possible data structure, e.g., do not use a list when you can use a matrix and do not use a matrix with one column when you can use a named vector. The reason is that R has more tools for simpler data structures, e.g., ''sum''  and ''paste''  work on vectors, but not lists. To __add or modify__  an element of a list use double square brackets like {{:wiki:public:screen_shot_2021-05-25_at_9.00.51_pm.png?linkonly|this}} , not "''$''". You can use "''$''" to __access__  elements of a list like ''A1 <- list1$matrixA''  (with some [[https://oncinfo.org/mohsens_lab_notebook#section20240211|caveats]]), but NOT to access a column of a data frame. Do not use [[https://www.datacamp.com/tutorial/pipe-r-tutorial|pipes]] in R (i.e., ''%>%''  and ''|>'') due to their [[https://stackoverflow.com/questions/38880352/should-i-avoid-programming-packages-with-pipe-operators|drawbacks]] outlined by Hadley [[https://r4ds.had.co.nz/pipes.html#when-not-to-use-the-pipe|Wickham]].
   - **Never copy code**, instead generalize your code and write functions. If you are copying more than a line of code, most likely you are doing something wrong and WETting your code. Follow the [[https://en.wikipedia.org/wiki/Don't_repeat_yourself|DRY]] principle.
   - In your code, **avoid using one-letter variables**  such as i or a because they are very hard to track in the editor. Instead use ind or i1. Also, your variable name must be different from built-in functions such as ls in R.
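The file-organization items above (a "~/proj" directory with one subfolder per project, a readme in major subfolders, a runall.R in the code folder, and no spaces in names) can be sketched in shell as follows. This is only a minimal illustration: the function names ''makeProjSkeleton'' and ''findNamesWithSpaces'', the ''code'' and ''data'' folder names, and the readme file name are assumptions made for the example, not the lab's official layout.

```shell
#!/usr/bin/env bash
## Sketch only: the folder names "code" and "data", the file name
## "readme.txt", and both function names are hypothetical.

## Create <root>/<project>/ with a code folder (holding runall.R) and a
## data folder (holding a readme that records where the data came from).
makeProjSkeleton() {
  local root="$1" project="$2"
  local projDir="${root}/${project}"
  mkdir -p "${projDir}/code" "${projDir}/data"
  ## Never overwrite files that already exist.
  if [ ! -e "${projDir}/code/runall.R" ]; then
    printf '## Sources the other scripts of this pipeline in order.\n## <your name>, <YYYY-MM-DD>\n' \
      > "${projDir}/code/runall.R"
  fi
  if [ ! -e "${projDir}/data/readme.txt" ]; then
    printf 'Describe where the data came from and the exact paths used.\n' \
      > "${projDir}/data/readme.txt"
  fi
}

## Print every file or folder under <dir> whose name contains a space.
findNamesWithSpaces() {
  find "$1" -name '* *'
}
```

For example, ''makeProjSkeleton ~/proj alzheimer'' would create the skeleton for a project called alzheimer, and ''findNamesWithSpaces ~/proj'' prints nothing when all names follow the no-space rule.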
Line 43: Line 43:
   - As employees of UT Health, we can get facilitated appointments with UT Health **primary care physicians**  (call: 210-450-9090), or alternatively use [[https://mdlnext.mdlive.com/home?matchtype=e&network=g&device=c&keyword=mdlive&adposition=&gclid=CjwKCAjwy_aUBhACEiwA2IHHQAXl6Lx_FzhKPaDyjCY5ry2vMwXnnqn7lFwcN4qM_rlQdqf3Be7V1BoCw60QAvD_BwE|MDLIVE]] to be virtually visited by a physician.   - As employees of UT Health, we can get facilitated appointments with UT Health **primary care physicians**  (call: 210-450-9090), or alternatively use [[https://mdlnext.mdlive.com/home?matchtype=e&network=g&device=c&keyword=mdlive&adposition=&gclid=CjwKCAjwy_aUBhACEiwA2IHHQAXl6Lx_FzhKPaDyjCY5ry2vMwXnnqn7lFwcN4qM_rlQdqf3Be7V1BoCw60QAvD_BwE|MDLIVE]] to be virtually visited by a physician.
   - If you are considering ultimately getting **jobs**  in computational biology or bioinformatics, have a look at postings at the [[http://bioinformatics.org|bioinformatics.org]] website __within the first week__  after joining the lab. For academic positions, see the Nature Jobs, Science Careers, and [[https://docs.google.com/document/d/1tRVpO0eFzYfl0f95X_X8YvE9DeQVX3vYBkmw4smRJLU/edit#|other]] websites. Read the articles on "[[http://oncinfo.org/how_to|How to]] rescue US biomedical research from its systemic flaws?" if you are, or want to be, a PhD student.   - If you are considering ultimately getting **jobs**  in computational biology or bioinformatics, have a look at postings at the [[http://bioinformatics.org|bioinformatics.org]] website __within the first week__  after joining the lab. For academic positions, see the Nature Jobs, Science Careers, and [[https://docs.google.com/document/d/1tRVpO0eFzYfl0f95X_X8YvE9DeQVX3vYBkmw4smRJLU/edit#|other]] websites. Read the articles on "[[http://oncinfo.org/how_to|How to]] rescue US biomedical research from its systemic flaws?" if you are, or want to be, a PhD student.
-  - If you want to use **ROSMAP**  data, please create a Synapse [[https://www.synapse.org/#!RegisterAccount:0|account]], add your information to {{:wiki:second-ad-knowledge-portal-controlled-access-duc-march2022-v7.3-signed.pdf|this}}  file, and upload it again to Oncinfo without changing the file name. Let Habil know so that he uploads it on the Synapse [[https://help.adknowledgeportal.org/apd/Data-Use-Certificates.2623373330.html|website]]. Then, accept Terms of Use through this [[https://www.synapse.org/#!AccessRequirements:ID=syn2910256&TYPE=ENTITY|link]].+  - If you want to use **ROSMAP**  data, please create a Synapse [[https://www.synapse.org/#!RegisterAccount:0|account]], add your information to {{:second-ad-knowledge-portal-controlled-access-duc-march2022-v7.3-signed.pdf|this}}  file, and upload it again to Oncinfo without changing the file name. Let Habil know so that he uploads it on the Synapse [[https://help.adknowledgeportal.org/apd/Data-Use-Certificates.2623373330.html|website]]. Then, accept Terms of Use through this [[https://www.synapse.org/#!AccessRequirements:ID=syn2910256&TYPE=ENTITY|link]].
  
  
 ==== Some references ==== ==== Some references ====
  
-  - Two machine learning bibles, which summarize important topics in the field up to 2005: Bishop ({{:bishop-pattern_recongnition_and_machine_learning-1.pdf|1}},{{:bishop-pattern_recongnition_and_machine_learning-2.pdf|2}}  ) and [[http://statweb.stanford.edu/~tibs/ElemStatLearn/download.html|Hastie et al.]].+  - Two machine learning bibles, which summarize important topics in the field up to 2005: Bishop ({{:bishop-pattern_recongnition_and_machine_learning-1.pdf|1}},{{:bishop-pattern_recongnition_and_machine_learning-2.pdf|2}}  ) and [[http://statweb.stanford.edu/~tibs/ElemStatLearn/download.html|Hastie et al.]]. Anwar's cheat [[https://medium.com/swlh/cheat-sheets-for-machine-learning-interview-topics-51c2bc2bab4f|sheets]]. A [[https://docs.google.com/spreadsheets/d/1AK8lqS-ztMhh8YoOaQ7ScIZmabrQ5AFxAyXKwYWiT04/edit#gid=0|list]] of some of the best courses in ML.
   - An old list of computational biology [[https://docs.google.com/document/d/1fxqgQgsxf6Xd-8p0DUTyOGkeWNI5REW4zk1FXTxi2_I/edit|books]]. If you would like to have a hard copy of any of these books or other books useful to your training and research, even in areas like psychology and management, please let Habil know. Molecular Biology of the Cell, by [[https://www.amazon.com/Molecular-Biology-Sixth-Bruce-Alberts/dp/0815345240|Alberts]] et al., is a good self-contained book that starts from basic biology concepts like DNA and ends by describing complex pathways and mechanisms like the immune system ({{:screen_shot_2021-03-29_at_1.05.18_pm.png?linkonly|contents}}, {{:molecular_biology_of_the_cell_6th_editio.pdf|pdf}}  ).   - An old list of computational biology [[https://docs.google.com/document/d/1fxqgQgsxf6Xd-8p0DUTyOGkeWNI5REW4zk1FXTxi2_I/edit|books]]. If you would like to have a hard copy of any of these books or other books useful to your training and research, even in areas like psychology and management, please let Habil know. Molecular Biology of the Cell, by [[https://www.amazon.com/Molecular-Biology-Sixth-Bruce-Alberts/dp/0815345240|Alberts]] et al., is a good self-contained book that starts from basic biology concepts like DNA and ends by describing complex pathways and mechanisms like the immune system ({{:screen_shot_2021-03-29_at_1.05.18_pm.png?linkonly|contents}}, {{:molecular_biology_of_the_cell_6th_editio.pdf|pdf}}  ).
   - [[https://www.biostars.org/|Biostars]] is a good forum, similar to Stack Overflow in structure, but focused on bioinformatics and Computational Biology.   - [[https://www.biostars.org/|Biostars]] is a good forum, similar to Stack Overflow in structure, but focused on bioinformatics and Computational Biology.
Line 56: Line 56:
   - [[http://stephenturner.us/edu.html|Lists 1]] and [[https://www.r-bloggers.com/2013/04/list-of-bioinformatics-workshops-and-training-resources/|2]] of bioinformatics workshops.   - [[http://stephenturner.us/edu.html|Lists 1]] and [[https://www.r-bloggers.com/2013/04/list-of-bioinformatics-workshops-and-training-resources/|2]] of bioinformatics workshops.
   - A 5-minute introduction to next-generation sequencing [[https://www.youtube.com/watch?annotation_id=annotation_228575861&feature=iv&src_vid=womKfikWlxM&v=fCd6B5HRaZ8|video]].   - A 5-minute introduction to next-generation sequencing [[https://www.youtube.com/watch?annotation_id=annotation_228575861&feature=iv&src_vid=womKfikWlxM&v=fCd6B5HRaZ8|video]].
 +
  
 ==== Fun stuff ==== ==== Fun stuff ====
Line 61: Line 62:
   - Inner [[https://www.youtube.com/watch?v=yKW4F0Nu-UY|Life]] Of A Cell.   - Inner [[https://www.youtube.com/watch?v=yKW4F0Nu-UY|Life]] Of A Cell.
   - The Dark Age of the [[https://www.youtube.com/watch?v=iyc3bDFk84w&src_vid=uabNtlLfYyU&feature=iv&annotation_id=annotation_558059|Universe]], a good visualization of the big bang   - The Dark Age of the [[https://www.youtube.com/watch?v=iyc3bDFk84w&src_vid=uabNtlLfYyU&feature=iv&annotation_id=annotation_558059|Universe]], a good visualization of the big bang
-  - [[http://blogs.discovermagazine.com/d-brief/2015/11/18/pigeon-pathologists-know-cancer-when-they-see-it/#.VlNbrWSrQUH|Pigeons]] can learn to diagnose breast cancer with [[http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141357|99]]% accuracy.+  - [[https://www.theguardian.com/society/2015/nov/19/pigeons-can-identify-cancerous-tissue-on-x-rays-study-finds|Pigeons]] can learn to diagnose breast cancer with [[http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141357|99]]% accuracy.
   - [[https://www.youtube.com/watch?v=jAhjPd4uNFY|CRISPR]] is a revolutionary tool for editing DNA, which [[https://www.youtube.com/watch?v=1BXYSGepx7Q|reduces]] the time and cost of genome modification by an order of magnitude.   - [[https://www.youtube.com/watch?v=jAhjPd4uNFY|CRISPR]] is a revolutionary tool for editing DNA, which [[https://www.youtube.com/watch?v=1BXYSGepx7Q|reduces]] the time and cost of genome modification by an order of magnitude.