for_members, revision of 2019/12/13 17:05 by admin (previous revision: 2019/10/26 16:39): General guidelines for conducting research in the Oncinfo Lab
  - [[http://www.r-project.org/|R]] is primarily used for statistical analysis and other scripting purposes in the Oncinfo Lab. [[https://www.coursera.org/course/rprog|This]] is a good online course on R that takes about one month to complete. Alternatively, a couple of days should be enough to read [[http://cran.r-project.org/doc/manuals/R-intro.pdf|this]] good guide for starters and get the basic ideas, or to cover the [[http://www.r-tutor.com/r-introduction|introduction]] section of R-Tutorial. [[https://www.datacamp.com/|DataCamp]] lets you read about R and run examples at the same time in a browser. Those who already know R to some extent can use the book Bioinformatics with R {{:bioinformatics-r-cookbook.pdf|Cookbook}} or [[http://adv-r.had.co.nz/|Advanced R]] by Hadley Wickham to gradually learn more as they proceed in a project. The next step after learning R is to learn [[http://www.nature.com/nmeth/journal/v12/n2/full/nmeth.3252.html|Bioconductor]].
  - Using [[https://en.wikipedia.org/wiki/Emacs|Emacs]] as a powerful, general-purpose text editor is [[https://robertamezquita.github.io/post/2017-04-07-my-emacs-setup/|encouraged]] ([[http://www2.lib.uchicago.edu/keith/tcl-course/emacs-tutorial.html|tutorial]]). In a terminal, you can start it by typing ''emacs'', even in an SSH session. On Ubuntu, you can install Emacs using the Software Center, the Synaptic Package Manager, or the following command: ''sudo apt-get install emacs''. On macOS, you can install [[https://emacsformacosx.com/|Emacs For Mac OS X]], which is better than Aquamacs. A less recommended option is [[https://vigou3.gitlab.io/emacs-modified-macos/|Emacs Modified for macOS]], which supports [[https://ess.r-project.org/|ESS]] and [[https://www.gnu.org/software/auctex/|AUCTeX]]. You can customize Emacs by editing your ''.emacs'' file. Feel free to copy some, but not all, commands from Habil's .emacs file for [[https://www.dropbox.com/s/pdt6fbho57k421d/emacs_UTosx2018|macOS]]. As of 2019, his favorite packages include tabbar, tabbar-ruler, rainbow-delimiters, idle-highlight-in-visible-buffers-mode, auto-highlight-symbol, auto-complete-auctex, auto-complete, and ess.
  - Using proprietary file formats is not professional when you share information (e.g., your CV) with others. The PDF and PNG formats are fine and portable. Use Google Docs instead of .docx, Google Slides instead of .ppt, .zip or .tar.gz instead of .rar, etc.
  - [[https://www.youtube.com/watch?v=WsofH466lqk|This]] video illustrates transcription ([[https://en.wikipedia.org/wiki/Transcription_(genetics)|wikipedia]], [[https://www.youtube.com/watch?v=5MfSYnItYvg|video 2]]). There are more videos on [[https://www.youtube.com/watch?v=OEWOZS_JTgk|gene expression]] ([[https://en.wikipedia.org/wiki/Gene_expression|wikipedia]]), [[https://www.youtube.com/watch?v=TfYf_rPWUdY|translation]] ([[https://www.youtube.com/watch?v=5bLEDd-PSTQ|detailed]]), and other topics.
  - All members should know about the central [[https://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology|dogma]] of molecular biology ([[:dogma.pdf?media=dogma.pdf|pdf]]), which is almost enough biological knowledge to start the majority of projects. Familiarity with some basic concepts such as [[https://en.wikipedia.org/wiki/Exon|exon]], intron, etc. is also helpful. Watch the [[https://www.dnalc.org/resources/3d/|animations]] from the DNA Learning Center.
  - For future reference, please add the link to your presentations and drafts on the [[https://oncinfo.org/drafts|drafts]] page. At a minimum, please include the author, the date, the audience, and the subject.
  - All members should read and follow [[http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424|Bill's]] guidelines and organize their files and folders accordingly, at least to some extent. Start by making a ''~/proj'' directory in your home folder that will eventually contain a subfolder for each project you are working on. Major subfolders must have a readme file describing, for example, where the data come from. Your code folder must include a runall.R script that sources the other scripts. Avoid sourcing scripts from other scripts except runall.R, because that makes the pipeline difficult to follow and debug.
  - Your code and documents should be stored in a Bitbucket repository like [[https://bitbucket.org/habilzare/genetwork|https://bitbucket.org/habilzare/genetwork]]. Sign up for an [[https://bitbucket.org/account/signup/|account]] and add your photo. Do NOT sign in using your Google account. Only then, send your username to Habil. If you are new to Bitbucket, spend an hour on the [[https://confluence.atlassian.com/bitbucket/tutorial-learn-bitbucket-with-git-759857287.html|tutorial]]. Taking [[https://guides.co/g/bitbucket-101/11146|Bitbucket 101]] is NOT needed. By using SSH, you can [[https://confluence.atlassian.com/bitbucket/use-the-ssh-protocol-with-bitbucket-cloud-221449711.html|avoid]] having to type a password each time you pull. To add a key, click on your photo at the top right corner of the Bitbucket page, then Bitbucket settings, SSH keys, Add key. This trick is not appropriate for TACC clusters because we should not change our .ssh folder there; on the cluster, clone using https instead of ssh. Do NOT mess with others' git folders on the cluster. You should //only// clone, pull, and push in your own home or work directory. Do NOT skip this step. Before changing anything in a repository, read and abide by the conventions described in the main readme file.
  - Do NOT use spaces in file or folder names. Do NOT include binary files such as png, pdf, RData, etc. in a Bitbucket repository except in exceptional cases. Instead, transfer such files using e.g., ''rsync -avz -e ssh <username>@ls5.tacc.utexas.edu:<remote_path> <local_path>'' or ''scp'', and document their exact paths in a readme file in the corresponding folder.
  - If you want to use TACC resources, first [[https://portal.tacc.utexas.edu/account-request|create]] an account, and then ask Habil to add you to a project. A simple test for running a job on the Stampede cluster is the following; see their user [[https://portal.tacc.utexas.edu/user-guides/stampede|guide]] or [[https://srcc.stanford.edu/sge-slurm-conversion|this]] table of commands for more details. \\  $ ssh <username>@stampede.tacc.utexas.edu \\  $ cd ~zare \\  login4.stampede(1)$ sbatch -p normal -n 1 -t 3 ./test.sh \\  We usually use Lonestar5 for computing and Ranch for storing large data.
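The folder layout recommended above (a ''~/proj'' folder with one subfolder per project, a readme in major subfolders, and a single runall.R driver) could be scaffolded with a small shell function. This is a minimal sketch, not a lab tool: the function name ''make_project'' and the ''data/'' and ''code/'' subfolder names are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: create <base>/<name> (base defaults to ~/proj) with
# a data/ folder holding a readme and a code/ folder holding a runall.R driver.
# The subfolder names are illustrative assumptions, not a lab convention.
make_project() {
  name="$1"
  base="${2:-$HOME/proj}"
  # Refuse empty names and names containing spaces.
  case "$name" in
    "" | *" "*)
      echo "make_project: name must be non-empty and contain no spaces" >&2
      return 1 ;;
  esac
  dir="$base/$name"
  mkdir -p "$dir/data" "$dir/code" || return 1
  # A readme in each major subfolder, e.g. to record where the data come from.
  if [ ! -f "$dir/data/readme" ]; then
    echo "Where the data in this folder come from:" > "$dir/data/readme"
  fi
  # runall.R is the only script that sources the others.
  if [ ! -f "$dir/code/runall.R" ]; then
    printf '%s\n' \
      '## runall.R: the single entry point of this project pipeline.' \
      '## Source the other scripts here, in order; they should not source each other.' \
      '## source("preprocess.R")' \
      '## source("analysis.R")' > "$dir/code/runall.R"
  fi
  echo "$dir"
}
```

For example, ''make_project genetwork'' would create ''~/proj/genetwork'' with ''data/readme'' and ''code/runall.R''. Existing files are left untouched, so running it twice is harmless.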
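The Bitbucket advice above (ssh with a key on your own machine, https on TACC clusters) can be sketched as a helper that prints the appropriate clone URL. The function name ''clone_url'' is hypothetical, and recognizing a TACC cluster by a host name ending in ''tacc.utexas.edu'' is an assumption; check it against ''hostname -f'' on the cluster you use.

```shell
#!/bin/sh
# Hypothetical helper: print the clone URL for a Bitbucket repository.
# Assumption: a host name ending in tacc.utexas.edu means a TACC cluster,
# where we clone over https so that ~/.ssh does not need to change.
clone_url() {
  repo="$1"   # owner/name, e.g. habilzare/genetwork
  host="${2:-$(hostname -f 2>/dev/null || hostname)}"
  case "$host" in
    *tacc.utexas.edu) echo "https://bitbucket.org/$repo.git" ;;  # cluster: https
    *)                echo "git@bitbucket.org:$repo.git" ;;      # elsewhere: ssh key
  esac
}
```

On your own computer you would first create a key, e.g. with ''ssh-keygen -t ed25519'', and paste the contents of ''~/.ssh/id_ed25519.pub'' under SSH keys in your Bitbucket settings; then ''git clone "$(clone_url habilzare/genetwork)"'' clones over ssh. On the cluster, the same command uses https, so nothing in ''~/.ssh'' needs to change.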
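Before committing, you might check a folder for the two problems mentioned above: names containing spaces, and binary files that belong outside the repository. A minimal sketch, assuming the hypothetical function name ''check_repo_files'' and an extension list (png, pdf, RData, rda, rds) that you may need to extend:

```shell
#!/bin/sh
# Hypothetical check: report paths under a folder (default: the current one)
# whose names contain a space, and binary files that should stay out of git.
# The extension list below is an assumption; extend it for your project.
check_repo_files() {
  root="${1:-.}"
  # Skip the .git folder; -print applies only to the matching paths.
  spaces=$(find "$root" -name .git -prune -o -name '* *' -print)
  binaries=$(find "$root" -name .git -prune -o -type f \
    \( -name '*.png' -o -name '*.pdf' -o -name '*.RData' \
       -o -name '*.rda' -o -name '*.rds' \) -print)
  if [ -n "$spaces" ]; then
    printf '%s\n' "$spaces" | sed 's/^/name contains a space: /'
  fi
  if [ -n "$binaries" ]; then
    printf '%s\n' "$binaries" | sed 's/^/binary file: /'
  fi
  # Exit status is 0 only when nothing was found.
  [ -z "$spaces$binaries" ]
}
```

It prints one line per offending path and returns a non-zero status if anything is found, so it could be run by hand, e.g. ''check_repo_files ~/proj/genetwork'', before ''git add''.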
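The ''test.sh'' submitted in the Stampede example above is not shown on this page. A minimal sketch of what such a script could contain, assuming a plain bash script with two optional ''#SBATCH'' directives (the job name and output-file pattern are hypothetical):

```shell
#!/bin/bash
# Hypothetical minimal test.sh for a first SLURM job on a TACC cluster.
# Submitted as: sbatch -p normal -n 1 -t 3 ./test.sh
# Job name (assumption):
#SBATCH -J test
# Standard output file; %j is replaced by the job ID (assumption):
#SBATCH -o test.o%j
echo "Job ${SLURM_JOB_ID:-none} started on $(hostname) at $(date)"
echo "Working directory: $(pwd)"
echo "Test job finished."
```

Options given on the ''sbatch'' command line (here ''-p normal -n 1 -t 3'', i.e. the normal queue, one task, and a 3-minute limit) take precedence over the ''#SBATCH'' lines in the script. Because ''#SBATCH'' lines are ordinary comments to bash, the script can also be run directly with ''bash test.sh'' as a quick local check.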