Differences

This shows you the differences between two versions of the page.

for_members [2024/06/19 17:05] – [General guidelines for conducting research in the Oncinfo Lab] admin
for_members [2024/06/28 17:04] (current) – [General guidelines for conducting research in the Oncinfo Lab] admin
Line 13: Line 13:
  
   - All **Google docs**  that need to be edited by lab members should be put in the Oncinfo [[https://drive.google.com/?tab=mo&authuser=0#folders/0B5Cpru0UXP0adTZTckg3aEd4SEE|folder]]. They should be kept confidential. Send your Gmail address to Habil to get access to this folder. Remind him to add you to his Oncinfo Google folder. Then, create a subfolder with your name there, and create a Google doc in your subfolder. Copy all items from this "For members" page to that Google doc, and write "**Done**", "**Todo**", or "**Skip**" in front of each item ([[https://docs.google.com/document/d/1o8s5_S-Lx-xXl1MVHV6MN7-bZrjitoKctm9CxRAdXu4/edit#heading=h.2l9mzouq3lij|example]]).
-  - If you have a lab **computer**, add the tag number written on the back of the laptop, your name, and the date you start using it to the [[https://docs.google.com/spreadsheets/d/1A6ouCCPov5VXt7xBCdTh7Cwc6jtPJGGJlekmGnKL5JY/edit#gid=441648294|table]] of computers. Your drive must always be encrypted. E.g., on macOS, you can make sure FileVault is on like {{:screen_shot_2020-02-15_at_10.24.09_pm.png?linkonly|this}}. Take a similar screenshot that shows your name at the upper right corner, and post it on your lab notebook. MacBooks sometimes use their storage as RAM, so leave at least 50 GB empty all the time.
+  - If you have a lab **computer**, add the tag number written on the back of the laptop, your name, and the date you start using it to the [[https://docs.google.com/spreadsheets/d/1A6ouCCPov5VXt7xBCdTh7Cwc6jtPJGGJlekmGnKL5JY/edit#gid=441648294|table]] of computers. Your drive must always be encrypted. E.g., on macOS, you can make sure FileVault is on like {{:screen_shot_2020-02-15_at_10.24.09_pm.png?linkonly|this}}. Take a similar screenshot that shows your name at the upper right corner, and post it on your lab notebook. MacBooks sometimes use their storage as RAM, so leave at least 50-100 GB empty all the time.
   - Pass the online training **courses**  required by the University, e.g., conflict of interest, safety, etc.
   - Ask for help [[https://codingkilledthecat.wordpress.com/2012/06/26/how-to-ask-for-programming-help/|professionally]], e.g., include the exact text of the error message.
Line 23: Line 23:
   - All members should know about the **central [[https://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology|dogma]] of biology**, which is almost enough biological knowledge to start the majority of projects. Familiarity with some basic concepts such as [[https://en.wikipedia.org/wiki/Exon|exon]], intron, [[https://www.youtube.com/watch?v=CZeN-IgjYCo|sequencing]], etc. is helpful. Watch [[https://www.dnalc.org/resources/3d/|animations]] from DNA Learning Center.
   - Any file or data on this wiki that has **restricted permissions**, such as some paper pdfs or drafts, should be kept confidential, and NOT be shared with nonmembers unless authorized by the PI.
-  - For future reference, please add the link to your presentations and manuscript drafts on the **[[https://oncinfo.org/drafts|drafts]]** page. At a minimum, please include: the author, the date, the audience or collaborator, and the subject. Include the pipeline and experiment name in your presentation for future reference.
+  - For future reference, please add the link to your presentations and manuscript drafts on the **[[https://oncinfo.org/drafts|drafts]]** page. At a minimum, please include: the author, the date, the audience or collaborator, and the subject. Include the corresponding pipeline and experiment name in your presentation for future reference.
   - All members should read and follow **[[http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424|Bill's]] guidelines**, and organize their files and folders accordingly and to some extent. Start by making a "~/proj" directory in your home folder that will eventually contain a subfolder for each project you are working on. Major subfolders must have a readme file, for example, to describe where the data comes from. Abide by the "[[https://bitbucket.org/habilzare/template/src/master/|Rules]] for developing pipelines". In particular, your code folder must include a runall.R script that sources other scripts. Avoid sourcing scripts in other scripts except for the runall, functions, and libraries because then following and debugging the pipeline would be difficult. At the top of each individual script, please briefly explain what it does, and sign it by writing <your name>, <YYYY-MM-DD>.
-  - Your code and documents should be stored in a **Bitbucket**  repository like [[https://bitbucket.org/habilzare/alzheimer/src/master/|https://bitbucket.org/habilzare/alzheimer/src/master/]]. Sign up for an [[https://bitbucket.org/account/signup/|account]] and add your photo. Do NOT sign in using your Google account. Only then, send your username and the corresponding email to Habil. If you are new to Bitbucket, spend an hour on the [[https://confluence.atlassian.com/bitbucket/tutorial-learn-bitbucket-with-git-759857287.html|tutorial]]. Taking [[https://guides.co/g/bitbucket-101/11146|Bitbucket 101]] is NOT needed for beginners. You can [[https://confluence.atlassian.com/bitbucket/use-the-ssh-protocol-with-bitbucket-cloud-221449711.html|avoid]] having to manually type a password each time you pull by using ssh. To add a key, click on settings at the top right corner of Bitbucket page, SSH keys, Add key. Use ssh to clone a repository, NOT https. Do NOT mess with others' git folders on the cluster. You should //only//  clone, pull, and push in your own home or work directory. Do NOT skip this step. Before changing anything in a repository, read and abide by the conventions described in the main readme file.
+  - Your code and documents should be stored in a **Bitbucket**  repository like [[https://bitbucket.org/habilzare/alzheimer/src/master/|https://bitbucket.org/habilzare/alzheimer/src/master/]]. Sign up for an [[https://bitbucket.org/account/signup/|account]] and add your photo. Do {{:wiki:public:bitbucket_account.png?linkonly|NOT}}  sign in using your Google account. Only then, send your username and the corresponding email to Habil. If you are new to Bitbucket, spend an hour on the [[https://confluence.atlassian.com/bitbucket/tutorial-learn-bitbucket-with-git-759857287.html|tutorial]]. Taking [[https://guides.co/g/bitbucket-101/11146|Bitbucket 101]] is NOT needed for beginners.
+You can [[https://confluence.atlassian.com/bitbucket/use-the-ssh-protocol-with-bitbucket-cloud-221449711.html|avoid]] having to manually type a password each time you pull by using ssh. To add a key, click on settings at the top right corner of Bitbucket page, SSH keys, Add key. Use ssh to clone a repository, NOT https. Do NOT mess with others' git folders on the cluster. You should //only//  clone, pull, and push in your own home or work directory. Do NOT skip this step. Before changing anything in a repository, read and abide by the conventions described in the main readme file.
   - Do NOT use **spaces in file or folder names**. Do NOT include binary files such as png, pdf, RData, etc. in a Bitbucket repository except on an exceptional basis. Instead, use [[https://explainshell.com/explain?cmd=rsync+-avz|e.g.,]] ''rsync -avz -e ssh <usrname>@ls6.tacc.utexas.edu''  or ''scp'' to transfer files between the cluster and your computer, and document the exact paths in a readme file in the corresponding folder. Add the readme file to the repository.
   - If you want to use **TACC**  resources, first [[https://portal.tacc.utexas.edu/account-request|create]] an account, and then ask Habil to add you to a project. Source ''/work/03270/zare/Install/oncinfo_settings''  in your .bashrc or other bash scripts so that you do not need to install the software that we often need and that other lab members have already installed. We usually use Lonestar and Maverick for computing, and we archive large data on Ranch based on [[https://docs.google.com/document/d/17VkB7_HQUq7yeSr906Qlh7q8TlFX5nvP5a1F9csXBGY/edit|this]] protocol. A simple test for running a job on the Lonestar cluster is the following. Look at their user [[https://portal.tacc.utexas.edu/user-guides/lonestar5|guide]] and [[https://srcc.stanford.edu/sge-slurm-conversion|this]] table of commands for more details. \\ ''$ ssh <username>@ls6.tacc.utexas.edu \\  $ cd ~zare \\  login1.ls6(1099)$ cat ./test.mpi'' \\ ''login1.ls6(1099)$ sbatch ./test.mpi'' \\ You can monitor your jobs using ''squeue -u <username>''. The output will be saved in the ''~/temp''  subfolder. If there are multiple files in this folder, look at the newest one. \\ The above command will submit the job to the development queue. If you want to submit a job to the normal queue, you can do the following: \\ ''login1.ls6(1099)$ sbatch -p normal -n 1 -t 2 ./test.mpi'' \\ [[https://tacc-cloud.readthedocs.io/projects/agave/en/latest/index.html|Tapis]] is an //advanced//  optional platform for submitting jobs from your local computer, not recommended for beginners.
Before submitting many jobs, estimate the time and memory by submitting a single job or, preferably, running your code on toy data. If you think running a complete job takes more than 10 minutes on the cluster, before submitting more similar jobs, let Habil check your code to make sure we do not miss any easy [[https://bitbucket.org/habilzare/alzheimer/commits/250cacf92ef6b2886973dc609c229032b42b2234|parallelization]]. Familiarize yourself with [[https://docs.google.com/presentation/d/1TFjB16XmLk2Xo4qHJgGe1VfnIUonw9cIJmU1xCZxyL0/edit#slide=id.g242153ecf22_0_60|Docker]], which you can use to test your code locally before running it on the cluster. [[https://bitbucket.org/habilzare/genetwork/src/master/code/docker/readme.txt|Use]] ''habilzare/oncinfo:oncinfo-<version>''  and modify //only// ''habilzare/oncinfo:oncinfo-dev-<version>''.
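
The file-organization items in this revision (a "~/proj" directory with one subfolder per project, a readme in major subfolders, and a runall.R signed with a name and a YYYY-MM-DD date) could be scaffolded roughly as follows. This is only a minimal sketch, not the lab's official template: the project name, the author string, the "code" and "data" subfolder names, and the "functions.R" file name are hypothetical placeholders, and the linked "Rules for developing pipelines" remain the authority on the real layout.

```shell
#!/usr/bin/env bash
# Sketch: scaffold one project under ~/proj with a readme and a signed runall.R.
set -eu

# All names below are hypothetical placeholders; adjust them to your project.
PROJ_ROOT="${PROJ_ROOT:-${HOME:-/tmp}/proj}"   # "~/proj", falling back to /tmp if HOME is unset
PROJECT="${PROJECT:-example_project}"          # no spaces in file or folder names
AUTHOR="${AUTHOR:-Your_Name}"
TODAY="$(date +%Y-%m-%d)"                      # signature date in YYYY-MM-DD form

project_dir="$PROJ_ROOT/$PROJECT"
mkdir -p "$project_dir/code" "$project_dir/data"

# Readme describing where the data comes from (fill in the real details by hand).
readme="$project_dir/data/readme.txt"
if [ ! -e "$readme" ]; then
  printf 'Data source: <describe where the data is coming from>\n' > "$readme"
fi

# runall.R is the only script meant to source the other scripts.
runall="$project_dir/code/runall.R"
if [ ! -e "$runall" ]; then
  {
    printf '## Runs the whole pipeline by sourcing the other scripts in order.\n'
    printf '## %s, %s\n' "$AUTHOR" "$TODAY"
    printf '# source("functions.R")  # hypothetical script name\n'
  } > "$runall"
fi

echo "Scaffold ready in $project_dir"
```

Existing files are left untouched, so the script can be rerun safely; set ''PROJECT'' and ''AUTHOR'' in the environment to override the placeholders.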