Revisions of ''for_members'' compared: 2024/05/14 14:37 (azad) and 2024/06/19 17:00 (admin), section "General guidelines for conducting research in the Oncinfo Lab".
  - All members should know about the **central [[https://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology|dogma]] of biology**, which is almost enough biological knowledge to start most projects. Familiarity with basic concepts such as [[https://en.wikipedia.org/wiki/Exon|exon]], intron, [[https://www.youtube.com/watch?v=CZeN-IgjYCo|sequencing]], etc. is helpful. Watch the [[https://www.dnalc.org/resources/3d/|animations]] from the DNA Learning Center.
  - Any file or data on this wiki that has **restricted permissions**, such as some paper PDFs or drafts, should be kept confidential and NOT shared with nonmembers unless authorized by the PI.
  - For future reference, please add links to your presentations and manuscript drafts on the **[[https://oncinfo.org/drafts|drafts]]** page. At a minimum, include the author, the date, the audience or collaborator, and the subject. Also include the pipeline and experiment names in your presentation so that the results can be traced later.
  - All members should read and follow **[[http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424|Bill's]] guidelines** and organize their files and folders accordingly, at least to some extent. Start by making a "~/proj" directory in your home folder that will eventually contain a subfolder for each project you are working on. Major subfolders must have a readme file, for example, to describe where the data come from. Abide by the "[[https://bitbucket.org/habilzare/template/src/master/|Rules]] for developing pipelines". In particular, your code folder must include a runall.R script that sources the other scripts. Avoid sourcing scripts in other scripts, except for the runall, functions, and libraries scripts, because otherwise the pipeline becomes difficult to follow and debug. At the top of each individual script, briefly explain what it does and sign it by writing <your name>, <YYYY-MM-DD>.
  - Your code and documents should be stored in a **Bitbucket** repository like [[https://bitbucket.org/habilzare/alzheimer/src/master/|https://bitbucket.org/habilzare/alzheimer/src/master/]]. Sign up for an [[https://bitbucket.org/account/signup/|account]] and add your photo. Do NOT sign in using your Google account. Only then, send your username and the corresponding email to Habil. If you are new to Bitbucket, spend an hour on the [[https://confluence.atlassian.com/bitbucket/tutorial-learn-bitbucket-with-git-759857287.html|tutorial]]. Taking [[https://guides.co/g/bitbucket-101/11146|Bitbucket 101]] is NOT needed for beginners. By using ssh, you can [[https://confluence.atlassian.com/bitbucket/use-the-ssh-protocol-with-bitbucket-cloud-221449711.html|avoid]] having to type a password each time you pull. To add a key, click on settings at the top right corner of the Bitbucket page, then SSH keys, then Add key. Use ssh, NOT https, to clone a repository. Do NOT mess with other people's git folders on the cluster; you should //only// clone, pull, and push in your own home or work directory. Do NOT skip this step. Before changing anything in a repository, read and abide by the conventions described in its main readme file.
  - Do NOT use **spaces in file or folder names**. Do NOT include binary files such as png, pdf, RData, etc. in a Bitbucket repository except in exceptional cases. Instead, use, [[https://explainshell.com/explain?cmd=rsync+-avz|e.g.,]] ''rsync -avz -e ssh <username>@ls6.tacc.utexas.edu'' or ''scp'' to transfer files between the cluster and your computer, and document the exact paths in a readme file in the corresponding folder. Add the readme file to the repository.
  - If you want to use **TACC** resources, first [[https://portal.tacc.utexas.edu/account-request|create]] an account, and then ask Habil to add you to a project. Source ''/work/03270/zare/Install/oncinfo_settings'' in your .bashrc or other bash scripts so that you do not need to install software that we often need and that other lab members have already installed. We usually use Lonestar and Maverick for computing, and we archive large data on Ranch following [[https://docs.google.com/document/d/17VkB7_HQUq7yeSr906Qlh7q8TlFX5nvP5a1F9csXBGY/edit|this]] protocol. A simple test for running a job on the Lonestar cluster is the following; see their user [[https://portal.tacc.utexas.edu/user-guides/lonestar5|guide]] and [[https://srcc.stanford.edu/sge-slurm-conversion|this]] table of commands for more details. \\ ''$ ssh <username>@ls6.tacc.utexas.edu'' \\ ''$ cd ~zare'' \\ ''login1.ls6(1099)$ cat ./test.mpi'' \\ ''login1.ls6(1099)$ sbatch ./test.mpi'' \\ You can monitor your jobs using ''squeue -u <username>''. The output will be saved in the ''~/temp'' subfolder; if there are multiple files in this folder, look at the newest one. \\ The above command submits the job to the development queue. To submit a job to the normal queue instead, run: \\ ''login1.ls6(1099)$ sbatch -p normal -n 1 -t 2 ./test.mpi'' \\ [[https://tacc-cloud.readthedocs.io/projects/agave/en/latest/index.html|Tapis]] is an //advanced//, optional platform for submitting jobs from your local computer and is not recommended for beginners. Before submitting many jobs, estimate the time and memory they need by submitting a single job or, preferably, by running your code on toy data. If you expect a complete job to take more than 10 minutes on the cluster, let Habil check your code before you submit more similar jobs, to make sure we do not miss any easy [[https://bitbucket.org/habilzare/alzheimer/commits/250cacf92ef6b2886973dc609c229032b42b2234|parallelization]]. Familiarize yourself with [[https://docs.google.com/presentation/d/1TFjB16XmLk2Xo4qHJgGe1VfnIUonw9cIJmU1xCZxyL0/edit#slide=id.g242153ecf22_0_60|Docker]], which you can use to test your code locally before running it on the cluster. [[https://bitbucket.org/habilzare/genetwork/src/master/code/docker/readme.txt|Use]] ''habilzare/oncinfo:oncinfo-<version>'' and modify //only// ''habilzare/oncinfo:oncinfo-dev-<version>''.
  - Everyone should have a photo and an updated CV in PDF format on their personal page. {{:wiki:public:cv_template.zip|This}} is an optional LaTeX template. The permission of every lab notebook (lano) should be set to "hidden", and it is important that lanos be updated EVERY day. [[https://www.dokuwiki.org/dokuwiki|DokuWiki]] provides us with two edit modes, ckg and DW; use whichever is more convenient for you. Write your posts in reverse chronological order so that the newest post is at the top. To facilitate future reference, avoid sending data as attachments; instead, upload files to your lano and link to them where needed.
  - You can install the **Google Scholar [[https://chrome.google.com/webstore/detail/google-scholar-button/ldipcbpaocekfooobnbcddclnhejkcpn?hl=en|Button]]** add-on for an easier way of searching Google Scholar: select the paper title, then click on the little blue icon in the top right corner. For any paper that you want to cite on the lab wiki, find it on Google Scholar, click on "More > Cite", and copy the MLA format. Also, use [[https://gsuite.google.com/marketplace/app/paperpile/894076725911|Paperpile]] for easy citation in Google Docs, and Math [[https://gsuite.google.com/marketplace/app/math_equations/825973477142|Equations]] for writing and manipulating equations in Google Slides.
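
A minimal sketch of the project layout described above, assuming a hypothetical Bash helper ''new_project'' and hypothetical folder and script names (''code'', ''data'', ''libraries.R'', ''functions.R''); the lab's [[https://bitbucket.org/habilzare/template/src/master/|template]] repository remains the reference for the actual structure. It creates ''~/proj/<name>'', gives the main folders a readme, and writes a ''runall.R'' that starts with a short description and a signature.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: creates ~/proj/NAME with readme files and a signed code/runall.R.
# Folder and script names are assumptions, not the lab template.
new_project() {
  local name="$1" author="$2" root today
  case "$name" in
    "" | *" "*) echo "Project name must be non-empty and contain no spaces." >&2
                return 1 ;;
  esac
  root="$HOME/proj/$name"
  today="$(date +%Y-%m-%d)"
  mkdir -p "$root/code" "$root/data"
  # Major subfolders get a readme, e.g., describing where the data come from.
  [ -f "$root/readme.txt" ] || echo "Project $name. $author, $today" > "$root/readme.txt"
  [ -f "$root/data/readme.txt" ] || echo "Where the data come from: TODO" > "$root/data/readme.txt"
  # runall.R sources the other scripts and starts with a description and a signature.
  if [ ! -f "$root/code/runall.R" ]; then
    cat > "$root/code/runall.R" <<EOF
## Runs the $name pipeline by sourcing the other scripts in order.
## $author, $today
source("libraries.R")
source("functions.R")
EOF
  fi
}
</code>

Usage: ''new_project <project> "<your name>"''. Existing files are never overwritten, so re-running it is safe.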
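
Cloning over ssh instead of https only changes the URL passed to ''git clone''. A minimal sketch, assuming a hypothetical Bash helper ''bitbucket_ssh_url'' that converts a repository address (such as the alzheimer link above) into the ''git@bitbucket.org:<workspace>/<repo>.git'' form. The one-time key setup uses the standard ''ssh-keygen'', ''ssh'', and ''git'' commands and is left as comments, because it needs your own account.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: converts a Bitbucket web/https (or ssh) URL into its ssh clone URL.
#   https://bitbucket.org/habilzare/alzheimer/src/master/ -> git@bitbucket.org:habilzare/alzheimer.git
bitbucket_ssh_url() {
  local url="${1#https://}"        # drop the scheme
  url="${url#*@}"                  # drop an optional "user@" or "git@" prefix
  url="${url#bitbucket.org/}"      # https form: keep "<workspace>/<repo>/..."
  url="${url#bitbucket.org:}"      # ssh form: keep "<workspace>/<repo>.git"
  local workspace="${url%%/*}"
  local rest="${url#*/}"
  local repo="${rest%%/*}"
  repo="${repo%.git}"
  echo "git@bitbucket.org:${workspace}/${repo}.git"
}

# One-time setup (run by hand):
#   ssh-keygen -t ed25519 -C "you@example.com"   # create a key pair
#   cat ~/.ssh/id_ed25519.pub                     # paste into Bitbucket: SSH keys > Add key
#   ssh -T git@bitbucket.org                      # check that the key is accepted
#   git clone "$(bitbucket_ssh_url https://bitbucket.org/habilzare/alzheimer/src/master/)"
# Switching an existing https clone to ssh:
#   git remote set-url origin "$(bitbucket_ssh_url "$(git remote get-url origin)")"
</code>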
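
You can check the naming and binary-file rules above before each commit, and a transfer can record its own paths. A minimal sketch, assuming two hypothetical Bash helpers. ''check_repo_files'' lists paths that contain spaces and files ending in .png, .pdf, or .RData, matched case-insensitively. ''fetch_and_log'' runs the ''rsync'' command from above and appends it, with the exact paths, to the destination folder's readme. The paths in the usage comments are placeholders.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: prints paths under DIR that break the naming/binary rules; returns 1 if any.
check_repo_files() {
  local dir="${1:-.}" bad
  bad="$(
    {
      # Files or folders whose name contains a space (the .git folder is skipped).
      find "$dir" -name .git -prune -o -name '* *' -print
      # Binary files that should stay out of the repository.
      find "$dir" -name .git -prune -o -type f \( -iname '*.png' -o -iname '*.pdf' -o -iname '*.rdata' \) -print
    } | sort -u
  )"
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
  return 0
}

# Hypothetical helper: copies REMOTE_PATH from Lonestar 6 into LOCAL_DIR and records the
# exact command in LOCAL_DIR/readme.txt. Without a trailing slash on REMOTE_PATH, rsync
# copies the folder itself into LOCAL_DIR; with one, it copies only the folder's contents.
fetch_and_log() {
  local user="$1" remote_path="$2" local_dir="$3" src
  src="${user}@ls6.tacc.utexas.edu:${remote_path}"
  mkdir -p "$local_dir"
  rsync -avz -e ssh "$src" "$local_dir/" || return 1
  printf '%s: rsync -avz -e ssh %s %s/\n' "$(date +%Y-%m-%d)" "$src" "$local_dir" >> "$local_dir/readme.txt"
}

# Usage (placeholder paths):
#   check_repo_files ~/proj/<project>
#   fetch_and_log <username> /path/on/ls6/results ~/proj/<project>/results
</code>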
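
You can make the sourcing step for the TACC settings idempotent, so that re-running a setup script does not add the same line to ''~/.bashrc'' twice. A minimal sketch, assuming a hypothetical helper ''add_line_once''. The settings path is the one given above. The ''if [ -f ... ]'' guard keeps the line harmless on machines where that path does not exist, such as a laptop.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: appends LINE to FILE unless FILE already contains that exact line.
add_line_once() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Source the lab settings only where the file exists (e.g., on TACC, not on a laptop).
ONCINFO_LINE='if [ -f /work/03270/zare/Install/oncinfo_settings ]; then source /work/03270/zare/Install/oncinfo_settings; fi'

# On Lonestar 6, run once (re-running is harmless):
#   add_line_once "$ONCINFO_LINE" ~/.bashrc
</code>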
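
The content of ''test.mpi'' is not reproduced on this page. The following is a hedged sketch of what a minimal Slurm batch script for Lonestar 6 might look like. It requests one node and one task for two minutes in the development queue. The job name, output file pattern, and commented-out allocation line are assumptions, so compare them with ''test.mpi'' and the TACC user guide before using them. Options given on the ''sbatch'' command line, such as ''-p normal'' in the example above, override the matching ''#SBATCH'' lines.

<code bash>
#!/bin/bash
#SBATCH -J oncinfo_test          # job name (hypothetical)
#SBATCH -p development           # queue; "sbatch -p normal ..." overrides this line
#SBATCH -N 1                     # number of nodes
#SBATCH -n 1                     # number of tasks
#SBATCH -t 00:02:00              # wall-clock limit (hh:mm:ss); "-t 2" also means 2 minutes
#SBATCH -o oncinfo_test.%j.out   # stdout file in the submission directory; %j is the job ID
##SBATCH -A <project>            # allocation to charge (placeholder; the extra # disables it)

# The body runs on the compute node; keep it tiny for a first test.
job_main() {
  echo "job ${SLURM_JOB_ID:-not-under-slurm} on $(uname -n) at $(date +%Y-%m-%dT%H:%M:%S)"
}

job_main
</code>

Save it under a name of your choice (e.g., the hypothetical ''oncinfo_test.slurm''), submit it with ''sbatch'', and watch it with ''squeue -u <username>''.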
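
The rule about the two Docker image tags can be encoded in a small guard. A minimal sketch, assuming hypothetical helpers ''oncinfo_image'' and ''push_image''. ''<version>'' stays a placeholder that you fill in from the Docker readme linked above. ''docker pull'', ''docker run'', and ''docker push'' are standard Docker CLI commands. Mounting the current folder with ''-v'' is one common way to test local code, not necessarily the lab's setup.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: prints the image name for VERSION; pass "dev" as the 2nd argument
# to get the development image, which is the only one you may modify.
oncinfo_image() {
  local version="$1" kind="${2:-}"
  if [ "$kind" = "dev" ]; then
    echo "habilzare/oncinfo:oncinfo-dev-${version}"
  else
    echo "habilzare/oncinfo:oncinfo-${version}"
  fi
}

# Hypothetical guard: pushes only oncinfo-dev-* tags and refuses everything else.
push_image() {
  local image="$1"
  case "$image" in
    habilzare/oncinfo:oncinfo-dev-*)
      docker push "$image" ;;
    *)
      echo "Refusing to push $image: only oncinfo-dev-<version> may be modified." >&2
      return 1 ;;
  esac
}

# Typical local test run (fill in the version; assumes bash is available in the image):
#   version="<version>"
#   docker pull "$(oncinfo_image "$version")"
#   docker run --rm -it -v "$PWD":/work -w /work "$(oncinfo_image "$version")" bash
</code>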
  - If you are considering ultimately getting **jobs** in computational biology or bioinformatics, look at the postings on the [[http://bioinformatics.org|bioinformatics.org]] website __within the first week__ after joining the lab. For academic positions, see Nature Jobs, Science Careers, and [[https://docs.google.com/document/d/1tRVpO0eFzYfl0f95X_X8YvE9DeQVX3vYBkmw4smRJLU/edit#|other]] websites. Read the articles on "[[http://oncinfo.org/how_to|How to]] rescue US biomedical research from its systemic flaws?" if you are, or want to be, a PhD student.
  - If you want to use **ROSMAP** data, create a Synapse [[https://www.synapse.org/#!RegisterAccount:0|account]], add your information to {{:second-ad-knowledge-portal-controlled-access-duc-march2022-v7.3-signed.pdf|this}} file, and upload it back to Oncinfo without changing the file name. Let Habil know so that he can upload it to the Synapse [[https://help.adknowledgeportal.org/apd/Data-Use-Certificates.2623373330.html|website]]. Then, accept the Terms of Use through this [[https://www.synapse.org/#!AccessRequirements:ID=syn2910256&TYPE=ENTITY|link]].
  
==== Some references ====