Differences

This shows you the differences between two versions of the page.

for_members [2024/02/28 20:28] – [General guidelines for conducting research in the Oncinfo Lab] habil
for_members [2024/04/16 16:07] – [Some references] admin
Line 27: Line 27:
   - Your code and documents should be stored in a **Bitbucket**  repository like [[https://bitbucket.org/habilzare/alzheimer/src/master/|https://bitbucket.org/habilzare/alzheimer/src/master/]]. Sign up for an [[https://bitbucket.org/account/signup/|account]] and add your photo. Do NOT sign in using your Google account. Only then, send your username and the corresponding email to Habil. If you are new to Bitbucket, spend an hour on the [[https://confluence.atlassian.com/bitbucket/tutorial-learn-bitbucket-with-git-759857287.html|tutorial]]. Taking [[https://guides.co/g/bitbucket-101/11146|Bitbucket 101]] is NOT needed for beginners. You can [[https://confluence.atlassian.com/bitbucket/use-the-ssh-protocol-with-bitbucket-cloud-221449711.html|avoid]] having to manually type a password each time you pull by using ssh. To add a key, click on settings at the top right corner of the Bitbucket page, then SSH keys, then Add key. Use ssh to clone a repository, NOT https. Do NOT mess with others' git folders on the cluster. You should //only//  clone, pull, and push in your own home or work directory. Do NOT skip this step. Before changing anything in a repository, read and abide by the conventions described in the main readme file.
   - Do NOT use **spaces in file or folder names**. Do NOT include binary files such as png, pdf, RData, etc. in a Bitbucket repository unless on an exceptional basis. Instead, use [[https://explainshell.com/explain?cmd=rsync+-avz|e.g.,]] ''rsync -avz -e ssh <usrname>@ls6.tacc.utexas.edu''  or ''scp'' to transfer files between the cluster and your computer, and document the exact paths in a readme file in the corresponding folder. Add the readme file to the repository.
-  - If you want to use **TACC**  resources, you first [[https://portal.tacc.utexas.edu/account-request|create]] an account, and then ask Habil to add you to a project. Source ''/work/03270/zare/Install/oncinfo_settings''  in your .bashrc or other bash scripts so that you do not need to install the software that we often need and are already installed by other lab members. We usually use Lonestar and Maverick for computing, and we archive large data on Ranch based on [[https://docs.google.com/document/d/17VkB7_HQUq7yeSr906Qlh7q8TlFX5nvP5a1F9csXBGY/edit|this]] protocol. A simple test for running a job on the Lonestar cluster is the following. Look at their user [[https://portal.tacc.utexas.edu/user-guides/lonestar5|guide]] and [[https://srcc.stanford.edu/sge-slurm-conversion|this]] table of commands for more details. \\ ''$ ssh <username>@ls6.tacc.utexas.edu \\  $ cd ~zare \\  login1.ls6(1099)$ cat ./test.mpi'' \\ ''login1.ls6(1099)$ sbatch ./test.mpi'' \\ You can monitor your jobs using ''squeue -u <usrname>''. The output will be saved in the ''~/temp''  subfolder. If there are multiple files in this folder, look at the newest one. \\ The above command will submit the job to the development queue. If you want to submit a job to the normal queue, you can do the following: \\ ''login1.ls6(1099)$ sbatch -p normal -n 1 -t 2 ./test.mpi'' \\ Before submitting many jobs, estimate the time and memory by submitting a single job or preferably, running your code on toy data. If you think running a complete job takes more than 10 minutes on the cluster, before submitting more similar jobs, let Habil check your code to make sure we do not miss any easy [[https://bitbucket.org/habilzare/alzheimer/commits/250cacf92ef6b2886973dc609c229032b42b2234|parallelization]]. Familiarize yourself with [[https://docs.google.com/presentation/d/1TFjB16XmLk2Xo4qHJgGe1VfnIUonw9cIJmU1xCZxyL0/edit#slide=id.g242153ecf22_0_60|Docker]]. Use ''habilzare/oncinfo:oncinfo-<version>''  and //only//  modify ''habilzare/oncinfo:oncinfo-dev-<version>''.
+  - If you want to use **TACC**  resources, you first [[https://portal.tacc.utexas.edu/account-request|create]] an account, and then ask Habil to add you to a project. Source ''/work/03270/zare/Install/oncinfo_settings''  in your .bashrc or other bash scripts so that you do not need to install the software that we often need and are already installed by other lab members. We usually use Lonestar and Maverick for computing, and we archive large data on Ranch based on [[https://docs.google.com/document/d/17VkB7_HQUq7yeSr906Qlh7q8TlFX5nvP5a1F9csXBGY/edit|this]] protocol. A simple test for running a job on the Lonestar cluster is the following. Look at their user [[https://portal.tacc.utexas.edu/user-guides/lonestar5|guide]] and [[https://srcc.stanford.edu/sge-slurm-conversion|this]] table of commands for more details. \\ ''$ ssh <username>@ls6.tacc.utexas.edu \\  $ cd ~zare \\  login1.ls6(1099)$ cat ./test.mpi'' \\ ''login1.ls6(1099)$ sbatch ./test.mpi'' \\ You can monitor your jobs using ''squeue -u <usrname>''. The output will be saved in the ''~/temp''  subfolder. If there are multiple files in this folder, look at the newest one. \\ The above command will submit the job to the development queue. If you want to submit a job to the normal queue, you can do the following: \\ ''login1.ls6(1099)$ sbatch -p normal -n 1 -t 2 ./test.mpi'' \\ [[https://tacc-cloud.readthedocs.io/projects/agave/en/latest/index.html|Tapis]] is an //advanced//  optional platform for submitting jobs from your local computer, not recommended for beginners. Before submitting many jobs, estimate the time and memory by submitting a single job or preferably, running your code on toy data. If you think running a complete job takes more than 10 minutes on the cluster, before submitting more similar jobs, let Habil check your code to make sure we do not miss any easy [[https://bitbucket.org/habilzare/alzheimer/commits/250cacf92ef6b2886973dc609c229032b42b2234|parallelization]]. Familiarize yourself with [[https://docs.google.com/presentation/d/1TFjB16XmLk2Xo4qHJgGe1VfnIUonw9cIJmU1xCZxyL0/edit#slide=id.g242153ecf22_0_60|Docker]], which you can use to test your code locally before running it on the cluster. Use ''habilzare/oncinfo:oncinfo-<version>''  and modify //only// ''habilzare/oncinfo:oncinfo-dev-<version>''.
   - Everyone should have a photo and their updated CV in pdf format on their personal page. {{:wiki:public:cv_template.zip|This}}  is an optional LaTeX template. The permission of any lab notebook (lano) should be set to "hidden", and it is important that it be updated EVERY day. [[https://www.dokuwiki.org/dokuwiki|DokuWiki]] provides us with two edit modes: ckg and DW. Use the one that is more convenient for you. Write your posts in reverse chronological order so that the newest post comes at the top. To facilitate future reference, avoid sending data as attachments. Instead, upload files to your lano and link to them where needed.
   - You can install the **Google Scholar [[https://chrome.google.com/webstore/detail/google-scholar-button/ldipcbpaocekfooobnbcddclnhejkcpn?hl=en|Button]]** add-on for an easier way of searching Google Scholar. Select the paper title and then click on the little blue icon at the top right corner. For any paper that you want to cite on the lab wiki, find it on Google Scholar, click on "More>Cite" and copy the MLA format. Also, use [[https://gsuite.google.com/marketplace/app/paperpile/894076725911|Paperpile]] for easy citation in Google Docs, and Math [[https://gsuite.google.com/marketplace/app/math_equations/825973477142|Equations]] for writing and manipulating equations in Google presentations.
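The naming and binary-file rules in the list above can be checked before a commit with a small shell helper. This is only a sketch under stated assumptions: ''check_repo_names'' is a hypothetical name, not part of the lab's tooling, and the extension list covers only the three types the guideline names (png, pdf, RData), not everything implied by "etc.".

```shell
#!/usr/bin/env bash
# Hypothetical helper (an illustration, not an existing lab script):
# list paths under a directory that break the repository rules above.
check_repo_names() {
  local dir="${1:-.}"
  {
    # Files or folders whose name contains a space; git's own metadata is skipped.
    find "$dir" -path '*/.git' -prune -o -name '* *' -print
    # Binary outputs named in the guideline: png, pdf, RData (case-insensitive).
    find "$dir" -path '*/.git' -prune -o -type f \
      \( -iname '*.png' -o -iname '*.pdf' -o -iname '*.RData' \) -print
  } | sort -u   # a path that breaks both rules is reported once
}
```

Running ''check_repo_names <path-to-your-clone>'' prints one offending path per line and prints nothing when the folder is clean. Files that are reported should be renamed or moved out of the repository and transferred with ''rsync'' or ''scp'' instead.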
Line 48: Line 48:
 ==== Some references ====
  
-  - Two machine learning bibles, which summarize important topics in the field up to 2005: Bishop ({{:bishop-pattern_recongnition_and_machine_learning-1.pdf|1}},{{:bishop-pattern_recongnition_and_machine_learning-2.pdf|2}}  ) and [[http://statweb.stanford.edu/~tibs/ElemStatLearn/download.html|Hastie et al.]]. Anwar's cheat [[https://medium.com/swlh/cheat-sheets-for-machine-learning-interview-topics-51c2bc2bab4f|sheets]].
+  - Two machine learning bibles, which summarize important topics in the field up to 2005: Bishop ({{:bishop-pattern_recongnition_and_machine_learning-1.pdf|1}},{{:bishop-pattern_recongnition_and_machine_learning-2.pdf|2}}  ) and [[http://statweb.stanford.edu/~tibs/ElemStatLearn/download.html|Hastie et al.]]. Anwar's cheat [[https://medium.com/swlh/cheat-sheets-for-machine-learning-interview-topics-51c2bc2bab4f|sheets]]. A [[https://docs.google.com/spreadsheets/d/1AK8lqS-ztMhh8YoOaQ7ScIZmabrQ5AFxAyXKwYWiT04/edit#gid=0|list]] of some of the best courses in ML.
   - An old list of computational biology [[https://docs.google.com/document/d/1fxqgQgsxf6Xd-8p0DUTyOGkeWNI5REW4zk1FXTxi2_I/edit|books]]. If you would like to have a hard copy of any of these books or other books useful to your training and research, even in areas like psychology and management, please let Habil know. Molecular Biology of the Cell, by [[https://www.amazon.com/Molecular-Biology-Sixth-Bruce-Alberts/dp/0815345240|Alberts]] et al., is a good self-contained book starting from basic biology concepts like DNA and ending with complex pathways and mechanisms like the immune system ({{:screen_shot_2021-03-29_at_1.05.18_pm.png?linkonly|contents}}, {{:molecular_biology_of_the_cell_6th_editio.pdf|pdf}}  ).
   - [[https://www.biostars.org/|Biostars]] is a good forum, similar to Stack Overflow in structure, but focused on bioinformatics and Computational Biology.
Line 56: Line 56:
   - [[http://stephenturner.us/edu.html|Lists 1]] and [[https://www.r-bloggers.com/2013/04/list-of-bioinformatics-workshops-and-training-resources/|2]] of bioinformatics workshops.
   - A 5-minute introduction to next-generation sequencing [[https://www.youtube.com/watch?annotation_id=annotation_228575861&feature=iv&src_vid=womKfikWlxM&v=fCd6B5HRaZ8|video]].
 +
  
 ==== Fun stuff ====