Differences

This shows you the differences between two versions of the page.

for_members [2020/06/23 00:18] admin
for_members [2020/09/23 15:35] – [General guidelines for conducting research in the Oncinfo Lab] admin
Line 16: Line 16:
   - Your code and documents should be stored in a Bitbucket repository like [[https://bitbucket.org/habilzare/genetwork|https://bitbucket.org/habilzare/genetwork]]. Sign up for an [[https://bitbucket.org/account/signup/|account]] and add your photo. Do NOT sign in using your Google account. Only then, send your username to Habil. If you are new to Bitbucket, spend an hour on the [[https://confluence.atlassian.com/bitbucket/tutorial-learn-bitbucket-with-git-759857287.html|tutorial]]. Taking [[https://guides.co/g/bitbucket-101/11146|Bitbucket 101]] is NOT needed for beginners. You can [[https://confluence.atlassian.com/bitbucket/use-the-ssh-protocol-with-bitbucket-cloud-221449711.html|avoid]] having to manually type a password each time you pull by using ssh. To add a key, click on your photo at the top right corner of the Bitbucket page, then Bitbucket settings, SSH keys, Add key. This trick is not appropriate for TACC clusters because we should not change our .ssh folder there. On the cluster, use https to clone instead of ssh. Do NOT mess with others' git folders on the cluster. You should //only// clone, pull, and push in your own home or work directory. Do NOT skip this step. Before changing anything in a repository, read and abide by the conventions described in the main readme file.
   - Do NOT use spaces in file or folder names. Do NOT include binary files such as png, pdf, RData, etc. in a Bitbucket repository except in exceptional cases. Instead, use [[https://explainshell.com/explain?cmd=rsync+-avz|e.g.,]] ''rsync -avz -e ssh <username>@ls5.tacc.utexas.edu'' or ''scp'' to transfer files between the cluster and your computer, and document the exact paths in a readme file in the corresponding folder. Add the readme file to the repository.
-  - If you want to use TACC resources, first [[https://portal.tacc.utexas.edu/account-request|create]] an account, and then ask Habil to add you to a project. We usually use Lonestar5 for computing and Ranch for storage of large data. A simple test for running a job on the Lonestar cluster is the following. Look at their user [[https://portal.tacc.utexas.edu/user-guides/lonestar5|guide]] and [[https://srcc.stanford.edu/sge-slurm-conversion|this]] table of commands for more details. \\ ''$ ssh <username>@ls5.tacc.utexas.edu \\  $ cd ~zare \\  login1.ls5(1099)$ cat ./test.sh'' \\ ''login1.ls5(1099)$ sbatch ./test.sh'' \\ You can monitor your jobs using ''squeue -u <username>''. The output will be saved in the ''tests'' subfolder. If there are multiple files in this folder, look at the newest one. \\ The above command will submit the job to the development queue. If you want to submit a job to the normal queue, you can do the following: \\ ''login1.ls5(1099)$ sbatch -p normal -n 1 -t 2 ./test.sh''
+  - If you want to use TACC resources, first [[https://portal.tacc.utexas.edu/account-request|create]] an account, and then ask Habil to add you to a project. We usually use Lonestar5 for computing, and we archive large data on Ranch based on [[https://docs.google.com/document/d/17VkB7_HQUq7yeSr906Qlh7q8TlFX5nvP5a1F9csXBGY/edit|this]] protocol. A simple test for running a job on the Lonestar cluster is the following. Look at their user [[https://portal.tacc.utexas.edu/user-guides/lonestar5|guide]] and [[https://srcc.stanford.edu/sge-slurm-conversion|this]] table of commands for more details. \\ ''$ ssh <username>@ls5.tacc.utexas.edu \\  $ cd ~zare \\  login1.ls5(1099)$ cat ./test.sh'' \\ ''login1.ls5(1099)$ sbatch ./test.sh'' \\ You can monitor your jobs using ''squeue -u <username>''. The output will be saved in the ''tests'' subfolder. If there are multiple files in this folder, look at the newest one. \\ The above command will submit the job to the development queue. If you want to submit a job to the normal queue, you can do the following: \\ ''login1.ls5(1099)$ sbatch -p normal -n 1 -t 2 ./test.sh''
-  - Every member should upload their photo to their profile in the wiki. To do this, click on your username at the top right, then Account. In addition, everyone should have a photo and their updated CV in pdf format on their personal page. [[:file_view_cv_template.zip_543305154_cv_template.zip|This]] is an optional LaTeX template. The permission of any lab notebook (lano) should be set to "hidden", and it is important that they be updated EVERY day. [[https://civihosting.com/|CiviHosting]] provides us with two edit modes: ckg and DW. Use the one that is more convenient for you. Write your posts in reverse chronological order so that the newest post comes at the top. To facilitate future reference, avoid sending data as attachments. Instead, upload files to your lano and link to them where needed.
+  - Every member should upload their photo to their profile in the wiki. To do this, click on your username at the top right, then Account. In addition, everyone should have a photo and their updated CV in pdf format on their personal page. {{:cv_template.zip|This}} is an optional LaTeX template. The permission of any lab notebook (lano) should be set to "hidden", and it is important that they be updated EVERY day. [[https://civihosting.com/|CiviHosting]] provides us with two edit modes: ckg and DW. Use the one that is more convenient for you. Write your posts in reverse chronological order so that the newest post comes at the top. To facilitate future reference, avoid sending data as attachments. Instead, upload files to your lano and link to them where needed.
   - You can install the Google Scholar [[https://chrome.google.com/webstore/detail/google-scholar-button/ldipcbpaocekfooobnbcddclnhejkcpn?hl=en|Button]] add-on for an easier way of searching Google Scholar. You select the paper title and then click on the little blue icon on the top right corner. For any paper which you want to cite on the lab wiki, find it on Google Scholar, click on "More>Cite" and copy the MLA format. Also, use [[https://gsuite.google.com/marketplace/app/paperpile/894076725911|Paperpile]] for easy citation in Google Docs, and Math [[https://gsuite.google.com/marketplace/app/math_equations/825973477142|Equations]] for writing and manipulating equations in Google presentations.
   - Create a Nature [[https://idp.nature.com/register/natureuser?redirect_uri=https://www.nature.com/my-account/alerts|account]] for yourself. To get a monthly list of published papers in Nature Methods, subscribe to the corresponding alert. This can help you get a sense of where the field is going. You can also create an [[https://scholar.google.com/intl/en/scholar/help.html#alerts|alert]] on Google Scholar to get regular updates on what is being published on the specific topic of your study.
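The test job described in the TACC item above can be pictured with a small batch script. This is only a sketch: the lab's actual ''test.sh'' is not shown on this page, so the job name, the output file name, and the ''job_body'' function are hypothetical. The ''#SBATCH'' options used here (''-J'', ''-p'', ''-N'', ''-n'', ''-t'', ''-o'') are standard Slurm options, and ''%j'' in the output name expands to the job ID. It assumes the ''tests'' subfolder already exists.

```shell
#!/bin/bash
#SBATCH -J hello_test          # job name (hypothetical)
#SBATCH -p development         # queue; "sbatch -p normal ..." on the command line overrides this
#SBATCH -N 1                   # one node
#SBATCH -n 1                   # one task
#SBATCH -t 00:02:00            # wall-clock limit of two minutes
#SBATCH -o tests/hello.%j.out  # output file in the tests subfolder (file name is an assumption)

# The actual work of the job; here it only reports where it ran.
job_body() {
    echo "Job ran on host: $(uname -n)"
}

job_body
```

Because the ''#SBATCH'' lines are ordinary comments to bash, the same file can be run locally with ''bash'' as a quick check before it is submitted with ''sbatch''.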
Line 25: Line 25:
   - When possible, give column and row names to matrices and use them. Also, give and use names for vectors.
   - Do’s and don’ts when [[http://www.sicb.org/students/Dos_and_Donts.pdf|submitting]] papers.
-  - Make sure that your home directory and also your work directory on the cluster are at least readable to the group. E.g., in your .bashrc, set ''umask 007'' and do the following: \\ ''chmod -R g+rwX ~ ; cdw; chmod -R g+rwX . ; cds; chmod -R g+rwX . \\ chmod 600 ~/.ssh/id_rsa'' \\ There might be some exceptions like your private ssh key at ''~/.ssh/id_rsa'', which must be readable only by you.
+  - Make sure that your home directory and also your work directory on the cluster are at least readable to the group. E.g., in your .bashrc, set ''umask 007'' and do the following: \\ ''chmod -R g+rwX ~ ; cdw; chmod -R g+rwX . ; cds; chmod -R g+rwX . \\ chmod 600 ~/.ssh/id_rsa'' \\ There might be some exceptions like your private ssh key at ''~/.ssh/id_rsa'', which must be readable only by you. You need to do this on ALL clusters including Lonestar and Ranch. On Ranch, adding umask in .bashrc does not work. Instead, create or modify ~/.cshrc.
   - If you are unfamiliar with prior, posterior, and likelihood, read about [[http://en.wikipedia.org/wiki/Bayesian_inference|Bayesian inference]].
   - To use the refs.bib bibliography in bibtex, do the following: \\ a) cd proj \\ b) ''git clone git@bitbucket.org:habilzare/refs.git'' \\  c) At the bottom of your LaTeX document, write: \\  \bibliography{\detokenize{~/proj/refs/refs}} \\  d) To add a new entry, find the appropriate format using "Google Scholar Button" (see above, click on the quotation mark at the top right, and then BibTeX at the bottom), copy the entry, and see if it is already in the refs.bib file. If not, add it in **its right location** (i.e., keys are alphabetically ordered) and push. Use the key with the \cite command in your LaTeX file. To compile, run pdflatex, then bibtex (without .tex), and then pdflatex twice.
   - Please cc Habil on any email that is related to scientific or logistic aspects of your research in the lab, your career development activities, and communications among lab members on issues related to the lab. When you send an email to multiple people, mention the primary addressee at the top. It helps draw the attention of the addressee, and also shows your respect to others who do not need to read your whole message. Usually using "reply-to-all" is preferred on emails with multiple recipients. When possible, reply to the previous email on a topic and avoid creating another thread unnecessarily, which will complicate future references. Emails should receive some sort of reply within 24 hours even if it is short like "I'll work on it". Otherwise, you will start your next email with "Sorry for the delay".
   - As employees of UT Health, we can get facilitated appointments with UT Health primary care physicians (call: 210-450-9090).
-  - If you are considering ultimately getting jobs in computation biology or bioinformatics, have a look at postings at the [[http://bioinformatics.org|bioinformatics.org]] website __within the first week__ after joining the lab. For academic positions, see the Nature Jobs and Science Careers websites. Read the articles on "[[http://oncinfo.org/how_to|How to]] rescue US biomedical research from its systemic flaws?" if you are, or want to be, a PhD student.
+  - If you are considering ultimately getting jobs in computational biology or bioinformatics, have a look at postings at the [[http://bioinformatics.org|bioinformatics.org]] website __within the first week__ after joining the lab. For academic positions, see the Nature Jobs and Science Careers websites. Read the articles on "[[http://oncinfo.org/how_to|How to]] rescue US biomedical research from its systemic flaws?" if you are, or want to be, a PhD student.
   - If you want to use ROSMAP data, please create a Synapse [[https://www.synapse.org/#!RegisterAccount:0|account]], add your information to {{:dua_sagebionetworks_zare_ros-map_fe.pdf|this}} file, and upload it again to Oncinfo without changing the file name. Then, let Habil know so that he can upload it to the Synapse [[https://www.synapse.org/#!AccessRequirements:ID=syn3219045&TYPE=ENTITY|website]].
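The group-permission item above can be tried safely on a throwaway directory before touching a real home or work directory. The following is a minimal sketch, not the lab's script: ''share_with_group'' and ''protect_key'' are hypothetical helper names, and the demo files are placeholders. It relies only on standard ''chmod'', ''umask'', and ''mktemp'' behavior.

```shell
#!/bin/bash
# Give the group read/write access to a whole tree. The capital X adds
# execute permission only to directories and already-executable files.
share_with_group() {
    chmod -R g+rwX "$1"
}

# Keep a private key readable and writable by its owner only.
protect_key() {
    chmod 600 "$1"
}

# Demo on a temporary directory (placeholder content, not a real key).
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/project" "$demo_dir/.ssh"
echo "data" > "$demo_dir/project/results.txt"
echo "not a real key" > "$demo_dir/.ssh/id_rsa"
chmod -R go-rwx "$demo_dir"          # start from owner-only permissions

share_with_group "$demo_dir"         # directories become rwxrwx---, files rw-rw----
protect_key "$demo_dir/.ssh/id_rsa"  # the key goes back to rw-------

# With umask 007, newly created files are group-writable and closed to others.
# The subshell keeps the umask change from leaking into the rest of the session.
( umask 007; touch "$demo_dir/project/new.txt" )

ls -lR "$demo_dir"
rm -rf "$demo_dir"
```

Running ''protect_key'' after ''share_with_group'' matters: the recursive ''chmod'' would otherwise leave the key group-readable.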
 +
  
 ==== Some references ====
  
-  - Two machine learning bibles: Bishop ({{:bishop-pattern_recongnition_and_machine_learning-1.pdf|1}}, {{:bishop-pattern_recongnition_and_machine_learning-2.pdf|2}}) and [[http://statweb.stanford.edu/~tibs/ElemStatLearn/download.html|Hastie et al.]].
+  - Two machine learning bibles, which summarize important topics in the field up to 2005: Bishop ({{:bishop-pattern_recongnition_and_machine_learning-1.pdf|1}}, {{:bishop-pattern_recongnition_and_machine_learning-2.pdf|2}}) and [[http://statweb.stanford.edu/~tibs/ElemStatLearn/download.html|Hastie et al.]].
   - [[https://www.biostars.org/|Biostars]] is a good forum, similar to Stack Overflow in structure, but focused on bioinformatics and computational biology.
   - [[http://www.wikiwand.com/en/List_of_free_online_bioinformatics_courses|List]] of free online bioinformatics courses and some interesting [[http://bioconductor.org/help/events/|events]] in Bioconductor.
Line 42: Line 43:
   - [[http://stephenturner.us/edu.html|List]] of bioinformatics workshops.
   - A 5-minute introduction to next-generation sequencing [[https://www.youtube.com/watch?annotation_id=annotation_228575861&feature=iv&src_vid=womKfikWlxM&v=fCd6B5HRaZ8|video]].
 +
  
 ==== Fun stuff ====