Differences

This shows you the differences between two versions of the page.

for_members [2024/08/14 17:10] – [General guidelines for conducting research in the Oncinfo Lab] habil
for_members [2024/09/20 16:01] (current) – [General guidelines for conducting research in the Oncinfo Lab] habil
Line 29:
   - If you want to use **TACC** resources, first [[https://portal.tacc.utexas.edu/account-request|create]] an account, and then ask Habil to add you to a project. Source ''/work/03270/zare/Install/oncinfo_settings'' in your .bashrc or other bash scripts so that you do not need to reinstall software that we often need and that other lab members have already installed. We usually use Lonestar and Maverick for computing, and we archive large data on Ranch based on [[https://docs.google.com/document/d/17VkB7_HQUq7yeSr906Qlh7q8TlFX5nvP5a1F9csXBGY/edit|this]] protocol. For more details, look at their user [[https://portal.tacc.utexas.edu/user-guides/lonestar5|guide]] and [[https://srcc.stanford.edu/sge-slurm-conversion|this]] table of commands. A simple test for running a job on the Lonestar cluster is the following: \\ ''$ ssh <username>@ls6.tacc.utexas.edu'' \\ ''$ cd ~zare'' \\ ''login1.ls6(1099)$ cat ./test.mpi'' \\ ''login1.ls6(1099)$ sbatch ./test.mpi'' \\ You can monitor your jobs using ''squeue -u <username>''. The output will be saved in the ''~/temp'' subfolder. If there are multiple files in this folder, look at the newest one. \\ The above command will submit the job to the development queue. If you want to submit a job to the normal queue, you can do the following: \\ ''login1.ls6(1099)$ sbatch -p normal -n 1 -t 2 ./test.mpi'' \\ [[https://tacc-cloud.readthedocs.io/projects/agave/en/latest/index.html|Tapis]] is an //advanced//, optional platform for submitting jobs from your local computer; it is not recommended for beginners. Before submitting many jobs, estimate the time and memory by submitting a single job or, preferably, by running your code on toy data. If you think a complete job will take more than 10 minutes on the cluster, let Habil check your code before you submit more similar jobs, to make sure we do not miss any easy [[https://bitbucket.org/habilzare/alzheimer/commits/250cacf92ef6b2886973dc609c229032b42b2234|parallelization]]. Familiarize yourself with [[https://docs.google.com/presentation/d/1TFjB16XmLk2Xo4qHJgGe1VfnIUonw9cIJmU1xCZxyL0/edit#slide=id.g242153ecf22_0_60|Docker]], which you can use to test your code locally before running it on the cluster. [[https://bitbucket.org/habilzare/genetwork/src/master/code/docker/readme.txt|Use]] ''habilzare/oncinfo:oncinfo-<version>'' and modify //only// ''habilzare/oncinfo:oncinfo-dev-<version>''.
   - Everyone should have a photo and their updated CV in PDF format on their personal page. {{:wiki:public:cv_template.zip|This}} is an optional LaTeX template. The permission of any lab notebook (lano) should be set to "hidden", and it is important that each lano be updated EVERY day. [[https://www.dokuwiki.org/dokuwiki|DokuWiki]] provides us with two edit modes: ckg and DW. Use the one that is more convenient for you. Write your posts in reverse chronological order so that the newest post comes at the top. To facilitate future reference, avoid sending data as attachments. Instead, upload files to your lano and link to them where needed.
-  - You can install the **Google Scholar** **[[https://chrome.google.com/webstore/detail/google-scholar-button/ldipcbpaocekfooobnbcddclnhejkcpn?hl=en|Button]]** add-on for an easier way of searching Google Scholar. You select the paper title and then click on the little blue icon on the top right corner. For any paper that you want to cite on the lab wiki, find it on Google Scholar, click on "More>Cite" and copy the MLA format. Also, use [[https://gsuite.google.com/marketplace/app/paperpile/894076725911|Paperpile]] for easy citation in Google Docs, and Math [[https://gsuite.google.com/marketplace/app/math_equations/825973477142|Equations]] for writing and manipulating equations on Google presentations.
+  - You can install the **Google Scholar** **[[https://chrome.google.com/webstore/detail/google-scholar-button/ldipcbpaocekfooobnbcddclnhejkcpn?hl=en|Button]]** add-on for an easier way of searching Google Scholar. You select the paper title and then click on the little blue icon on the top right corner. For any paper that you want to cite on the wiki or lab documents, find it on Google Scholar, click on "More>Cite" and copy the MLA format. Link the journal name to the paper ([[https://oncinfo.org/multiomics_analysis_for_dementia#related_work|examples]]). Also, use [[https://gsuite.google.com/marketplace/app/paperpile/894076725911|Paperpile]] for easy citation in Google Docs, and Math [[https://gsuite.google.com/marketplace/app/math_equations/825973477142|Equations]] for writing and manipulating equations on Google presentations.
   - Create a Nature [[https://idp.nature.com/register/natureuser?redirect_uri=https://www.nature.com/my-account/alerts|account]] for yourself. To get a monthly list of published papers in Nature Methods, subscribe to the corresponding **alert**. This can help you get a sense of where the field is going. You can also create an [[https://scholar.google.com/intl/en/scholar/help.html#alerts|alert]] on Google Scholar to get regular updates on what is being published on the specific topic of your study.
   - **Code style** in Oncinfo lab: We follow Hadley Wickham’s R Style [[http://adv-r.had.co.nz/Style.html|Guide]] unless another convention is mentioned below. The goal is to fit as much code as possible on 1 page so that it is easier to skim, while preserving the overall structure such as proper indentation. Use [[https://contributions.bioconductor.org/r-code.html?q=inde#indentation|2 spaces]] (not tabs) for indentation. Like writing English texts, organize each script in small paragraphs and avoid extra white lines. Give each paragraph a short title in a comment. \\ When writing R code, use "''x <- 5''" for assigning a value to a variable. Do NOT use "''x = 5''" or "''x<-5''". We use **[[https://en.wikipedia.org/wiki/Camel_case|camelCase]]** in R, so do NOT use underscore, '_', or dot, '.', in variable or function names. E.g., instead of "''inverse_of''", use "''inverseOf''" as a variable name or function name so that you can select it by 1 click. Almost all functions must return a list so that extending them will be easy. Use "''##''" for comments, NOT a single "''#''". Names of Boolean flags and variables start with an [[https://en.wikipedia.org/wiki/Auxiliary_verb|auxiliary]] verb like ''do'', ''is'', or ''has'', e.g., ''doPlot'', ''isSingle'', ''hasMeta'', etc. Write the name of the loaded object in a comment in front of ''load()''. \\ Avoid long lines of code. Most lines should be < 90 characters, and all lines must be < 100 characters. Thus, do NOT put spaces around ''='' in function calls. Good example: ''average <- mean(feet[ ,"real"]/12+inches, na.rm=TRUE) ## Spaces only around "<-" and after ","''. The space in "''[ ,''", which refers to all rows, is OK. It is better to place a space before the parenthesis after "''if (''", "''for (''", and the like, but do NOT use a space between a function name and "''(''", e.g., write ''plot(Data)''. Write the FULL name of arguments when calling a function and do NOT rely on their order, which may change in the future. \\ When a line is long, it usually means you need to extract some of it and define a new variable right above that line. \\ Data structures in R can be ordered from simple to complex as follows: number, vector, matrix, and list. Always use the simplest possible data structure, e.g., do not use a list when you can use a matrix, and do not use a matrix with one column when you can use a named vector. The reason is that R has more tools for simpler data structures, e.g., ''sum'' and ''paste'' work on vectors, but not lists. To __add or modify__ an element of a list, use double square brackets like {{:wiki:public:screen_shot_2021-05-25_at_9.00.51_pm.png?linkonly|this}}, not "''$''". You can use "''$''" to __access__ elements of a list like ''A1 <- list1$matrixA'' (with some [[https://oncinfo.org/mohsens_lab_notebook#section20240211|caveats]]), but NOT to access a column of a data frame. Do not use [[https://www.datacamp.com/tutorial/pipe-r-tutorial|pipes]] in R (i.e., ''%>%'' and ''|>'') due to their [[https://stackoverflow.com/questions/38880352/should-i-avoid-programming-packages-with-pipe-operators|drawbacks]] outlined by Hadley [[https://r4ds.had.co.nz/pipes.html#when-not-to-use-the-pipe|Wickham]].
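
The code style conventions in the list above can be illustrated with a short R sketch. This is only one possible reading of the guidelines, not an official lab template. The function ''summarizeColumns'', the flag ''doScale'', and the toy matrix are hypothetical names and data invented for this illustration.

```r
## Summarize columns. Returns a list so that extending the output is easy.
summarizeColumns <- function(dataMatrix, doScale=FALSE) {
  if (doScale) {
    dataMatrix <- scale(x=dataMatrix, center=TRUE, scale=TRUE)
  }
  columnMeans <- colMeans(x=dataMatrix, na.rm=TRUE) ## A named vector, not a list.
  hasMissing <- anyNA(x=dataMatrix) ## Boolean names start with an auxiliary verb.
  result1 <- list()
  result1[["columnMeans"]] <- columnMeans ## Add elements with "[[", not "$".
  result1[["hasMissing"]] <- hasMissing
  return(result1)
}

## Toy data (hypothetical values).
toyMatrix <- matrix(data=c(1, 2, 3, 4, NA, 6), nrow=3, ncol=2,
                    dimnames=list(NULL, c("geneA", "geneB")))

## Call with FULL argument names, then access with "$".
summary1 <- summarizeColumns(dataMatrix=toyMatrix, doScale=FALSE)
meansA <- summary1$columnMeans
totalMean <- sum(meansA) ## sum() works directly on a vector.
```

Because the column means are kept in a named vector rather than a list, ''sum'' can be applied to them without any conversion, which is the reason the guidelines give for preferring the simplest data structure.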