Differences

This shows you the differences between two versions of the page.

how_to [2021/02/12 16:52] – [Identify Senescence in Cells and Tissues?] adminhow_to [2024/01/24 17:14] (current) – [Measure memory used by GPUa?] habil
Line 2: Line 2:
  
 This is a collection of short answers and links to some miscellaneous questions that we have had while doing research in Oncinfo. This can also be a useful resource for other scholars in the field of computational biology with similar interests and challenges. This is a collection of short answers and links to some miscellaneous questions that we have had while doing research in Oncinfo. This can also be a useful resource for other scholars in the field of computational biology with similar interests and challenges.
 +
 +----
 +
 +==== Measure memory used by GPUs? ====
 +
 +Use ''torch.cuda.[[https://pytorch.org/blog/understanding-gpu-memory-1/?hss_channel=tw-776585502606721024|memory]]'', e.g., to measure the effect of clearing gradients at the end of each [[https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9|iteration]]. With ignite, this can be done using an event handler that calls ''optimizer.[[https://stackoverflow.com/questions/48001598/why-do-we-need-to-call-zero-grad-in-pytorch|zero_grad]](set_to_none=True)''.
 +
 +----
 +
 +
 +==== Analyze multiomics data? ====
 +
 +[[http://biggsinstitute.org/neurepiomics-2023/#1686596557113-fac33f90-1709|Neurepiomics]] workshop in 2023:
 +
 +  - Shiva's presentation on omics [[https://docs.google.com/presentation/d/1W-pJfPy5MkQscPn5iOSJAkOzuyC2-eirBeHHeKHuYyw/edit#slide=id.p|databases]]
 +  - Mohsen's [[https://docs.google.com/presentation/d/14z2jAkYfmyH6dIH0Rr59ywZk--Kk9ocHzWEDiyd_P4k/edit#slide=id.p|presentation]] for Neurepiomics workshop, 2023-10-09.
 +  - Habil's [[https://docs.google.com/presentation/d/1mGld52Y8TkMXArc5EhcD2XY_0rXvb79BFEgA0YaZwms/edit#slide=id.g63f2c216dd_0_0|slides]] on multiomics data integration.
 +
 +----
 +
 +
 +==== What is the best Linux distribution for biologists? ====
 +
 +[[https://archlinux.org/|ArchLinux]] and [[https://wiki.archlinux.org/title/Arch-based_distributions|its derivatives]], like [[https://manjaro.org/|Manjaro]], are the top Linux distributions for biologists. The [[https://github.com/BioArchLinux/Packages|BioArchLinux community]] maintains an excellent repository with over 4,000 bioinformatics packages and their dependencies. Regular automated updates ensure the latest versions are available by fetching source code from trusted repositories and compiling releases optimized for the target architecture. Its philosophy is described [[https://f1000research.com/posters/11-809|here]].
 +
 +----
 +
 +
 +==== Download files to where you cannot login or click using a GUI? ====
 +
 +Log in using a GUI browser on another computer and click to start the download. While it is downloading, right-click on the page and then click on "Inspect". Then, click on "Network" at the top of the panel and find the file. Copy the request as a cURL [[https://www.youtube.com/watch?v=udfCVWuobhc|command]] and run it in a terminal on the target computer (the one without a GUI) to download the file there.
 +
 +----
 +
 +
 +==== Supercharge your YouTube experience and minimize ads? ====
 +
 +"[[https://chrome.google.com/webstore/detail/youtube-enhancer/ejcniippeghnejiodjkkmndlelbagmah|Enhancer]] for YouTube" is a popular browser extension that enhances the YouTube viewing experience by providing additional features and customization options. While it doesn't explicitly focus on blocking ads, it offers some options to minimize or hide them. Use [[https://chrome.google.com/webstore/detail/autoskip-for-youtube-ads/hmbnhhcgiecenbbkgdoaoafjpeaboine|Autoskip]] to automatically skip ads that pass the Enhancer filter.
 +
 +----
 +
 +
 +==== Prevent Google Colab from being disconnected? ====
 +
 +To prevent disconnection of your Google Colab session, you can use the following script that keeps the session active by clicking on the notebook every 60 seconds:
 +
 +<code>
 +function ClickConnect(){
 +    console.log("Working");
 +    document.querySelector("colab-connect-button").click()
 +}
 +setInterval(ClickConnect,1000*60)
 +
 +</code>
 +
 +To implement this script, open the developer console in your browser (using Ctrl+Shift+J in Chrome or Ctrl+Shift+I in Firefox or Command+Option+C in Safari), and paste the above script in the console. This will help to ensure that your Colab session remains active and does not disconnect due to inactivity.
 +
 +----
 +
 +
 +==== Plot deep neural network architecture diagrams? ====
 +
 +There are [[https://datascience.stackexchange.com/questions/14899/how-to-draw-deep-learning-network-architecture-diagrams|several]] [[https://datascience.stackexchange.com/questions/12851/how-do-you-visualize-neural-network-architectures|apps]] that can help with this. They are listed below in descending order of quality:
 +
 +  - [[https://github.com/HarisIqbal88/PlotNeuralNet|PlotNeuralNet]] (very simple, just generate fixed latex commands, no flexibility in the layers themselves, you need to rewrite models in this [[https://github.com/HarisIqbal88/PlotNeuralNet/blob/master/pyexamples/test_simple.py|language]]) {{https://user-images.githubusercontent.com/17570785/50308846-c2231880-049c-11e9-8763-3daa1024de78.png?direct&50x18}}
 +  - [[https://analyticsindiamag.com/how-to-visualize-deep-learning-models-using-visualkeras/|Visualkeras]] (plots Keras models) {{https://149695847.v2.pressablecdn.com/wp-content/uploads/2021/11/image-37.png?direct&30x31}}
 +  - [[http://alexlenail.me/NN-SVG/index.html|NN-SVG]]
 +  - [[https://github.com/gwding/draw_convnet|draw_convnet]] {{https://raw.githubusercontent.com/gwding/draw_convnet/master/convnet_fig.png?direct&100x32}}
 +  - [[https://github.com/lutzroeder/netron|netron]] (does not consider ModuleList inside the model in PyTorch) {{https://github.com/lutzroeder/netron/raw/main/.github/screenshot.png?direct&50x34}}
 +  - [[https://viscom.net2vis.uni-ulm.de/|Net2Vis]] (automatically generates abstract visualizations for convolutional neural networks from Keras code)
 +  - [[https://tikz.net/neural_networks/|LaTeX]] examples of neural networks.
 +
 +----
 +
 +
 +==== Append multiple videos quickly? ====
 +
 +First, make a text file named ''mylist.txt''  with the following content:
 +
 +file part1.mp4 \\ file part2.mp4 \\ file part3.mp4
 +
 +Then, run
 +
 +<code>
 +ffmpeg -f concat -safe 0 -i mylist.txt -c copy output.mp4
 +</code>
 +
 +which is fast because it copies the audio and video streams as they are, without re-encoding them. ffmpeg can be [[https://phoenixnap.com/kb/ffmpeg-mac|installed]] on macOS using brew.
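When there are many parts, ''mylist.txt'' can be generated instead of typed. The following is a minimal sketch, assuming the clips are named ''part1.mp4'', ''part2.mp4'', etc. The temporary directory and the empty placeholder files are made up for illustration only. It writes one ''file'' line per clip, quoting each name so that names with spaces still work.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical working directory with empty placeholder clips;
# in practice, cd to the folder that holds your real videos.
workdir="$(mktemp -d)"
touch "$workdir/part1.mp4" "$workdir/part2.mp4" "$workdir/part3.mp4"
cd "$workdir"

# Write one "file '<name>'" line per clip, in the order the glob expands.
: > mylist.txt
for f in part*.mp4; do
    printf "file '%s'\n" "$f" >> mylist.txt
done
cat mylist.txt

# Then concatenate as above (needs real video files, so it is commented out here):
# ffmpeg -f concat -safe 0 -i mylist.txt -c copy output.mp4
```

Note that the glob expands in lexical order, so ''part10.mp4'' would come before ''part2.mp4''. Zero-pad the numbers (''part02.mp4'') if you have ten or more parts.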
 +
 +----
 +
 +==== Extend reads in a bam file to a fixed fragment size to better view peaks with IGV? ====
 +
 +Assume you have ''input.bam'', which contains mapped short single-end sequence reads, and you know the fragment size is ''100''. The following command makes a bam file that contains the extended reads in place of the original reads, so the original read sequences are lost. This helps to better visualize the read signal:
 +
 +<code>
 +bedtools bamtobed -i input.bam | \
 +awk -F'\t' 'BEGIN {OFS = FS} {if ($6=="+") {$3=$2+99} else {$2=$3-99}; if ($2<0) {$2=0}; print $0;}' | \
 +bedtools bedtobam -g hg19.chr.sizes -i - > extended_input.bam
 +</code>
 +
 +where ''hg19.chr.sizes''  is a tsv file with two columns containing chr names and sizes. \\  \\ NOTE: ''bedCoverage''  command is supposed to do this, but in my case, I got an error about the input bam file, despite the fact that it was a normal mapped bam file.
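To see what the ''awk'' step does without ''bedtools'' or a bam file, it can be run on a few BED records. This is only a sketch: the three records below (coordinates and read names) are invented for illustration, and the ''awk'' program is copied from the command above.

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"

# Three made-up BED6 records: chrom, start, end, name, score, strand.
printf 'chr1\t1000\t1036\tr1\t60\t+\n'  > reads.bed
printf 'chr1\t5000\t5036\tr2\t60\t-\n' >> reads.bed
printf 'chr1\t20\t56\tr3\t60\t-\n'     >> reads.bed

# Same awk program as in the pipeline above:
# "+" reads are extended to the right of their start,
# "-" reads are extended to the left of their end,
# and negative starts are clipped to 0.
awk -F'\t' 'BEGIN {OFS = FS} {if ($6=="+") {$3=$2+99} else {$2=$3-99}; if ($2<0) {$2=0}; print $0;}' reads.bed > extended.bed

cat extended.bed
```

The forward read becomes 1000-1099, the first reverse read becomes 4937-5036, and the reverse read near the chromosome start is clipped to 0-56.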
 +
 +----
 +
 +==== Run simultaneous jobs (on multiple files) in a single mpi on a cluster? ====
 +
 +It can be done using ''xargs -P <number-of-threads> <command>''. For example, we assume you are writing a bash script as follows: \\  \\ First, write a function (or script) to do the work on a single input file. Second, export the function using
 +
 +<code>
 +export -f my_func_for_single_file
 +</code>
 +
 + \\ Third, use the ''find'' command (or anything appropriate) and pipe the results to ''xargs'' like this:
 +
 +<code>
 +find . -type f -name "*.bam" -printf "%f\n" | xargs -P 125 -I {} bash -c 'my_func_for_single_file "$@"' _ {}
 +</code>
 +
 +where ''%f'' is the basename of each file returned by ''find''. This command runs up to 125 tasks in parallel on a node (e.g., with 128 cores). Assuming you have 210 files, it will process all of them in about 2 sweeps instead of 210 sequential runs, so the wall-clock time drops by roughly a factor of 100.
 +
 +Change 125 as you want, but **not**  more than the number of available cores in each compute node! :) Also you can adapt this idea to the cases where your function needs other inputs than the simple file name.
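The three steps can be tried on a laptop before submitting to a cluster. The following is a minimal sketch under a few assumptions: the function ''process_one'' and the empty ''sample*.bam'' files are hypothetical placeholders, only 4 parallel processes are used, and ''-print0'' with ''xargs -0'' replaces ''-printf'' so that it also works with the BSD ''find'' on macOS. As a result, the function receives a path like ''./sample1.bam'' rather than the bare file name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory with six empty placeholder files.
cd "$(mktemp -d)"
for i in 1 2 3 4 5 6; do : > "sample$i.bam"; done

# Step 1: a (hypothetical) function that does the work for a single file.
# Here it only records that the file was seen.
process_one() {
    local path="$1"
    echo "processed $(basename "$path")" > "$path.done"
}

# Step 2: export the function so the child bash processes can call it.
export -f process_one

# Step 3: run up to 4 tasks at a time, one file per task.
find . -type f -name "*.bam" -print0 |
    xargs -0 -P 4 -I {} bash -c 'process_one "$@"' _ {}

# Count the result files, one per input file.
ls ./*.done | wc -l
```

Because ''export -f'' is a bash feature, the script has to run under bash rather than a plain POSIX ''sh''.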
 +
 +----
 +
 +==== Plot models of Keras in R on MacOS? ====
 +
 +First, in Terminal, run [[https://graphviz.gitlab.io/download/|this]] brew command (which is much better than "port"):
 +
 +<code>
 +brew install graphviz
 +</code>
 +
 +Then run this in R:
 +
 +<code>
 +reticulate::py_install("pydot", pip = TRUE)
 +</code>
 +
 +----
 +
 +==== Identify genes of a pathway in R? ====
 +
 +1) ''ReactomeContentService4R'' is a Reactome [[https://www.biostars.org/p/391386/#9463286|wrapper]] in R:
 +
 +<code>
 +library(ReactomeContentService4R)
 +event2Ids(event.id = "R-HSA-6790901") # can only input id here
 +</code>
 +
 +2) [[https://www.biostars.org/p/9978/#13639|Kegg]]:
 +
 +<code>
 +library(org.Hs.eg.db)
 +library(KEGGREST) ## provides keggFind()
 +library(Pigengene) ## assumed source of gene.mapping()
 +kegg <- org.Hs.egPATH2EG
 +mapped <- mappedkeys(kegg)
 +kegg2 <- as.list(kegg[mapped])
 +keggFind("pathway", "mtor") ## finds the pathway with a key word
 +ent <- kegg2[["04150"]] ## needs the id of the pathway (mTOR), returns Entrez ids
 +gene.mapping(id=ent, inputType="ENTREZID", outputType = "SYMBOL")
 +</code>
  
 ---- ----
Line 16: Line 177:
  
 ---- ----
- 
  
 ==== Read and write excel files in R ==== ==== Read and write excel files in R ====
  
 Use [[https://cran.r-project.org/web/packages/openxlsx/index.html|openxlsx]] package to read, write and edit xlsx files in R. Package's integration with C++ makes it faster and easier to use. Simplifies the creation of Excel .xlsx files by providing a high level interface to writing, styling and editing worksheets. Through the use of 'Rcpp', read/write times are comparable to the 'xlsx' and 'XLConnect' packages with the added benefit of removing the dependency on Java. Use [[https://cran.r-project.org/web/packages/openxlsx/index.html|openxlsx]] package to read, write and edit xlsx files in R. Package's integration with C++ makes it faster and easier to use. Simplifies the creation of Excel .xlsx files by providing a high level interface to writing, styling and editing worksheets. Through the use of 'Rcpp', read/write times are comparable to the 'xlsx' and 'XLConnect' packages with the added benefit of removing the dependency on Java.
- 
 <code> <code>
 +
 E.g. Writing four dataframes in four sheets of excel workbook can be done as follows: E.g. Writing four dataframes in four sheets of excel workbook can be done as follows:
 library(openxlsx) library(openxlsx)
-listDataFrames <- list("GO-BP"=data.frame(egoBP), "GO-MF"=data.frame(egoMF),
-                       "KEGG"= data.frame(eKEGG), "NCG"= data.frame(ncg))
-xlsFile <- file.path(resultPath, paste0(l1, "_ORA_results.xlsx"))
-write.xlsx(x=listDataFrames, file=xlsFile)
 +data("mtcars")
 +listDataFrames <- list("First"=mtcars[1:5,], "Second"=mtcars[10:15,])
 +xlsFile <- "~/temp/test.xlsx"
 +w1 <- write.xlsx(x=listDataFrames, file=xlsFile)
 +
 +## You can add new sheets:
 +addWorksheet(w1, sheetName = "New")
 +writeData(wb=w1, sheet="New", mtcars[,1:3])
 +saveWorkbook(w1, file=xlsFile, overwrite=TRUE)
 +
 +## Read a sheet in a data frame:
 +r1 <- read.xlsx(xlsxFile=xlsFile, sheet="Second")
 </code> </code>
  
Line 48: Line 217:
  
 ===== Add only modified changes and ignore untracked files using git? ===== ===== Add only modified changes and ignore untracked files using git? =====
 +
 +Using a newer version of ''git'' (e.g., >=2.32.1),
 +
 +<code>
 +git commit -am 'minor changes'; git push
 +</code>
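A quick way to check how your installed ''git'' treats untracked files is to try the command in a throwaway repository. This is only a sketch: the repository, the file names, and the demo identity below are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository in a temporary directory.
cd "$(mktemp -d)"
git init -q .
# Identity set only for this demo repository.
git config user.email "demo@example.com"
git config user.name "Demo"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m 'initial commit'

echo "v2" > tracked.txt          # modify a tracked file
echo "scratch" > untracked.txt   # create a file git has never seen

git commit -q -am 'minor changes'

# Anything still listed here was NOT included in the commit.
git status --porcelain
```

If the only line printed is ''?? untracked.txt'', the commit picked up the modified file and left the untracked one alone.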
 +
 +⚠️ <font inherit/inherit;;#e74c3c;;inherit>Caution</font>: With older versions, the ''-a'' option will add all files in the directory, which is usually NOT what you want. Instead, use the following command, which has the same effect as above:
  
 <code> <code>
Line 57: Line 234:
 ---- ----
  
 +==== Create an R package or use a package that is being developed? ====
  
-==== Use a package that is being developed? ====
 +An R package source folder has a predetermined [[https://r-pkgs.org/package-structure-state.html|structure]]. If you change the content of a package, you can use the [[https://bitbucket.org/habilzare/alzheimer/src/master/code/Habil/utilities/make.package.R|make.package]]() function to update (i.e., build, check, and install) the package while preserving its structure, as in the [[https://bitbucket.org/habilzare/alzheimer/src/master/code/senescence/Shiva/packing.R|senescence]] and [[https://bitbucket.org/habilzare/genetwork/src/master/code/Ghazal/Packing/packing.R|iNETgrate]] package examples.
-
-The following example is for the Genetwork package before it was contributed to Bioconductor.
-
-1) Put the following lines in your runall.R script where you are calling the libraries like [[https://bitbucket.org/habilzare/genetwork/src/8ad508f155b319dec7cedeb4dec7f61764946bf6/code/Habil/AML/runall.R?at=master|this]]:
-
-<code>
-## Updating Genetwork package:
-try(detach(package:Genetwork, unload=TRUE),silent=TRUE)
-system(paste("Rscript",packingFile,"--codePath",codePath))
-library(Genetwork)
-</code>
-
-packingFile and codePath are defined in code/Habil/AML/[[https://bitbucket.org/habilzare/genetwork/src/8ad508f155b319dec7cedeb4dec7f61764946bf6/code/Habil/AML/Settings.R?at=master|Settings]].R\\
-2) If you have written new functions, add them to the list in code/Habil/packing/[[https://bitbucket.org/habilzare/genetwork/src/8ad508f155b319dec7cedeb4dec7f61764946bf6/code/Habil/packing/update.R?at=master|update.R]]. Note that the path to the function is relative to the code/ directory.
  
 ---- ----
Line 86: Line 250:
 ==== Install R locally (e.g. on a cluster)? ==== ==== Install R locally (e.g. on a cluster)? ====
  
-If you want to install the latest __development__ version of R on your macOS, first install [[https://github.com/fxcoudert/gfortran-for-macOS/releases|Fortran]] if you do not have it. You may also need to update [[https://superuser.com/a/664326|PCRE]].
 +Like most Unix programs, R can be installed from source by downloading the [[https://cloud.r-project.org/|source]] code, configuring and compiling it, and then installing the binaries. If you try this simple approach and you get errors like [[https://tdhock.github.io/blog/2017/compiling-R/|these]], it means some dependencies are missing or outdated on your machine. E.g., if you want to install the latest __development__ version of R on your macOS, first install [[https://github.com/fxcoudert/gfortran-for-macOS/releases|Fortran]] if you do not have it. You may also need to update PCRE using brew on [[http://superuser.com/a/664326|macOS]]. Alternatively, you can compile PCRE2 from the [[https://www.linuxfromscratch.org/blfs/view/svn/general/pcre2.html|source]], and then let R know where it is [[https://unix.stackexchange.com/a/149361|using]] CPPFLAGS and LDFLAGS.
  
-If you do not have sudo permissions, like when you are working on a cluster, you should either use the software module, or install it locally in your home directory. E.g., you can install R on Stampede or Maverick as follows:
 +If you do not have sudo permissions, like when you are working on a cluster, you should either use the software module, or install R locally in your home directory. E.g., you can install R on the Stampede or Maverick TACC clusters as follows:
  
 <code> <code>
Line 101: Line 265:
 </code> </code>
  
-Now, you can add $HOME/arch to your path by inserting the following line in your .bachrc file:
 +If the above works without any error, you can add $HOME/arch to your path by inserting the following line in your .bashrc file:
  
 <code> <code>
Line 107: Line 271:
 </code> </code>
  
-I had to follow [[http://pj.freefaculty.org/blog/?p=315|these]] steps to resolve the bzip2 issue on the Lonestar5 cluster. Oncinfo Lab members can use it if they add the following to their .bashrc
 +On some clusters, [[https://tdhock.github.io/blog/2017/compiling-R/|a few]] libraries might not be installed or they might be too old (e.g., zlib, curl, bzip2, xz, pcre). In particular, the bzip2 issue can be resolved by following [[http://pj.freefaculty.org/blog/?p=315|these]] steps on the Lonestar5 cluster. Oncinfo Lab members can use the resulting installation if they add the following to their .bashrc:
  
 <code> <code>
 export PATH=/home1/03270/zare/Install/bin:$PATH export PATH=/home1/03270/zare/Install/bin:$PATH
 </code> </code>
 +
 +Installing R using [[https://datascience.stackexchange.com/questions/77335/anconda-r-version-how-to-upgrade-to-4-0-and-later/86905#86905|conda]] is only a quick and [[https://www.perfectlyrandom.org/2016/04/08/install-xml2-r-package-on-macos/|dirty]], temporary solution. E.g., as of 2021-05-14, the xml2 package installed by conda is not compatible with the R 4.0 installed by conda. Therefore, solving the issue in [[https://stackoverflow.com/questions/37035088/unable-to-install-r-package-due-to-xml-dependency-mismatch|this way]] moves the R version from 4.0 back to 3.0! The time you spend addressing such issues may well exceed the time needed for a clean installation of R from source.
  
 ---- ----
- 
  
 ==== Restore a file deleted in a local git directory? ==== ==== Restore a file deleted in a local git directory? ====
Line 121: Line 286:
  
 ---- ----
- 
  
 ==== Get familiar with machine learning and its applications in computational biology? ==== ==== Get familiar with machine learning and its applications in computational biology? ====
Line 132: Line 296:
  
 ---- ----
- 
  
 ==== Get access to the papers through the library when you are off-campus? ==== ==== Get access to the papers through the library when you are off-campus? ====
  
-In any of these two ways:\\
-a) First add the following to your browser (Chrome or Firefox) bookmarks.
 +In either of these two ways: \\
 +a) First add the following to your browser (e.g., Chrome, Safari, or Firefox) bookmarks.
  
 +{{:wiki:public:screen_shot_2023-04-24_at_3.36.10_pm.png?direct&800|}}
 +For your convenience, you can write "javascript", and then ":void()", and copy the following line between the parentheses.
 <code> <code>
-javascript:void(location.href=%22http://libproxy.uthscsa.edu/login?url=%22+location.href)
 +location.href=%22http://libproxy.uthscsa.edu/login?url=%22+location.href
 </code> </code>
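The bookmark simply prepends the library proxy address to the address of the current page. The same rewrite can be sketched in the shell, e.g., to build a proxied link for a paper whose URL you already have. The helper name ''proxied_url'' and the example article URL are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Prefix taken from the bookmark above.
proxy_prefix="http://libproxy.uthscsa.edu/login?url="

# Hypothetical helper: print the proxied version of a URL.
proxied_url() {
    printf '%s%s\n' "$proxy_prefix" "$1"
}

# Example article URL, chosen only for illustration.
proxied_url "https://www.nature.com/articles/nmeth.4532"
```

Opening the printed address in a browser should lead to the library login page first, just as clicking the bookmark does.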
  
Line 146: Line 311:
  
 b) Use [[https://infosec.uthscsa.edu/two-factor-enrollment|GlobalProtect]], which is the University VPN. b) Use [[https://infosec.uthscsa.edu/two-factor-enrollment|GlobalProtect]], which is the University VPN.
 +
 +If an article is not available from the library, it can be ordered via [[https://libguides.uthscsa.edu/ill|interlibrary loan]].
  
 ---- ----
- 
  
 ==== Convert pdf to MS word? ==== ==== Convert pdf to MS word? ====
  
-Try whatever you can to avoid conversion! Instead, educate your team and your collaborators to use [[https://www.authorea.com/users/54336|Authorea]], [[https://www.overleaf.com/|Overleaf]] or at least Google Doc. In Google Doc, references can be easily handled using [[https://gsuite.google.com/marketplace/app/paperpile/894076725911|Paperpile]] add-on (NOT the extension), and figures can be automatically numbered using the the [[https://gsuite.google.com/marketplace/app/cross_reference/269114033347?pann=cwsdp&hl=en|Cross Reference]] add-on as suggested in these [[https://lcolladotor.github.io/2019/04/02/how-to-write-academic-documents-with-googledocs/#.Xjne6RNKjUI|guidelines]] on how to write academic documents with Google Docs. Add-ons are not available when editing .docx files. __Only__ if your biologist collaborators cannot [[http://www.dedoimedo.com/computers/latex.html|unfortunately]] edit the LaTeX source, consider using a conversion tool such as Adobe [[https://chrome.google.com/webstore/detail/adobe-acrobat/efaidnbmnnnibpcajpcglclefindmkaj|Acrobat]] Chrome extension or Acrobat Pro, which can export a .pdf as a .doc file. docs. The docs [[https://docs.zone/|zone]] is an online alternative. If you need to to separate pages, use [[https://superuser.com/a/1584919|pdfjam]]. If Bibtex is not an option, use [[http://www.easybib.com/|EasyBib]].
 +Try whatever you can to avoid conversion! Instead, educate your team and your collaborators to use [[https://www.authorea.com/users/54336|Authorea]], [[https://www.overleaf.com/|Overleaf]] or at least Google Doc. In Google Doc, references can be easily handled using the [[https://gsuite.google.com/marketplace/app/paperpile/894076725911|Paperpile]] add-on (NOT the extension), and figures can be automatically numbered using the [[https://gsuite.google.com/marketplace/app/cross_reference/269114033347?pann=cwsdp&hl=en|Cross Reference]] add-on as suggested in these [[https://lcolladotor.github.io/2019/04/02/how-to-write-academic-documents-with-googledocs/#.Xjne6RNKjUI|guidelines]] on how to write academic documents with Google Docs. Add-ons are not available when editing .docx files. __Only__ if your biologist collaborators [[http://www.dedoimedo.com/computers/latex.html|unfortunately]] cannot edit the LaTeX source, consider using a conversion tool such as the Adobe [[https://chrome.google.com/webstore/detail/adobe-acrobat/efaidnbmnnnibpcajpcglclefindmkaj|Acrobat]] Chrome extension or Acrobat Pro, which can export a .pdf as a .doc file. The docs [[https://docs.zone/|zone]] and pdf [[https://www.freepdfconvert.com/|Converter]] are online alternatives. If you need to separate pages, use [[https://superuser.com/a/1584919|pdfjam]]. If Bibtex is not an option, use [[http://www.easybib.com/|EasyBib]].
  
 ---- ----
- 
  
 ==== Enable spell check in Emacs on OS X? ==== ==== Enable spell check in Emacs on OS X? ====
Line 199: Line 364:
 ---- ----
  
 +==== Begin learning bioinformatics? ====
  
-====   Begin learning bioinformatics?   ====
 +Take a course from the [[http://research.omicsgroup.org/index.php/List_of_free_online_bioinformatics_courses|list]] of free online bioinformatics courses. E.g., the Computational Molecular Biology [[http://cmgm.stanford.edu/biochem218/index.html|Course]] at Stanford is broad and covers the classic topics, but it is no longer updated and may become outdated. The same is true for the PLOS Translational Bioinformatics [[http://collections.plos.org/translational-bioinformatics|Collection]] of articles, which are more advanced. Most central topics are covered in some course from the European Bioinformatics Institute ([[https://www.ebi.ac.uk/training/online/course-list|EBI]]). Very useful training materials are available from [[https://www.mygoblet.org/|GOBLET]]. Videos from the Models, Inference & Algorithms Initiative ([[https://www.broadinstitute.org/scientific-community/science/mia/models-inference-algorithms|MIA]]) at Broad are relatively advanced. \\  \\ [[http://bioconductor.org/help/course-materials/|Bioconductor]] course material provides a walkthrough in specific areas. Also, you can find valuable resources from the [[http://stephenturner.us/edu.html|list]] of bioinformatics workshops and [[http://www.engr.uconn.edu/~mukul/cbc.htm|conferences]]. Following the "Bioinformatics for biologists [[:bioinformatics_for_biologist_workshop|workshop]]" does not require expertise in computer programming. The Bio-Linux team prepared a good [[http://nebc.nerc.ac.uk/downloads/courses/Bio-Linux/bl8_latest.pdf|introduction]], which can be used as a reference too. If machine learning is new to you, [[http://www.nature.com/nrg/journal/v16/n6/full/nrg3920.html|read]] about its applications in genetics and genomics. You need to have some basic knowledge in [[https://www.coursera.org/learn/biostatistics?recoOrder=9&utm_medium=email&utm_source=recommendations&utm_campaign=recommendationsEmail~recs_email_2016_06_26_17%3A57|mathematics]] and statistics too.
- +
-Take a course from the [[http://research.omicsgroup.org/index.php/List_of_free_online_bioinformatics_courses|list]] of free online bioinformatics courses e.g., the Computational Molecular Biology [[http://cmgm.stanford.edu/biochem218/index.html|Course]] at Stanford is broad and covers the classic topics but it is not updated, and may become outdated. The same is true for PLOS Translational Bioinformatics [[http://collections.plos.org/translational-bioinformatics|Collection]] of articles, which are more advanced. Most central topics are covered in some course from the European Bioinformatics Institute ([[https://www.ebi.ac.uk/training/online/course-list|EBI]]). Very useful training materials are available from [[https://www.mygoblet.org/|GOBLET]]. Videos from the Models, Inference & Algorithms Initiative ([[https://www.broadinstitute.org/scientific-community/science/mia/models-inference-algorithms|MIA]]) at Broad are relatively advanced.\\ +
-\\ +
-[[http://bioconductor.org/help/course-materials/|Bioconductor]] course material provides a walkthrough in specific areas. Also, you can find valuable resources from the [[http://stephenturner.us/edu.html|list]] of bioinformatics workshops and [[http://www.engr.uconn.edu/~mukul/cbc.htm|conferences]]. Following the "Bioinformatics for biologists [[:bioinformatics_for_biologist_workshop|workshop]]" does not require expertise in computer programing. Bio-Linux team prepared a good [[http://nebc.nerc.ac.uk/downloads/courses/Bio-Linux/bl8_latest.pdf|introduction]], which can be used as a reference too. If machine learning is new to you, [[http://www.nature.com/nrg/journal/v16/n6/full/nrg3920.html|read]] about its applications in genetics and genomics. You need to have some basic knowledge in [[https://www.coursera.org/learn/biostatistics?recoOrder=9&utm_medium=email&utm_source=recommendations&utm_campaign=recommendationsEmail~recs_email_2016_06_26_17%3A57|mathematics]] and statistics too.+
  
 ---- ----
Line 220: Line 382:
 ==== Install Salmon on OSX? ==== ==== Install Salmon on OSX? ====
  
-\\
 + \\
 If you do not have autoconf, [[http://mac-dev-env.patrickbougie.com/autoconf/|install]] it. Following the installation guidelines, for OSX you need to first [[http://stackoverflow.com/questions/3181468/how-do-you-install-intel-tbb-on-os-x|install]] Thread Building Blocks (TBB) (brew install tbb) and then check that the installation was successful (brew list). Download the latest [[https://github.com/COMBINE-lab/salmon/releases|version]] of Salmon source code and uncompress it. Follow Salmon's installation [[http://salmon.readthedocs.org/en/latest/building.html#installation|guidelines]]. The cmake command in the guidelines will be something like the following for OSX: If you do not have autoconf, [[http://mac-dev-env.patrickbougie.com/autoconf/|install]] it. Following the installation guidelines, for OSX you need to first [[http://stackoverflow.com/questions/3181468/how-do-you-install-intel-tbb-on-os-x|install]] Thread Building Blocks (TBB) (brew install tbb) and then check that the installation was successful (brew list). Download the latest [[https://github.com/COMBINE-lab/salmon/releases|version]] of Salmon source code and uncompress it. Follow Salmon's installation [[http://salmon.readthedocs.org/en/latest/building.html#installation|guidelines]]. The cmake command in the guidelines will be something like the following for OSX:
  
Line 233: Line 395:
 ===== Write a scientific paper? ===== ===== Write a scientific paper? =====
  
-Put the figures together and then [[http://www.scidev.net/global/publishing/practical-guide/how-do-i-write-a-scientific-paper-.html|draft]] different [[https://www.nature.com/articles/nmeth.4532?WT.ec_id=NMETH-201712&spMailingID=55474826&spUserID=MTIyMzczNjc4MDI2S0&spJobID=1285409878&spReportId=MTI4NTQwOTg3OAS2|sections]]. Focus the [[http://www.grantcentral.com/strategies-for-avoiding-common-problems-with-research-manuscripts/|Discussion]]. Be careful about [[http://colah.github.io/posts/2019-05-Collaboration/index.html|authorship]]. It might be easier to write the [[https://plos.org/resource/how-to-write-a-great-abstract/?utm_medium=email&utm_source=internal&utm_campaign=modnewsletters&utm_content=modnewsletter|abstract]] //after// other sections are drafted.
 +Put the figures together and then [[http://www.scidev.net/global/publishing/practical-guide/how-do-i-write-a-scientific-paper-.html|draft]] different [[https://www.nature.com/articles/nmeth.4532?WT.ec_id=NMETH-201712&spMailingID=55474826&spUserID=MTIyMzczNjc4MDI2S0&spJobID=1285409878&spReportId=MTI4NTQwOTg3OAS2|sections]]. Focus the [[http://www.grantcentral.com/strategies-for-avoiding-common-problems-with-research-manuscripts/|Discussion]]. Be careful about [[http://colah.github.io/posts/2019-05-Collaboration/index.html|authorship]]. It might be easier to write the [[https://plos.org/resource/how-to-write-a-great-abstract/?utm_medium=email&utm_source=internal&utm_campaign=modnewsletters&utm_content=modnewsletter|abstract]] //after// other sections are drafted. You can use your [[https://www.nature.com/articles/d41586-018-02404-4|own]] voice.
  
-----
 +Many journals require vectorized figures and different tweaks to the fonts and colors. So, make sure your figures are easily reproducible. At the submission stage, you can convert them to a [[https://docs.google.com/document/d/1L49DoVzBfhJaavIvW5CvrxmwV-RHNDJcuFIZy39dS2g/edit|vectorized]] format like eps or pdf using different tools including [[http://blog.linuxgrrl.com/2013/08/12/how-to-produce-vector-eps-with-cmyk-color-using-free-software/|Inkscape]].
  
 +----
  
 ===== Prepare or review computational biology papers for Nature methods? ===== ===== Prepare or review computational biology papers for Nature methods? =====
Line 253: Line 416:
  
 ---- ----
- 
  
 ==== Get older versions using git? ==== ==== Get older versions using git? ====
Line 260: Line 422:
  
 ---- ----
- 
  
 ==== Learn about linear models and ANOVA in R? ==== ==== Learn about linear models and ANOVA in R? ====
Line 270: Line 431:
 ===== Convert gene or protein IDs? ===== ===== Convert gene or protein IDs? =====
  
-[[https://www.biostars.org/p/22/|Use]] [[https://biodbnet-abcc.ncifcrf.gov/db/db2db.php|bioDBnet]], BioMart - Ensembl, or [[https://bioconductor.org/packages/release/bioc/html/AnnotationDbi.html|AnnotationDbi]] package in R to convert between Entrez Gene, RefSeq, Ensemble, and many more.
 +Use [[https://biodbnet-abcc.ncifcrf.gov/db/db2db.php|bioDBnet]], BioMart - Ensembl, or the [[https://bioconductor.org/packages/release/bioc/html/AnnotationDbi.html|AnnotationDbi]] package in R to convert between Entrez Gene, RefSeq, Ensembl, and many more.
  
 ---- ----
- 
  
 ===== Prepare attractive, scientific presentations ? ===== ===== Prepare attractive, scientific presentations ? =====
Line 280: Line 440:
  
 ---- ----
- 
  
 ==== Access a Bioconductor package source code? ==== ==== Access a Bioconductor package source code? ====
Line 304: Line 463:
  
 ---- ----
- 
  
 ==== Use git via proxy or vpn? ==== ==== Use git via proxy or vpn? ====
Line 313: Line 471:
 ---- ----
  
-**Enable autocomplete and the tab bar in Emacs?** \\
-[[https://www.emacswiki.org/emacs/InstallingPackages#toc2|Install]] the auto-complete and tabbar packages from [[https://www.emacswiki.org/emacs/MELPA|MELPA]]. In OS X, you may need to edit your init file (usually it is .emacs but you can use C-H v user-init-file RET to [[https://stackoverflow.com/questions/189490/where-can-i-find-my-emacs-file-for-emacs-running-on-windows|check]]) file to remove auto-complete from the package-selected-packages list and instead add the following lines:
 +==== Enable autocomplete and the tab bar in Emacs? ====
  
 +\\
 +[[https://www.emacswiki.org/emacs/InstallingPackages#toc2|Install]] the auto-complete and tabbar packages from [[https://www.emacswiki.org/emacs/MELPA|MELPA]]. In OS X, you may need to edit your init file (usually .emacs, but you can use C-h v user-init-file RET to [[https://stackoverflow.com/questions/189490/where-can-i-find-my-emacs-file-for-emacs-running-on-windows|check]]) to remove auto-complete from the package-selected-packages list and instead add the following lines:
 <code> <code>
 +
 (require 'auto-complete) (require 'auto-complete)
 (global-auto-complete-mode t) (global-auto-complete-mode t)
 +
 </code> </code>
  
Line 331: Line 492:
  
 **Resubmit an NIH grant?** \\ **Resubmit an NIH grant?** \\
-[[http://ctsi.ucla.edu/education/files/view/training/docs/study-sections-adams.pdf|Address]] ALL reviewers' comments. Write a strong [[http://www.regis.edu/~/media/Files/University/OAG/Learning-Resources/Create_Winning_Resubmission_Application_NIH_Report.ashx|introduction]] [{{:resubmittinggrantfelson10january2012.ppt|ppt}}  ].+[[http://ctsi.ucla.edu/education/files/view/training/docs/study-sections-adams.pdf|Address]] ALL reviewers' comments. Write a strong [[http://www.regis.edu/~/media/Files/University/OAG/Learning-Resources/Create_Winning_Resubmission_Application_NIH_Report.ashx|introduction]] [{{:resubmittinggrantfelson10january2012.ppt|ppt}}  ]. Watch NIH [[https://public.csr.nih.gov/ForApplicants/InitialReviewResultsAndAppeals/csrwebinar|Webinars]] for Applicants.
  
 ---- ----
Line 376: Line 537:
  
 **Analyze single cell RNA-seq data?** \\ **Analyze single cell RNA-seq data?** \\
-[[https://hemberg-lab.github.io/scRNA.seq.course/introduction-to-single-cell-rna-seq.html|Map]] the reads [[https://www.nature.com/news/single-cell-sequencing-made-simple-1.22233|similar]] to bulk RNA-seq data, exclude [[https://hemberg-lab.github.io/scRNA.seq.course/construction-of-expression-matrix.html#identifying-confounding-factors|outlier]] cells and genes, [[https://hemberg-lab.github.io/scRNA.seq.course/biological-analysis.html#clustering-introduction|cluster]] the cells, and [[https://hemberg-lab.github.io/scRNA.seq.course/biological-analysis.html#projecting-scrna-seq-data|project]] the cells onto an annotated reference such as Human Cell Atlas.+[[https://hemberg-lab.github.io/scRNA.seq.course/introduction-to-single-cell-rna-seq.html|Map]] the reads [[https://www.nature.com/news/single-cell-sequencing-made-simple-1.22233|similar]] to bulk RNA-seq data, exclude [[https://hemberg-lab.github.io/scRNA.seq.course/construction-of-expression-matrix.html#identifying-confounding-factors|outlier]] cells and genes, [[https://hemberg-lab.github.io/scRNA.seq.course/biological-analysis.html#clustering-introduction|cluster]] the cells, and [[https://hemberg-lab.github.io/scRNA.seq.course/biological-analysis.html#projecting-scrna-seq-data|project]] the cells onto an annotated reference such as Human Cell Atlas. The broken links can be found in the [[https://www.singlecellcourse.org/index.html|Sanger's]] course.
  
 ---- ----
Line 382: Line 543:
 **Make interactive 3D plots in R?** \\ **Make interactive 3D plots in R?** \\
 Use plot3D::[[http://www.sthda.com/english/wiki/impressive-package-for-3d-and-4d-graph-r-software-and-data-visualization#basic-scatter-plot|scatter3D]] to create the graph: Use plot3D::[[http://www.sthda.com/english/wiki/impressive-package-for-3d-and-4d-graph-r-software-and-data-visualization#basic-scatter-plot|scatter3D]] to create the graph:
- 
 <code> <code>
 +
 scatter3D(x=iris$Sepal.Length, y=iris$Petal.Length, z=iris$Sepal.Width, clab = c("Sepal", "Width (cm)")) scatter3D(x=iris$Sepal.Length, y=iris$Petal.Length, z=iris$Sepal.Width, clab = c("Sepal", "Width (cm)"))
 plotrgl() plotrgl()
 +
 </code> </code>
  
Line 402: Line 564:
 ---- ----
  
-**Test an R function that you have extracted from a script?** \\ +==== Test an R function that you have extracted from a script? ==== 
-Sometimes you have a working script but some part of it is also useful elsewhere. You extract that portion and make a function f(x,y) out of it. Now, in the script you call that function. You run your modified script and the output is exactly the same as before. Is this a sufficient test for the function you just wrote? No! What if you forget to include one of the variables as an input argument of the function? Then, that variable, which is still defined somewhere in the script before you call the function, is used in the function as a "global variable" without any explicit error. However, when the function is used in a different context, that global variable may not be defined, or worse, it may have an irrelevant value. To avoid this issue, test your function as follows:+ 
 +Sometimes you have a working script but some part of it is also useful elsewhere. You extract that portion and make a function f(x,y) out of it. Now, in the script you call that function. You run your modified script and the output is exactly the same as before. Is this a sufficient test for the function you just wrote? No! What if you forget to include one of the variables as an input argument of the function? Then, that variable, which is still defined somewhere in the script before you call the function, is used in the function as a "global variable" without any explicit error. However, when the function is used in a different context, that global variable may not be defined, or worse, it may have an irrelevant value. To avoid this issue, use codetools::[[https://rdrr.io/cran/codetools/man/findGlobals.html|findGlobals]](..., merge=FALSE) to identify and remove all global variables from your function. A cumbersome and less accurate way to identify global variables follows:
  
   - Put a browser in the script right before calling the function.   - Put a browser in the script right before calling the function.
Line 415: Line 578:
 ---- ----
  
-**Separate the reads corresponding to each individual cell from a single-cell RNA-Seq fastq file?** \\ [[http://www.arrayserver.com/wiki/index.php?title=Preprocess_of_SingleCell_RNA-Seq_Data|Use]] the barcodes and Unique Molecular Identifiers (UMIs).+ 
 +====   Separate the reads corresponding to each individual cell from a single-cell RNA-Seq fastq file?   ==== 
 + 
 + \\ [[http://www.arrayserver.com/wiki/index.php?title=Preprocess_of_SingleCell_RNA-Seq_Data|Use]] the barcodes and Unique Molecular Identifiers (UMIs).
  
 ---- ----
  
-**Copy from your prior clipboard history?** \\ Use a clipboard manager like [[https://clipy-app.com/|Clipy]] (developed based on ClipMenu) or JumpCut. Clipy allows you to save lists of favorite or frequent text snippets for later use.+====   Copy from your prior clipboard history?   ==== 
 + 
 +Use a clipboard manager like [[https://clipy-app.com/|Clipy]] (developed based on ClipMenu) or JumpCut. Clipy allows you to save lists of favorite or frequent text snippets for later use.
  
 ---- ----
  
-**Review a paper?** \\ Read the journal guidelines and Nature's quick and concise [[https://www.springer.com/gp/authors-editors/authorandreviewertutorials/howtopeerreview?utm_source=hybris&utm_medium=email&utm_content=internal&utm_campaign=AEXS_1_PF_ReviewerThankYou_tutorial&sap-outbound-id=37A4EBB275E142189BFB59083911E79615029EBE|tutorial]].+====   Review a paper?   ==== 
 + 
 +Read the journal guidelines and Nature's quick and concise [[https://www.springer.com/gp/authors-editors/authorandreviewertutorials/howtopeerreview?utm_source=hybris&utm_medium=email&utm_content=internal&utm_campaign=AEXS_1_PF_ReviewerThankYou_tutorial&sap-outbound-id=37A4EBB275E142189BFB59083911E79615029EBE|tutorial]]. The PLOS Open Reviewer [[https://genweb.plos.org/RR/PLOSOpenReviewerGatewayInfosheet.pdf|Gateway]] can be helpful if you are interested in becoming a reviewer for PLOS journals.
  
 ---- ----
  
-**Adjust the acceleration and speed of a Logitech mouse?** \\  \\ Install and [[https://apple.stackexchange.com/questions/253111/how-to-disable-scroll-acceleration-in-macos-sierra|use]] the Logitech Control Center.+====   Adjust the acceleration and speed of a Logitech mouse?   ==== 
 + 
 + \\ Install and [[https://apple.stackexchange.com/questions/253111/how-to-disable-scroll-acceleration-in-macos-sierra|use]] the Logitech Control Center.
  
 ---- ----
Line 434: Line 606:
 ==== Encrypt a folder? ==== ==== Encrypt a folder? ====
  
-Compress the folder in 7z format using the AES-256 encrypting algorithm. [[https://www.dzhang.com/blog/2018/03/11/using-7-zip-create-aes-256-encrypted-zip-files-command-line|E.g]],+Archive and compress a ''largeFolder''  folder using [[https://www.cyberciti.biz/tips/linux-how-to-encrypt-and-decrypt-files-with-a-password.html|tar]] and encrypt it using ''gpg''  with the AES-256 encryption algorithm. You will obtain the [[https://crypto.stackexchange.com/a/71078|strongest]] security with these options:
  
 <code> <code>
-7z a -tzip -mem=AES256 -p super-secret.7z super-secret_folder+tar -cvz largeFolder | gpg --s2k-mode 3 --s2k-count 65011712 --s2k-digest-algo SHA512 --s2k-cipher-algo AES256 --symmetric --no-symkey-cache -o largeFolder.tgz.gpg 
 +</code>
  
-7z x super-secret.7z ## Decrypt and uncompress+The ''--no-symkey-cache''  option is available in GnuPG [[https://unix.stackexchange.com/a/557051|version]] >=2.2.7. On macOS, you need to first install [[https://sourceforge.net/p/gpgosx/docu/Download/|GnuPG]]. An alternative approach is to use [[http://www.dzhang.com/blog/2018/03/11/using-7-zip-create-aes-256-encrypted-zip-files-command-line|7z]], which can be installed using [[http://molecularsciences.org/content/installing-and-running-7-zip-from-mac-terminal/|homebrew]]. However, 7z is Windows-based and thus not recommended. \\  \\ 
 +To decrypt and uncompress:
 + 
 +<code> 
 +gpg --decrypt --no-symkey-cache largeFolder.tgz.gpg | tar -xvz
 </code> </code>
  
-A good password should have at least 12 characters, include both lowercase and uppercase letters, and at least one digit and one special character such as !@#$%^&*(). Do not use dictionary words in your password; instead, use a [[https://cybernews.com/best-password-managers/how-to-create-a-strong-password/|passphrase]] "to create strong passwords". Before running the above commands __in terminal__ on macOS, install [[https://molecularsciences.org/content/installing-and-running-7-zip-from-mac-terminal/|p7zip]] using homebrew.+A good password should have at least 12 characters, include both lowercase and uppercase letters, and at least one digit and one special character such as !@#$%^&*(). Do not use dictionary words in your password; instead, use a [[https://cybernews.com/best-password-managers/how-to-create-a-strong-password/|passphrase]] "to create strong passwords".
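
The password checklist above can be sketched as a small Python check. This is only an illustration under stated assumptions: the function name ''meets_basic_password_rules'' is hypothetical, "special character" is taken to mean any character in Python's ''string.punctuation'', and the advice about avoiding dictionary words is not checked at all.

```python
import string


def meets_basic_password_rules(password: str, min_length: int = 12) -> bool:
    """Check the length and character-class rules from the checklist.

    Assumption: a "special character" is any character in string.punctuation.
    Dictionary words are NOT detected; this is not a full strength estimate.
    """
    return (
        len(password) >= min_length                          # at least 12 characters
        and any(c.islower() for c in password)               # a lowercase letter
        and any(c.isupper() for c in password)               # an uppercase letter
        and any(c.isdigit() for c in password)               # a digit
        and any(c in string.punctuation for c in password)   # a special character
    )


if __name__ == "__main__":
    # Made-up candidates, only to show how the function is called.
    for candidate in ("short1!A", "correct-Horse-battery-7"):
        print(candidate, meets_basic_password_rules(candidate))
```

Passing this check is necessary but not sufficient: a string can satisfy every rule and still be easy to guess.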
  
 ---- ----
- 
  
 ==== Upload a file to Oncinfo and link to it? ==== ==== Upload a file to Oncinfo and link to it? ====
  
-{{:screen_shot_2019-02-12_at_12.38.39_pm.png?linkonly|Set}} the link type to "internal media", click on "Browse Server", and then on "Choose File" and "Update".+{{:screen_shot_2019-02-12_at_12.38.39_pm.png?linkonly|Set}}  the link type to "internal media", click on "Browse Server", and then on "Choose File" and "Update".
  
 ---- ----
Line 470: Line 646:
  
 ---- ----
- 
  
 ===== Choose a solid state (SSD) external drive? ===== ===== Choose a solid state (SSD) external drive? =====
  
 The non-volatile memory express (NVMe) devices are [[https://ssd.borecraft.com/SSD_Buying_Guide_List.pdf|better]] than SATA solid state drives. Good brands include [[https://smile.amazon.com/gp/product/B07X6CKHH1/ref=ox_sc_act_title_1?smid=A29Y8OP2GPR7PE&psc=1|Sabrent]] (Nano is smaller than Pro but gets hot when extensively used), Seagate, Addlink, and Team. As of 2020, a speed of 1000 MB/s is possible using USB 3.2. The non-volatile memory express (NVMe) devices are [[https://ssd.borecraft.com/SSD_Buying_Guide_List.pdf|better]] than SATA solid state drives. Good brands include [[https://smile.amazon.com/gp/product/B07X6CKHH1/ref=ox_sc_act_title_1?smid=A29Y8OP2GPR7PE&psc=1|Sabrent]] (Nano is smaller than Pro but gets hot when extensively used), Seagate, Addlink, and Team. As of 2020, a speed of 1000 MB/s is possible using USB 3.2.
- 
----- 
- 
-===== Work with screen session? ===== 
- 
-There are five main [[http://www.pixelbeat.org/lkdb/screen.html|commands]] while working with screen session: 
- 
-  - Start and name a screen: ''screen -S $NAME'' 
-  - Detach from a screen: ''Ctrl+a d'' 
-  - See the list of active screens: ''screen -ls'' 
-  - Reattach to a screen: ''screen -r $NAME'' 
-  - Quit and [[https://askubuntu.com/questions/356006/kill-a-screen-session|kill]] your screen: ''Ctrl+a then Ctrl+\'' 
  
 ---- ----