

How to

This is a collection of short answers and links to some miscellaneous questions that we have had while doing research in Oncinfo. This can also be a useful resource for other scholars in the field of computational biology with similar interests and challenges.


Set local mirror for Rscript

Put the following lines at the top of your script:

local({
  r <- getOption("repos")
  r["CRAN"] <- "http://cran.us.r-project.org"  ## replace with your preferred mirror
  options(repos = r)
})

Add only modified files and ignore untracked files using git?

git add -u && git commit -m 'minor changes' && git push
## Older equivalent of the first step: git ls-files --modified | xargs git add

If there is a conflict at push time, pull first. Then look for the conflict markers (<<<<<<<, =======, >>>>>>>) in the code, fix the conflict manually, stage and commit the resolved files, and push again.
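For a quick sanity check, the sketch below builds a throwaway repository and shows that git add -u (Git's built-in way to stage modified and deleted tracked files) leaves untracked files out, which is the same selection the ls-files pipeline makes. The repository, file names, and commit identity are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: stage only tracked files that were modified, leaving untracked files alone.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity, needed only to commit
git config user.email "demo@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "initial commit"

echo "v2" > tracked.txt      # modify a tracked file
echo "scratch" > notes.txt   # create an untracked file

# Stages tracked.txt only; notes.txt stays untracked.
git add -u

staged=$(git diff --cached --name-only)
untracked=$(git ls-files --others --exclude-standard)
echo "staged: $staged"
echo "untracked: $untracked"
```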


Use a package that is being developed?

The following example is for the Genetwork package before it was contributed to Bioconductor.

1) Put the following lines in your runall.R script, where you load the libraries:

## Updating Genetwork package:
try(detach(package:Genetwork, unload=TRUE),silent=TRUE)
system(paste("Rscript",packingFile,"--codePath",codePath))
library(Genetwork)

packingFile and codePath are defined in code/Habil/AML/Settings.R.
2) If you have written new functions, add them to the list in code/Habil/packing/update.R. Note that the path to the function is relative to the code/ directory.


Cancel all your jobs on Stampede?

squeue -l -u $USER | awk 'FNR>2 {print $1}' | xargs scancel
## Simpler alternative: scancel -u $USER
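In case the awk part of the one-liner looks cryptic, here is a small sketch of what it does. The sample text only imitates the layout we expect from squeue -l (a timestamp line, a header line, then one line per job); the job IDs and names are invented, and the exact columns may differ between Slurm versions.

```shell
#!/usr/bin/env bash
# Sketch: how the awk filter picks job IDs out of `squeue -l` output.
# The sample below is made up; it mimics the expected layout:
# line 1 = timestamp, line 2 = column header, then one job per line.
sample_output='Mon Mar 06 10:00:00 2017
             JOBID   PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
           1234567      normal    job_a    alice  RUNNING       5:00  48:00:00      1 c401-101
           1234568      normal    job_b    alice  PENDING       0:00  48:00:00      1 (Priority)'

# FNR>2 skips the timestamp and header lines; $1 is the JOBID column.
job_ids=$(printf '%s\n' "$sample_output" | awk 'FNR>2 {print $1}')
echo "$job_ids"

# On the cluster, these IDs are piped into scancel, as in the one-liner above:
#   squeue -l -u "$USER" | awk 'FNR>2 {print $1}' | xargs scancel
```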

Install R locally (e.g. on a cluster)?

If you do not have sudo permissions, like when you are working on a cluster, you should either use the software module, or install it locally in your home directory. E.g., you can install R on Stampede or Maverick as follows:

mkdir ~/tempR; cd ~/tempR ## Make a temp directory
wget http://cran.cnr.berkeley.edu/src/base/R-3/R-3.2.2.tar.gz ## Always get the latest version
tar -zxvf ./R-3.2.2.tar.gz; cd ./R-3.2.2 ## Extract; the remaining steps follow the INSTALL file in this folder.
mkdir ~/arch ## You will install here.
./configure --prefix $HOME/arch
make
make check
make install

Now, you can add $HOME/arch/bin to your PATH by inserting the following line in your .bashrc file:

export PATH=$HOME/arch/bin:$PATH

I had to follow these steps to resolve the bzip2 issue on the Lonestar5 cluster. Oncinfo Lab members can use this installation by adding the following line to their .bashrc:

export PATH=/home1/03270/zare/Install/bin:$PATH
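If you rerun your setup steps, appending the export line each time piles up duplicates in .bashrc. Below is a minimal sketch of a guard, assuming bash; the helper name add_to_path_once is hypothetical, and the demo writes to a temporary file instead of your real ~/.bashrc.

```shell
#!/usr/bin/env bash
# Sketch: add an install prefix's bin/ to PATH in a startup file only once.
# add_to_path_once is a hypothetical helper, not a standard command.
add_to_path_once() {
  local dir="$1" rcfile="$2"
  local line="export PATH=$dir/bin:\$PATH"   # \$PATH stays literal in the file
  touch "$rcfile"
  # -x: match the whole line, -F: fixed string (no regex), -q: no output.
  if ! grep -qxF "$line" "$rcfile"; then
    printf '%s\n' "$line" >> "$rcfile"
  fi
}

# Demo on a temporary file instead of the real ~/.bashrc.
rc=$(mktemp)
add_to_path_once "$HOME/arch" "$rc"
add_to_path_once "$HOME/arch" "$rc"   # second call finds the line and does nothing
cat "$rc"
```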

Write to xls files from R?
Use the writeWorksheetToFile() function from the XLConnect package as below:

library(XLConnect)
writeWorksheetToFile(data=as.data.frame(matrix(1:4,2,2)),file='./temp.xls',sheet='test1')
writeWorksheetToFile(data=as.data.frame(matrix(1:6,2,3)),file='./temp.xls',sheet='test2')

Restore a file deleted in a local git directory?
To restore a single deleted file, use git checkout -- <file> (or git restore <file> in Git 2.23 and later). git reset --hard brings your whole working directory back to the HEAD state, but it is dangerous because it discards all uncommitted local changes.
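The sketch below shows the safer route in a throwaway repository: only the deleted file is restored from HEAD, while an uncommitted edit to another file survives (git reset --hard would discard it). File names and the commit identity are invented for the demo.

```shell
#!/usr/bin/env bash
# Sketch: bring back one deleted tracked file without resetting anything else.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo commit
git config user.email "demo@example.com"

echo "important" > analysis.R
echo "draft" > notes.txt
git add analysis.R notes.txt
git commit -q -m "initial commit"

echo "unsaved edit" >> notes.txt   # local change we want to keep
rm analysis.R                      # accidental deletion

# Restore only analysis.R from HEAD; notes.txt keeps its uncommitted edit.
# On Git >= 2.23, `git restore analysis.R` does the same.
git checkout -- analysis.R

restored=$(cat analysis.R)
notes_last=$(tail -n 1 notes.txt)
echo "analysis.R: $restored"
echo "last line of notes.txt: $notes_last"
```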


Get familiar with applications of machine learning in genetics and genomics?
First read this 2015 review paper and then follow its references.


Get access to the papers through the library when you are off-campus?
First add the following to your browser (Chrome or Firefox) bookmarks.

javascript:void(location.href=%22http://libproxy.txstate.edu/login?url=%22+location.href)

Then, on the journal page, click on the bookmark. Log in and start reading.


Convert pdf to MS word?
Try whatever you can to avoid conversion! Instead, educate your team and your collaborators to use Authorea, Overleaf, or at least Google Docs. Only if your biologist collaborators cannot edit the LaTeX source, consider using a conversion tool such as docs.zone. Alternatively, Acrobat Pro can export a .pdf as a .doc file. If BibTeX is not an option, use EasyBib.


Enable spell check in Emacs on OS X?
The default Aquamacs spell checker has some issues. To replace it, first install Aspell, which is a replacement for Ispell:

brew install aspell ## recent Homebrew versions no longer accept the --with-lang-en option

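Then add the following line to your Emacs initialization file, e.g., ~/Library/Preferences/Aquamacs Emacs/Preferences.el or ~/.emacs. Adjust the path if “which aspell” reports a different location (e.g., /opt/homebrew/bin/aspell on Apple Silicon Macs):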

(setq ispell-program-name "/usr/local/bin/aspell")

Do microarray analysis or anything else in Bioconductor?
This is an excellent site with many well-commented code examples and a lot of handy shortcuts. See also Functional analysis tools.


Learn to learn?
Take this online course: Learning How to Learn: Powerful mental tools to help you master tough subjects.


Improve your writing?
Read and write a lot. Have someone proofread your write-up. Use tips and resources that can take your writing skills to the next level. E.g., you can find valuable professional resources on the Writing Forward website. Coursera has a special course on writing emails in its “Improve Your English Communication Skills Specialization”.


Prepare a scientific poster?
Decide on the main figures that you want to include. Put them in a template and write a caption for each figure. Write your abstract and an appropriate title. Send it to collaborators and ask for their comments at least a week before printing.


Find a biological database of interest?
First, look at the list of biological databases. Also, if you know an important database that is missing from this list, please add it as a service to the community.


Begin learning bioinformatics?
Take a course from the list of free online bioinformatics courses. E.g., the Computational Molecular Biology course at Stanford is broad and covers the classic topics, but it is no longer updated and may become outdated. The same is true for the PLOS Translational Bioinformatics collection of articles, which is more advanced. Most central topics are covered in some course from the European Bioinformatics Institute (EBI). Very useful training materials are available from GOBLET.

Bioconductor course material provides walkthroughs of specific areas. Also, you can find valuable resources in the list of bioinformatics workshops and conferences. Following the “Bioinformatics for biologists workshop” does not require expertise in computer programming. The Bio-Linux team prepared a good introduction, which can also be used as a reference. If machine learning is new to you, read about its applications in genetics and genomics. You need some basic knowledge of mathematics and statistics too.


Decide on appropriate courses for a program in computational biology?
Browse the Online Computational Biology Curriculum, which lists hundreds of courses available from Coursera, edX, etc., and comments on their relevance to computational biology. Berkeley and Stanford also have lists of relevant courses.


Establish and maintain a computational biology lab?
Do not miss About My Lab, a valuable collection of PLOS articles on how to manage a lab. This is a fundamentally difficult job.


Install Salmon on OSX?
If you do not have autoconf, install it. Following the installation guidelines, on OSX you first need to install Threading Building Blocks (TBB) (brew install tbb) and then check that the installation was successful (brew list). Download the latest version of the Salmon source code and uncompress it. Follow Salmon's installation guidelines. On OSX, the cmake command in the guidelines will be something like the following:

cmake -DFETCH_BOOST=TRUE -DTBB_INSTALL_DIR=/usr/local/lib -DCMAKE_INSTALL_PREFIX=/usr/local/

Follow the rest of the guidelines. You may need to run sudo make install if you get a permission error.


Write a scientific paper?
Put the figures together and then draft the different sections. Keep the Discussion focused.


Prepare or review computational biology papers for Nature Methods?
Read their “Reviewing computational methods” (2015) and “Guidelines for algorithms and software in Nature Methods” (2014) articles. Provide source code, pseudocode, compiled executables, and the mathematical description. Software must be accompanied by documentation, sample data with the expected output, and a license (e.g., GPL≥2). Have a look at the list of computational biology papers published in Nature Methods in 2015, and the hints by an editor of Nature Communications.


Set the default width of fill mode (line length) in Emacs?
Use 'M-x customize-variable' to set 'fill-column' (100 in Oncinfo). DejaVu Sans Mono (similar to Menlo on macOS) at size 18-20 is an appropriate font for programming in Emacs. To set it on macOS, you may need to edit your .emacs manually.


Get older versions using git?
Use “git log” to see the previous commits and the corresponding hashes, “git checkout <hash>” to get an older version, and “git checkout master” to get back (use main instead of master if that is your repository's default branch).
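A small sketch of the round trip in a throwaway repository. Since newer Git versions may name the default branch main instead of master, the script records the current branch name before leaving it. File names, commit messages, and the identity are made up.

```shell
#!/usr/bin/env bash
# Sketch: check out an older commit by its hash, then return to the branch tip.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo commits
git config user.email "demo@example.com"

echo "version 1" > result.txt
git add result.txt
git commit -q -m "first"
old_hash=$(git rev-parse HEAD)            # the hash `git log` shows for this commit

echo "version 2" > result.txt
git commit -q -am "second"
branch=$(git rev-parse --abbrev-ref HEAD) # master or main, depending on Git's default

git log --oneline                 # list commits with abbreviated hashes
git checkout -q "$old_hash"       # detached HEAD at the older version
old_content=$(cat result.txt)
git checkout -q "$branch"         # back to the tip of the branch
new_content=$(cat result.txt)
echo "old: $old_content / current: $new_content"
```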


Learn about linear models and ANOVA in R?
Review Advanced Statistical Methods II lecture notes by Dr. Larry Ammann at UT Dallas.


Convert gene or protein IDs?
Use bioDBnet, BioMart - Ensembl, or the AnnotationDbi package in R to convert between Entrez Gene, RefSeq, Ensembl, and many more.


Prepare attractive, scientific presentations?
Use a “home slide”. Also, learn about other tips from Susan McConnell.


Access a Bioconductor package source code?
It is always better to install the latest version of a package as directed on the corresponding Bioconductor page (e.g., Pigengene). If you need to see more details in the source code, you can clone the source from Bioconductor's git server (the GitHub Bioconductor-mirror repositories are no longer maintained), e.g.,

mkdir ~/proj; cd ~/proj
git clone https://git.bioconductor.org/packages/Pigengene

Use git via proxy or VPN?
Use sshuttle, e.g., sshuttle -r h_z14@nyx.cs.txstate.edu 0.0.0.0/0 -vv
The servers at Texas State University are listed here.


Enable autocomplete and the tab bar in Emacs?
Install the auto-complete and tabbar packages from MELPA. In OS X, you may need to edit your init file (usually .emacs, but you can use C-h v user-init-file RET to check) to remove auto-complete from the package-selected-packages list and instead add the following lines:

(require 'auto-complete)
(global-auto-complete-mode t)

Also, read the packages' instructions to learn how to configure and use them.


Cite references?
Find the original paper that is most relevant. Set up recommendation engines so that you do not miss recently published work.


Resubmit an NIH grant?
Address ALL reviewers' comments. Write a strong introduction [ppt].


Silence a gene?
Small interfering RNAs (siRNAs) and microRNAs (miRNAs) bind to mRNA and prevent it from being translated.


Compare the data resources on a gene across databases?
Xena provides heatmaps for one gene using databases selected from a list of tens of databases from TCGA, 1000 Genomes, etc. Cancer Genome Cloud allows users to integrate their own tools with the platform and run them on, say, TCGA data in the cloud. SEEK ranks thousands of datasets by how concordant the expression of the input genes is.


Save PowerPoint slides in high resolution on OS X?
Select all, copy, then in Preview choose File > New from Clipboard. Note that DPI is different from PPI.


Download TCGA normal data?
Use “Add cases filter” link to add sample_type as a filter.


Know about the immune system?
It is a prerequisite (2012) for understanding immunotherapy (2015). CAR-T cells (2014) are engineered outside the body to express receptors specific to a patient's particular cancer. Immune checkpoint (2017) inhibitors can release the brakes of the immune system.


Use computational biology in immunotherapy?
A) Mine the gene expression or mutation databases to discover tumor antigens (targets of immune system). B) Combine these data with mass spectrometric data from tumor specimens, or sera, to identify immunogenic (actionable) antigens. C) Analyze gene expression profiles to predict which patients will benefit (also, have benefitted) from a specific treatment (e.g., infer infiltration of immune cells in tumors). D) “Evaluation of humoral response against a diverse set of self-antigens can be explored rapidly using the high-content protein or peptide microarrays” (Thakurta et al.). E) Assess the diversity of the vast 10^14 possibilities for T-cell receptors using proteomics or sequencing. Many tools have been developed to address these needs (2017 review).


Discover immunogenic antigens (neoantigens) suitable for immunotherapy?
Identify the somatic mutations in expressed genes, predict epitopes in silico using publicly available databases, and immunize mice with long peptides encoding the mutated epitopes to determine immunogenicity.


Avoid misinterpretation of biological experiments?
Reasoning must be logical. Report enough details of the methods to reproduce the results. Assess the robustness of the findings with respect to minor perturbations to the experimental settings. To prove that drug A targets protein X, it is not sufficient to confirm that treatment with A kills cells that have X. Maybe the cells are killed through some other mechanism. Use “rescue experiments” as in the A=imatinib, X=BCR–ABL case.


Analyze single cell RNA-seq data?
Map the reads similar to bulk RNA-seq data, exclude outlier cells and genes, cluster the cells, and project the cells onto an annotated reference such as Human Cell Atlas.


Make interactive 3D plots in R?
Use plot3D::scatter3D to create the graph:

library(plot3D); library(plot3Drgl)
scatter3D(x=iris$Sepal.Length, y=iris$Petal.Length, z=iris$Sepal.Width, clab = c("Sepal", "Width (cm)"))
plotrgl()

and then use plot3Drgl::plotrgl to interactively zoom and rotate it.


Rank your institution higher than MIT?!
Hire scholars who are frequently cited as adjunct professors.


Rescue US biomedical research from its systemic flaws?
Hypercompetition has damaging effects. Reduce the number of PhD students and postdocs. Revise funding mechanisms to encourage “path-breaking ideas that, by definition,” are risky. Support early-stage investigators.


Test an R function that you have extracted from a script?
Sometimes you have a working script but some part of it is useful also elsewhere. You extract that portion and make a function f(x,y) out of it. Now, in the script you call that function. You run your modified script and the output is exactly the same as before. Is this a sufficient test for the function you just wrote? No! What if you forget to include one of the variables as an input argument of the function? Then, that variable, which is still defined somewhere in the script before you call the function, is used in the function as a “global variable” without any explicit error. However, when the function is used in a different context, that global variable may not be defined, or worse, it may have an irrelevant value. To avoid this issue, test your function as follows:

  1. Put a browser() call in the script right before calling the function.
  2. Save the function and its inputs in a temporary file, e.g., save(x, y, f, file="~/temp/f1.RData").
  3. Remove all objects in this R session using rm(list=ls()).
  4. Run load("~/temp/f1.RData").
  5. Type c and press Enter to run the rest of the script. The output must be exactly the same.
  6. If the function uses a global variable, running it this way will probably give you an error like “object 'z' not found”. Then, either compute z in f, add it as an input argument of f, or modify the line that uses z.
  7. Repeat until you get no errors, and then check that the output is exactly the same as before.

Separate the reads corresponding to each individual cell from a single-cell RNA-Seq fastq file?
Use the barcodes and Unique Molecular Identifiers (UMIs).
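As a rough illustration, the awk sketch below groups reads by cell barcode and counts reads and distinct UMIs per cell. It assumes a 10x Chromium 3' v3-style read 1, where the first 16 bases are the cell barcode and the next 12 are the UMI; other protocols place them differently, and real pipelines (e.g., Cell Ranger) also correct barcode sequencing errors against a whitelist, which this sketch does not do. The four reads are invented.

```shell
#!/usr/bin/env bash
# Sketch: group reads by cell barcode and count distinct UMIs per cell.
# ASSUMPTION: 10x Chromium 3' v3-style read 1 (16-bp barcode, then 12-bp UMI).
r1_fastq=$(mktemp)
cat > "$r1_fastq" <<'EOF'
@read1
AAAACCCCGGGGTTTTACGTACGTACGTTTTT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read2
AAAACCCCGGGGTTTTACGTACGTACGTTTTT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read3
AAAACCCCGGGGTTTTTTTTGGGGCCCCTTTT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read4
CCCCAAAATTTTGGGGACGTACGTACGTTTTT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
EOF

# Sequence lines are line 2 of each 4-line FASTQ record (NR % 4 == 2).
# Output per barcode: barcode, number of reads, number of distinct UMIs.
summary=$(awk 'NR % 4 == 2 {
    bc = substr($0, 1, 16); umi = substr($0, 17, 12)
    reads[bc]++
    if (!((bc, umi) in seen)) { seen[bc, umi] = 1; umis[bc]++ }
  }
  END { for (bc in reads) print bc, reads[bc], umis[bc] }' "$r1_fastq" | sort)
echo "$summary"
```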


Copy from your prior clipboard history?
Use a clipboard manager like Clipy (developed based on ClipMenu) or JumpCut. Clipy allows you to save lists of favorite or frequent text snippets for later use.


Review a paper?
Read the journal guidelines and Nature's quick and concise tutorial.


Adjust the acceleration and speed of a Logitech mouse?

Install and use the Logitech Control Center.


Encrypt a folder?

Compress the folder in 7z format with AES-256 encryption. E.g.,

7z a -t7z -p -mhe=on super-secret.7z super-secret_folder ## -p prompts for a password; -mhe=on also encrypts file names

7z x super-secret.7z ## Decrypt and uncompress