Useful tools for functional and pathway analysis and target identification

If you have a list of genes and want to know their functions, the diseases and pathways they might be associated with, and some known or candidate drug targets for them, you can use the following tools and databases. Your gene list might have been obtained from differential expression or other omics analyses, gene network analysis, a genome-wide association study (GWAS), and the like.

  • Agora hosts evidence for association of genes with Alzheimer's disease (AD) and includes over 600 potential drug targets for AD.
  • Functional genomics repository (FILER) and Alzheimer’s Disease Variants Portal (ADVP) are functional genomics databases with harmonized, extensible, indexed, searchable human functional genomics data collected from 20 data sources and 80 cohorts, respectively.
  • HumanMine: given a list of genes as input (Entrez/Ensembl/Symbol or a mix), it returns interaction networks, pathways, Gene Ontology categories, relevant literature, protein domains, chromosome distributions and much more on a single page. There are also nice summaries that pop up when you hover the mouse over a gene name.
  • GeneMANIA performs functional analysis using protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity.
  • Also see STITCH for predicted and known chemical-protein interactions.
  • MSigDB has around 10,000 sets of interesting genes (organized into collections), including curated sets from publications (e.g. genes affected by specific genetic and chemical perturbations), canonical pathways, GO, microRNA and transcription factor (TF) targets, as well as Hallmarks of particular processes, oncogenic signatures and more. We have some custom-made scripts for mining these sets. The Broad Institute also has its own software with a sleek GUI (GSEA).
  • In addition to the TFT collection from MSigDB, other resources for identifying transcription factor targets include TRANSFAC, ORegAnno and tftargets. Alternative computational tools, such as MEME, HOMER, PAZAR, and ITFP, should be used with care because their predictions are not as reliable as experimentally validated targets.
  • gProfileR performs pathway, GO, TF, microRNA, OMIM, and phenotype over-representation analysis, with quite up-to-date annotations.
  • TCNG, a cancer gene network of potential interest.
  • Stemformatics visualizes genes in exemplar stem cell datasets.
  • InnateDB contains many (over 100,000) experimentally validated interactions (with some bias towards immunity-related genes) as well as pathway, gene ontology and visualization tools.
  • SIGORA, an alternative pathway analysis tool that focuses on features that are unique to each pathway.
  • Bioconductor package mgsa; Paper (uses Bayesian networks and MCMC instead of over-representation analysis to determine relevant Gene Ontology terms). Might be more useful as an original conceptual reference point than as a revolutionary practical tool.
  • Cancer- and tissue-type-specific PPIs and miRNA targets: CancerNet (Paper): “Experimentally detected PPIs were assembled from five major PPI databases and miRNA–target interactions were considered as the combination of the predicted targets from six algorithms and two experimentally validated data sets, amounting to 185 589 PPIs and 3 249 385 miRNA–target interactions, respectively. Synergistic miRNA pairs were predicted according to the functions of target genes as well as their proximity in the PPI network.” For example, for HCC, there are ~130,000 protein-protein interactions listed here.
  • SEEK: the user's query is a list of genes. It returns a ranking of datasets in which the input genes are co-expressed, along with a list of other co-expressed genes. Users can prioritize ~5000 expression datasets based on tissue or disease.
  • An old (2010) survey of approaches for gene list enrichment analysis.
  • Genomicus: facilitates navigation in genomes in several dimensions: linearly along chromosome axes, transversally across different species, and chronologically along evolutionary time.
  • FGNet performs functional gene network analysis. It uses DAVID, Gene-Term Linker, topGO, and GAGE (GSEA) directly from R, and also has a graphical user interface.
  • pwOmics, an R package that integrates proteomics and transcriptomics data, and performs pathway, protein-protein interaction, and transcription factor analyses.
  • Free alternatives to Ingenuity Pathway Analysis (IPA).
  • See this Biostars thread for updates on recent tools such as Pathway Commons, a kind of integrated approach that includes Reactome, and the Pathview Web server.
  • The Human Protein Atlas is useful to get information on the expression of a particular protein at the cellular, normal tissue, and tumor levels.
  • There are some R tools for pathway analysis, including clusterProfiler.
  • Mouse Genome Informatics (MGI) has a nice hierarchical Gene Ontology Browser similar to Reactome.
  • Functional Enrichment Analysis overview. Isha's list of functional enrichment analysis tools in R.
  • GEPIA and GEPIA2 are web-based resources for interactive gene expression profiling and analysis. The data sources used are TCGA and GTEx.
  • VarSAn: associating pathways with a set of genomic variants using network analysis. Results are similar to a simple overlap analysis using a hypergeometric test.
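
Several entries above refer to over-representation analysis and to "a simple overlap analysis using a hypergeometric test". As a rough illustration of that idea only, here is a minimal Python sketch using just the standard library. It is not the implementation used by any of the listed tools; the function names (enrichment_p_value, over_representation) and the toy gene identifiers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(universe_size, set_size, list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of genes from a gene set of size `set_size` that land in
    a list of `list_size` genes drawn without replacement from a background
    universe of `universe_size` genes.
    """
    if not (0 <= set_size <= universe_size and 0 <= list_size <= universe_size):
        raise ValueError("set_size and list_size must lie within the universe")
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(max(overlap, 0), max_overlap + 1)
    )
    return tail / total


def over_representation(gene_list, gene_sets, universe):
    """Return (set name, overlap size, p-value) tuples, smallest p-value first.

    Genes outside `universe` are ignored. The p-values are unadjusted.
    """
    universe = set(universe)
    genes = set(gene_list) & universe
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe
        hits = genes & members
        p = enrichment_p_value(len(universe), len(members), len(genes), len(hits))
        results.append((name, len(hits), p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Toy data: hypothetical gene identifiers, not real annotations.
    background = [f"g{i}" for i in range(1, 21)]
    my_genes = ["g1", "g2", "g3", "g4"]
    toy_sets = {
        "pathway_A": ["g1", "g2", "g3", "g10"],
        "pathway_B": ["g11", "g12", "g13", "g14", "g4"],
    }
    for name, n_hits, p in over_representation(my_genes, toy_sets, background):
        print(f"{name}\toverlap={n_hits}\tp={p:.4g}")
```

The choice of background universe strongly affects the result; in practice it should be the set of genes that could have ended up in your list (e.g. all genes measured in the experiment), and the p-values should be adjusted for the number of gene sets tested.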


Khatri, Purvesh, Marina Sirota, and Atul J. Butte. “Ten years of pathway analysis: current approaches and outstanding challenges.” PLoS Comput Biol 8.2 (2012): e1002375. A good survey. Criticizes DE and over-representation analyses.