richR

Function Enrichment analysis and Network construction

Overview

richR is an R package for functional enrichment analysis and visualization. It supports Over-Representation Analysis (ORA) via hypergeometric testing, Gene Set Enrichment Analysis (GSEA), and kappa-score-based term clustering. Built-in annotation builders cover GO, KEGG, Reactome, KEGG Module, and MSigDB, and you can supply custom gene sets via GMT files or named lists.
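To make the ORA idea concrete, the following minimal sketch runs a hypergeometric test for a single term in base R. It does not call richR; all counts are made-up toy numbers, and the variable names (`N`, `M`, `n`, `k`) are chosen here for illustration only.

```r
# Toy counts (hypothetical, not taken from richR or any real annotation)
N <- 20000  # genes in the background (universe)
M <- 150    # background genes annotated to the term
n <- 1000   # genes in the query list
k <- 20     # query genes annotated to the term

# P(X >= k) when n genes are drawn without replacement from the universe
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# The same test written as a one-sided Fisher's exact test
# rows: in query / not in query; columns: in term / not in term
tab <- matrix(c(k, M - k, n - k, N - M - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Observed hit rate in the query relative to the background rate
fold_enrichment <- (k / n) / (M / N)
```

An enrichment run repeats this test for every term in the annotation and then corrects the resulting p-values for multiple testing.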

Key features

Installation

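# Install from GitHub (requires the devtools package)
# install.packages("devtools")   # run once if devtools is not installed
library(devtools)
install_github("guokai8/richR")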

Bioconductor annotation packages are needed for building annotations:

# install.packages("BiocManager")   # run once if BiocManager is not installed
BiocManager::install(c("org.Hs.eg.db", "GO.db"))       # human GO
BiocManager::install("reactome.db")                      # Reactome (optional)

Quick Start

1. Build annotation data

library(richR)

# Check available species
showData()

# Build GO and KEGG annotations
hsago <- buildAnnot(species = "human", keytype = "SYMBOL", anntype = "GO")
hsako <- buildAnnot(species = "human", keytype = "SYMBOL", anntype = "KEGG")

# KEGG Module
hsakom <- buildAnnot(species = "human", keytype = "SYMBOL", anntype = "KEGGM")

# MSigDB
hsamgi <- buildMSIGDB(species = "human", keytype = "SYMBOL", anntype = "GO")

2. Run enrichment analysis

# GO enrichment (ORA)
gene <- sample(unique(hsago$GeneID), 1000)
resgo <- richGO(gene, godata = hsago, ontology = "BP")
head(resgo)

# KEGG enrichment
resko <- richKEGG(gene, kodata = hsako, pvalue = 0.05)

# GSEA (requires a named numeric vector)
genelist <- rnorm(1000)
names(genelist) <- sample(unique(hsako$GeneID), 1000)
res_gsea <- richGSEA(genelist, object = hsako)

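# DAVID (online)
# Note: `gene` above was sampled from a SYMBOL-keyed annotation;
# supply IDs that match the keytype you pass here
res_david <- richDAVID(gene, keytype = "ENTREZID", species = "human")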
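In practice the GSEA input comes from a differential-expression table rather than random numbers. The sketch below builds a named numeric vector in base R; the data frame `de` and its column names `gene` and `log2FC` are hypothetical, and sorting the vector is a common convention for rank-based methods rather than a documented requirement of richGSEA.

```r
# Hypothetical differential-expression table (column names are assumptions)
de <- data.frame(
  gene   = c("TP53", "BAX", "CDK1", "BCL2", "TP53"),
  log2FC = c(1.8, 0.9, -2.1, -0.4, 1.2),
  stringsAsFactors = FALSE
)

# Keep one score per gene: here, the entry with the largest absolute value
de <- de[order(-abs(de$log2FC)), ]
de <- de[!duplicated(de$gene), ]

# Named numeric vector, sorted from most up- to most down-regulated
genelist <- setNames(de$log2FC, de$gene)
genelist <- sort(genelist, decreasing = TRUE)
```

A vector shaped like this could then be passed in place of the random `genelist` above, for example `richGSEA(genelist, object = hsako)`, provided the gene IDs match the keytype of the annotation.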

3. Custom gene sets

# Import from GMT file (MSigDB, Enrichr, etc.)
# my_genes is a placeholder for your own character vector of gene IDs
annot <- readGMT("h.all.v2023.2.Hs.symbols.gmt", species = "human")
res <- enrich(my_genes, annot)

# From a named list
my_sets <- list(
  "Apoptosis"  = c("TP53", "BAX", "BCL2", "CASP3"),
  "Cell Cycle" = c("CDK1", "CDK2", "CCND1", "RB1")
)
annot <- buildAnnotFromList(my_sets)
res <- enrich(my_genes, annot)
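For readers unfamiliar with the GMT format, the following base R sketch writes the named list above to a temporary GMT file and parses it back. It only illustrates the file layout (set name, description, then genes, all tab-separated); it is not how readGMT is implemented, and the `"NA"` description is a placeholder.

```r
my_sets <- list(
  "Apoptosis"  = c("TP53", "BAX", "BCL2", "CASP3"),
  "Cell Cycle" = c("CDK1", "CDK2", "CCND1", "RB1")
)

# One line per set: name <TAB> description <TAB> gene1 <TAB> gene2 ...
gmt_lines <- vapply(names(my_sets), function(nm) {
  paste(c(nm, "NA", my_sets[[nm]]), collapse = "\t")
}, character(1))

gmt_file <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_file)

# Parse the file back: field 1 is the set name, field 2 the description,
# and everything after that is the gene list
fields <- strsplit(readLines(gmt_file), "\t", fixed = TRUE)
parsed <- setNames(
  lapply(fields, function(x) x[-(1:2)]),
  vapply(fields, `[`, character(1), 1)
)
```

Because the round trip reproduces the original list, either representation carries the same information; choose whichever is easier to maintain.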

4. Custom annotation with bioAnno

# library(bioAnno)
# fromKEGG(species = "ath")
# athgo <- buildOwn(dbname = "org.ath.eg.db", anntype = "GO")
# athko <- buildOwn(dbname = "org.ath.eg.db", anntype = "KEGG")
# See https://github.com/guokai8/bioAnno for details

Visualization

richR provides 20+ visualization functions. Most accept richResult, GSEAResult, or data.frame objects; a few, such as richUpset and richNetmap, take lists instead. The plotting functions support saving to file via the filename, width, and height arguments. Each function has a rich* primary name plus backward-compatible aliases, most of them gg*-prefixed.

Term-level summary plots

Classic plots that rank and display enriched terms:

# Bar plot — horizontal bars of gene count per term, colored by significance
richBar(resgo, top = 20, usePadj = FALSE)

# Dot plot — bubble plot: x = RichFactor, y = Term, size = gene count, color = -log10(p)
richDot(resko, top = 10, usePadj = FALSE)

# Lollipop — clean alternative to bar plot: x = RichFactor, color = significance
richLollipop(resko, top = 20)

# Circular bar plot — polar-coordinate bar chart of enrichment terms
richCircle(resko, top = 15)
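The axes used by these plots are simple derived quantities. The sketch below computes them on a toy table, assuming the common definition RichFactor = query genes in the term / all genes annotated to the term. The column names `Annotated` and `Significant` and all values are assumptions for illustration; check the richR documentation for the exact definitions it uses.

```r
# Toy result table (column names and values are illustrative assumptions)
res <- data.frame(
  Term        = c("term A", "term B", "term C"),
  Annotated   = c(200, 40, 120),  # genes annotated to the term
  Significant = c(30, 12, 6),     # query genes annotated to the term
  Pvalue      = c(1e-6, 1e-4, 0.03),
  stringsAsFactors = FALSE
)

# Fraction of each term covered by the query list
res$RichFactor <- res$Significant / res$Annotated

# Significance on the scale used for the color axis
res$negLog10P <- -log10(res$Pvalue)

# Keep the `top` most significant terms, mimicking a `top` argument
top <- 2
res_top <- head(res[order(res$Pvalue), ], top)
```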

Scatter and volcano plots

Two-axis continuous plots for exploring enrichment structure:

# Scatter — enrichment funnel plot (MA-plot analogy for enrichment)
#   x = log10(pathway size), y = log2(fold enrichment), color = -log10(p)
#   Small pathways (left) can have extreme FE by chance; large pathways
#   (right) with high FE are the most robust findings
richScatter(resgo, top = 50, label.top = 5)
richScatter(resko, usePadj = FALSE)

# Volcano — enrichment volcano plot
#   x = effect size (log2 FE for ORA, NES for GSEA)
#   y = -log10(p), color = effect (diverging gradient), size = -log10(p)
#   Labels auto-adjusted via ggrepel; works for both ORA and GSEA
richVolcano(resgo)                               # ORA: x = log2(Fold Enrichment)
richVolcano(res_gsea)                            # GSEA: x = NES
richVolcano(resgo, label.top = 10, short = TRUE) # customize labels

# Term similarity scatter — MDS projection of gene-set overlap (Jaccard)
#   Functionally related terms cluster together in 2D space
richTermSim(resko, top = 20)
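The term-similarity idea can be reproduced in a few lines of base R: compute pairwise Jaccard similarity between gene sets, convert it to a dissimilarity, and project with classical MDS. This is a sketch of the general technique on toy sets, not richTermSim's actual implementation; the set names and members are made up.

```r
# Toy gene sets (names and members are illustrative only)
sets <- list(
  apoptosis  = c("TP53", "BAX", "BCL2", "CASP3"),
  p53_signal = c("TP53", "BAX", "CDKN1A", "MDM2"),
  cell_cycle = c("CDK1", "CDK2", "CCND1", "RB1", "CDKN1A"),
  g1_s       = c("CDK2", "CCND1", "RB1", "E2F1")
)

# Jaccard similarity: shared genes divided by genes in either set
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Pairwise similarity matrix
idx <- seq_along(sets)
sim <- outer(idx, idx,
             Vectorize(function(i, j) jaccard(sets[[i]], sets[[j]])))
dimnames(sim) <- list(names(sets), names(sets))

# Classical MDS on the dissimilarity 1 - Jaccard, keeping two dimensions
coords <- cmdscale(as.dist(1 - sim), k = 2)
```

Plotting `coords` places terms that share many genes close together, which is the behavior the scatter above is described as showing.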

Gene-level plots

Show which genes drive the enrichment:

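# Gene-term dot plot: color by fold change (if provided) or -log10(p)
# my_foldchanges is a placeholder for your own fold-change values
# (assumed here to be a numeric vector named by gene ID)
richGeneDot(resgo, fc = my_foldchanges)
richGeneDot(resgo)                              # no fc: color by -log10(Pvalue)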

# Gene-term heatmap — pheatmap-style tile plot with borders
#   Missing gene-term combinations shown as grey; supports fold-change coloring
richGeneHeat(resgo, fc = my_foldchanges)
richGeneHeat(resgo, na.fill = "grey90", border.color = "grey40")

# Gene bar plot — bars per term with gene labels aligned inside
richGeneBar(resgo, top = 10)
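Gene-level plots are built on a gene-by-term matrix. The base R sketch below constructs one from toy data, filling cells with fold changes and leaving absent gene-term combinations as NA (the cells a heatmap would draw in grey). The objects `term_genes` and `fc` are hypothetical stand-ins, and a named numeric vector is an assumed format for fold changes.

```r
# Toy inputs (hypothetical): genes per enriched term and per-gene fold changes
term_genes <- list(
  "Apoptosis"  = c("TP53", "BAX", "CASP3"),
  "Cell Cycle" = c("CDK1", "TP53")
)
fc <- c(TP53 = 1.5, BAX = 0.8, CASP3 = -0.3, CDK1 = -2.0)

genes <- sort(unique(unlist(term_genes)))

# One column per term; a cell holds the gene's fold change if the gene
# belongs to the term and NA otherwise
mat <- sapply(term_genes, function(g) {
  ifelse(genes %in% g, fc[genes], NA_real_)
})
rownames(mat) <- genes
```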

GSEA-specific plots

# NES bar plot — up/down color-coded by enrichment direction
richNES(res_gsea, top = 20)

# Running enrichment score curve (classic GSEA plot)
richGSEAcurve(object = hsako, gseaRes = res_gsea)
richGSEAcurve(object = hsako, gseaRes = res_gsea, pathways = "hsa04110")

# ECDF step plot — cumulative rank distribution of gene sets
richECDF(res_gsea)
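The curve in a classic GSEA plot is a weighted running sum over the ranked gene list. The sketch below implements that running sum in base R following the standard GSEA definition; the function `running_es` and its arguments are illustrative names, and this is not claimed to be the code richGSEA or richGSEAcurve runs internally.

```r
# Running enrichment score for one gene set (standard weighted running sum)
running_es <- function(stats, gene_set, p = 1) {
  stats  <- sort(stats, decreasing = TRUE)
  in_set <- names(stats) %in% gene_set
  # Hits step up in proportion to |score|^p; misses step down uniformly
  hit_w     <- abs(stats)^p * in_set
  step_hit  <- hit_w / sum(hit_w)
  step_miss <- (!in_set) / sum(!in_set)
  cumsum(step_hit - step_miss)
}

# Toy ranked list: the gene set sits entirely at the top of the ranking
stats <- setNames(c(5, 4, 3, 2, 1, -1, -2, -3, -4, -5), paste0("g", 1:10))
es_curve <- running_es(stats, gene_set = c("g1", "g2", "g3"))

# The enrichment score is the running sum's largest deviation from zero
es <- es_curve[which.max(abs(es_curve))]
```

With all set members at the top of the ranking, the curve climbs to its maximum after the third gene and then decays back to zero, the typical shape for a strongly up-regulated set.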

Network and map plots

# Gene-concept network — bipartite graph of terms and their genes
richNetplot(resko, top = 20)

# Term similarity network — edges weighted by gene-set overlap (kappa)
richNetwork(resgo, top = 20, weightcut = 0.01)

# Combined network map — overlay multiple enrichment results
richNetmap(list(resgo, resko), top = 50)
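The kappa score mentioned for edge weights (and, in the overview, for term clustering) measures how much two terms agree on gene membership beyond what chance would give. Below is a minimal base R sketch of Cohen's kappa for two gene sets; `kappa_score` is a hypothetical helper, the toy universe is made up, and richR's exact computation may differ.

```r
# Cohen's kappa between two gene sets, over a common gene universe
kappa_score <- function(a, b, universe) {
  in_a <- universe %in% a
  in_b <- universe %in% b
  # Observed agreement: genes that are in both sets or in neither
  po <- mean(in_a == in_b)
  # Agreement expected by chance from the marginal membership rates
  pe <- mean(in_a) * mean(in_b) + mean(!in_a) * mean(!in_b)
  (po - pe) / (1 - pe)
}

universe <- paste0("g", 1:20)
term_a   <- paste0("g", 1:6)
term_b   <- paste0("g", 4:9)

k_ab <- kappa_score(term_a, term_b, universe)  # partial overlap
k_aa <- kappa_score(term_a, term_a, universe)  # identical sets
```

A cutoff such as the `weightcut` argument above would then presumably drop edges whose weight falls below the threshold; that reading of the argument is an assumption.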

Cluster and comparison plots

# Kappa-based clustering of enrichment terms
resc <- richCluster(resgo)
richClusterDot(resc)

# Multi-group comparison
# gene1 and gene2 are placeholders for your own gene vectors (one per group)
resko1 <- richKEGG(gene1, kodata = hsako)
resko2 <- richKEGG(gene2, kodata = hsako)
res_cmp <- compareResult(list(S1 = resko1, S2 = resko2))
richCompareDot(res_cmp)

# Comparison heatmap across multiple analyses
richHeatmap(list(GO = resgo, KEGG = resko), top = 50)

UpSet plot

gene_lists <- list(
  "Treatment A" = sample(unique(hsako$GeneID), 500),
  "Treatment B" = sample(unique(hsako$GeneID), 400),
  "Control"     = sample(unique(hsako$GeneID), 600)
)
richUpset(gene_lists)
richUpset(gene_lists,
          mycol = c("dodgerblue", "goldenrod1", "seagreen3"),
          order.by = "degree", nintersects = 20)
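An UpSet plot is a visual summary of set sizes and intersection sizes. The following base R sketch computes both for three toy sets, counting each gene once under the exact combination of sets that contain it (the exclusive intersections UpSet plots conventionally show). The set contents are made up, and this is not richUpset's implementation.

```r
# Toy gene sets (illustrative only)
gene_sets <- list(
  A = c("g1", "g2", "g3", "g4"),
  B = c("g3", "g4", "g5"),
  C = c("g4", "g6")
)

# Logical membership matrix: one row per gene, one column per set
all_genes  <- unique(unlist(gene_sets))
membership <- sapply(gene_sets, function(s) all_genes %in% s)
rownames(membership) <- all_genes

# Label each gene by the exact combination of sets that contain it
combo <- apply(membership, 1, function(r) {
  paste(names(gene_sets)[r], collapse = "&")
})

# Bar heights for the intersection panel and the set-size panel
intersections <- sort(table(combo), decreasing = TRUE)
set_sizes     <- colSums(membership)
```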

Plot summary table

| Function | Plot type | x-axis | y-axis | Color | Size |
|---|---|---|---|---|---|
| richBar | Bar | Gene count | Term | -log10(p) | - |
| richDot | Bubble | RichFactor | Term | -log10(p) | Gene count |
| richLollipop | Lollipop | RichFactor | Term | -log10(p) | - |
| richCircle | Circular bar | Gene count | Term (polar) | -log10(p) | - |
| richScatter | Funnel/MA | log10(Pathway Size) | log2(Fold Enrichment) | -log10(p) | - |
| richVolcano | Volcano | Effect (log2 FE / NES) | -log10(p) | Effect | -log10(p) |
| richTermSim | MDS scatter | Dim 1 | Dim 2 | -log10(p) | Gene count |
| richNES | Bar (GSEA) | NES | Term | Up/Down | - |
| richECDF | Step | NES rank | Cumulative fraction | - | - |
| richGeneDot | Dot matrix | Gene | Term | FC / -log10(p) | - |
| richGeneHeat | Heatmap | Gene | Term | FC / presence | - |
| richGeneBar | Stacked bar | RichFactor | Term | Gene labels | - |
| richNetplot | Network | - | - | - | - |
| richNetwork | Network | - | - | -log10(p) | Gene count |
| richUpset | UpSet | Intersection | Set size | Group | - |

Working with results

# Extract result table and detail table
result(resgo)
detail(resgo)

# Get genes for specific terms
getGenes(resgo, term = "GO:0006955")

# dplyr verbs work directly
library(dplyr)
resgo %>% filter(Padj < 0.01) %>% head()
resgo %>% select(Term, Pvalue, Padj)
resgo %>% arrange(Pvalue) %>% head(10)

# Batch enrichment across multiple gene lists
gene_lists <- list(GroupA = gene1, GroupB = gene2)
batch_res <- batchEnrich(gene_lists, annot = hsago)
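Filtering on `Padj` rather than `Pvalue` matters because many terms are tested at once. The sketch below shows the effect with base R's `p.adjust`, assuming a Benjamini-Hochberg correction; the p-values are toy numbers, and the adjustment method richR applies by default is not stated in this section.

```r
# Toy raw p-values for five terms (illustrative only)
pvals <- c(0.001, 0.008, 0.039, 0.041, 0.20)

# Benjamini-Hochberg adjustment (assumed method, see note above)
padj_bh <- p.adjust(pvals, method = "BH")

# Number of terms passing a 0.05 cutoff before and after adjustment
n_raw <- sum(pvals < 0.05)
n_adj <- sum(padj_bh < 0.05)
```

Here four terms pass the raw cutoff but only two survive adjustment, which is why the dplyr example above filters on `Padj`.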

Function name aliases

All plotting functions use the rich* prefix as the primary name. Older names, most of them gg*-prefixed, are kept as backward-compatible aliases:

| Primary name | Aliases |
|---|---|
| richBar | ggbar |
| richDot | ggdot |
| richLollipop | gglollipop, ggLollipop |
| richGeneDot | gggenedot, ggGeneDot |
| richGeneHeat | gggeneheat, ggGeneHeat |
| richGeneBar | gggenebar, ggGeneBar |
| richCircle | ggcircbar, ggCircBar |
| richVolcano | ggvolcano, ggVolcano |
| richNES | ggNES, ggnes |
| richScatter | ggscatter, ggScatter |
| richECDF | ggecdf, ggECDF |
| richTermSim | ggtermsim, ggTermSim |
| richNetplot | ggnetplot |
| richNetwork | ggnetwork |
| richNetmap | ggnetmap |
| richUpset | ggupset |
| richHeatmap | ggheatmap |
| richClusterDot | ggcluster |
| richCompareDot | comparedot |
| richGSEAplot | ggGSEA |
| richGSEAcurve | plotGSEA |

Citation

If you use richR in your research, please cite:

Guo K and Hur J (2020). richR: Enrichment analysis for functional genomics.
R package version 0.1.2. https://github.com/guokai8/richR
DOI: 10.5281/zenodo.3675760

Contact

For questions or bug reports please contact guokai8@gmail.com or open an issue on GitHub.