Perform the enrichment analysis (over-representation) on the genes
Source:R/SCP-analysis.R
RunEnrichment.Rd
Perform the enrichment analysis (over-representation) on the genes
Usage
RunEnrichment(
srt = NULL,
group_by = NULL,
test.use = "wilcox",
DE_threshold = "avg_log2FC > 0 & p_val_adj < 0.05",
geneID = NULL,
geneID_groups = NULL,
geneID_exclude = NULL,
IDtype = "symbol",
result_IDtype = "symbol",
species = "Homo_sapiens",
db = "GO_BP",
db_update = FALSE,
db_version = "latest",
db_combine = FALSE,
convert_species = TRUE,
Ensembl_version = 103,
mirror = NULL,
TERM2GENE = NULL,
TERM2NAME = NULL,
minGSSize = 10,
maxGSSize = 500,
unlimited_db = c("Chromosome", "GeneType", "TF", "Enzyme", "CSPA"),
GO_simplify = FALSE,
GO_simplify_cutoff = "p.adjust < 0.05",
simplify_method = "Wang",
simplify_similarityCutoff = 0.7,
BPPARAM = BiocParallel::bpparam(),
seed = 11
)
Arguments
- srt
A Seurat object containing the results of differential expression analysis (RunDEtest). If specified, the genes and groups will be extracted from the Seurat object automatically. If not specified, the
geneID
andgeneID_groups
arguments must be provided.- group_by
A character vector specifying the grouping variable in the Seurat object. This argument is only used if
srt
is specified.- test.use
A character vector specifying the test to be used in differential expression analysis. This argument is only used if
srt
is specified.- DE_threshold
A character vector specifying the filter condition for differential expression analysis. This argument is only used if
srt
is specified.- geneID
A character vector specifying the gene IDs.
- geneID_groups
A factor vector specifying the group labels for each gene.
- geneID_exclude
A character vector specifying the gene IDs to be excluded from the analysis.
- IDtype
A character vector specifying the type of gene IDs in the
srt
object orgeneID
argument. This argument is used to convert the gene IDs to a different type ifIDtype
is different fromresult_IDtype
.- result_IDtype
A character vector specifying the desired type of gene ID to be used in the output. This argument is used to convert the gene IDs from
IDtype
toresult_IDtype
.- species
A character vector specifying the species for which the analysis is performed.
- db
A character vector specifying the name of the database to be used for enrichment analysis.
- db_update
A logical value indicating whether the gene annotation databases should be forcefully updated. If set to FALSE, the function will attempt to load the cached databases instead. Default is FALSE.
- db_version
A character vector specifying the version of the database to be used. This argument is ignored if
db_update
isTRUE
. Default is "latest".- db_combine
A logical value indicating whether to combine multiple databases into one. If TRUE, all database specified by
db
will be combined as one named "Combined".- convert_species
A logical value indicating whether to use a species-converted database when the annotation is missing for the specified species. The default value is TRUE.
- Ensembl_version
Ensembl database version. If NULL, use the current release version.
- mirror
Specify an Ensembl mirror to connect to. The valid options here are 'www', 'uswest', 'useast', 'asia'.
- TERM2GENE
A data frame specifying the gene-term mapping for a custom database. The first column should contain the term IDs, and the second column should contain the gene IDs.
- TERM2NAME
A data frame specifying the term-name mapping for a custom database. The first column should contain the term IDs, and the second column should contain the corresponding term names.
- minGSSize
A numeric value specifying the minimum size of a gene set to be considered in the enrichment analysis.
- maxGSSize
A numeric value specifying the maximum size of a gene set to be considered in the enrichment analysis.
- unlimited_db
A character vector specifying the names of databases that do not have size restrictions.
- GO_simplify
A logical value indicating whether to simplify the GO terms. If
TRUE
, additional results with simplified GO terms will be returned.- GO_simplify_cutoff
A character vector specifying the filter condition for simplification of GO terms. This argument is only used if
GO_simplify
isTRUE
.- simplify_method
A character vector specifying the method to be used for simplification of GO terms. This argument is only used if
GO_simplify
isTRUE
.- simplify_similarityCutoff
A numeric value specifying the similarity cutoff for simplification of GO terms. This argument is only used if
GO_simplify
isTRUE
.- BPPARAM
A BiocParallelParam object specifying the parallel back-end to be used for parallel computation. Defaults to BiocParallel::bpparam().
- seed
The random seed for reproducibility. Defaults to 11.
Value
If input is a Seurat object, returns the modified Seurat object with the enrichment result stored in the tools slot.
If input is a geneID vector with or without geneID_groups, return the enrichment result directly.
Enrichment result is a list with the following component:
enrichment
: A data.frame containing all enrichment results.results
: A list ofenrichResult
objects from the DOSE package.geneMap
: A data.frame containing the ID mapping table for input gene IDs.input
: A data.frame containing the input gene IDs and gene ID groups.DE_threshold
: A specific threshold for differential expression analysis (only returned if input is a Seurat object).
Examples
data("pancreas_sub")
pancreas_sub <- RunDEtest(pancreas_sub, group_by = "CellType")
#> Warning: Data in the 'data' slot is raw counts. Perform NormalizeData(LogNormalize) on the data.
#> [2023-11-21 07:37:53.015774] Start DEtest
#> Workers: 2
#> Find all markers(wilcox) among 5 groups...
#>
|
| | 0%
#>
#>
#>
|
|============================================ | 40%
#>
#>
#>
#>
|
|==============================================================================================================| 100%
#>
#> [2023-11-21 07:38:03.458278] DEtest done
#> Elapsed time:10.44 secs
pancreas_sub <- RunEnrichment(
srt = pancreas_sub, group_by = "CellType", DE_threshold = "p_val_adj < 0.05",
db = "GO_BP", species = "Mus_musculus"
)
#> [2023-11-21 07:38:03.46236] Start Enrichment
#> Workers: 2
#> Species: Mus_musculus
#> Loading cached db: GO_BP version:3.17.0 nterm:16027 created:2023-11-21 07:14:20.545084
#> Convert ID types for the database: GO_BP
#> Connect to the Ensembl archives...
#> Using the 103 version of biomart...
#> Connecting to the biomart...
#> Searching the dataset mmusculus ...
#> Connecting to the dataset mmusculus_gene_ensembl ...
#> Converting the geneIDs...
#> Error in collect(., Inf): Failed to collect lazy table.
#> Caused by error in `db_collect()`:
#> ! Arguments in `...` must be used.
#> ✖ Problematic argument:
#> • ..1 = Inf
#> ℹ Did you misspell an argument name?
EnrichmentPlot(pancreas_sub, db = "GO_BP", group_by = "CellType", plot_type = "comparison")
#> Error in EnrichmentPlot(pancreas_sub, db = "GO_BP", group_by = "CellType", plot_type = "comparison"): No enrichment result found. You may perform RunEnrichment first.
pancreas_sub <- RunEnrichment(
srt = pancreas_sub, group_by = "CellType", DE_threshold = "p_val_adj < 0.05",
db = c("MSigDB", "MSigDB_MH"), species = "Mus_musculus"
)
#> [2023-11-21 07:38:07.637929] Start Enrichment
#> Workers: 2
#> Species: Mus_musculus
#> Preparing database: MSigDB
#> Convert ID types for the database: MSigDB
#> Connect to the Ensembl archives...
#> Using the 103 version of biomart...
#> Connecting to the biomart...
#> Searching the dataset mmusculus ...
#> Connecting to the dataset mmusculus_gene_ensembl ...
#> Converting the geneIDs...
#> Error in collect(., Inf): Failed to collect lazy table.
#> Caused by error in `db_collect()`:
#> ! Arguments in `...` must be used.
#> ✖ Problematic argument:
#> • ..1 = Inf
#> ℹ Did you misspell an argument name?
EnrichmentPlot(pancreas_sub, db = "MSigDB", group_by = "CellType", plot_type = "comparison")
#> Error in EnrichmentPlot(pancreas_sub, db = "MSigDB", group_by = "CellType", plot_type = "comparison"): No enrichment result found. You may perform RunEnrichment first.
EnrichmentPlot(pancreas_sub, db = "MSigDB_MH", group_by = "CellType", plot_type = "comparison")
#> Error in EnrichmentPlot(pancreas_sub, db = "MSigDB_MH", group_by = "CellType", plot_type = "comparison"): No enrichment result found. You may perform RunEnrichment first.
# Remove redundant GO terms
pancreas_sub <- RunEnrichment(srt = pancreas_sub, group_by = "CellType", db = "GO_BP", GO_simplify = TRUE, species = "Mus_musculus")
#> [2023-11-21 07:38:22.773752] Start Enrichment
#> Workers: 2
#> Species: Mus_musculus
#> Loading cached db: GO_BP version:3.17.0 nterm:16027 created:2023-11-21 07:14:20.545084
#> Convert ID types for the database: GO_BP
#> Connect to the Ensembl archives...
#> Using the 103 version of biomart...
#> Connecting to the biomart...
#> Searching the dataset mmusculus ...
#> Connecting to the dataset mmusculus_gene_ensembl ...
#> Converting the geneIDs...
#> Error in collect(., Inf): Failed to collect lazy table.
#> Caused by error in `db_collect()`:
#> ! Arguments in `...` must be used.
#> ✖ Problematic argument:
#> • ..1 = Inf
#> ℹ Did you misspell an argument name?
EnrichmentPlot(pancreas_sub, db = "GO_BP_sim", group_by = "CellType", plot_type = "comparison")
#> Error in EnrichmentPlot(pancreas_sub, db = "GO_BP_sim", group_by = "CellType", plot_type = "comparison"): No enrichment result found. You may perform RunEnrichment first.
# Use a combined database
pancreas_sub <- RunEnrichment(
srt = pancreas_sub, group_by = "CellType",
db = c("KEGG", "WikiPathway", "Reactome", "PFAM", "MP"),
db_combine = TRUE,
species = "Mus_musculus"
)
#> [2023-11-21 07:38:26.610798] Start Enrichment
#> Workers: 2
#> Species: Mus_musculus
#> Loading cached db: MP version:2023-11-20 08:08 nterm:10311 created:2023-11-21 07:14:35.012367
#> Preparing database: KEGG
#> Preparing database: WikiPathway
#> Preparing database: Reactome
#> Preparing database: PFAM
#> Convert ID types for the database: KEGG
#> Connect to the Ensembl archives...
#> Using the 103 version of biomart...
#> Connecting to the biomart...
#> Searching the dataset mmusculus ...
#> Connecting to the dataset mmusculus_gene_ensembl ...
#> Converting the geneIDs...
#> Error in collect(., Inf): Failed to collect lazy table.
#> Caused by error in `db_collect()`:
#> ! Arguments in `...` must be used.
#> ✖ Problematic argument:
#> • ..1 = Inf
#> ℹ Did you misspell an argument name?
EnrichmentPlot(pancreas_sub, db = "Combined", group_by = "CellType", plot_type = "comparison")
#> Error in EnrichmentPlot(pancreas_sub, db = "Combined", group_by = "CellType", plot_type = "comparison"): No enrichment result found. You may perform RunEnrichment first.
# Or use "geneID" and "geneID_groups" as input to run enrichment
de_df <- dplyr::filter(pancreas_sub@tools$DEtest_CellType$AllMarkers_wilcox, p_val_adj < 0.05)
enrich_out <- RunEnrichment(geneID = de_df[["gene"]], geneID_groups = de_df[["group1"]], db = "GO_BP", species = "Mus_musculus")
#> [2023-11-21 07:38:54.922648] Start Enrichment
#> Workers: 2
#> Species: Mus_musculus
#> Loading cached db: GO_BP version:3.17.0 nterm:16027 created:2023-11-21 07:14:20.545084
#> Convert ID types for the database: GO_BP
#> Connect to the Ensembl archives...
#> Using the 103 version of biomart...
#> Connecting to the biomart...
#> Searching the dataset mmusculus ...
#> Connecting to the dataset mmusculus_gene_ensembl ...
#> Converting the geneIDs...
#> Error in collect(., Inf): Failed to collect lazy table.
#> Caused by error in `db_collect()`:
#> ! Arguments in `...` must be used.
#> ✖ Problematic argument:
#> • ..1 = Inf
#> ℹ Did you misspell an argument name?
EnrichmentPlot(res = enrich_out, db = "GO_BP", plot_type = "comparison")
#> Error in eval(expr, envir, enclos): object 'enrich_out' not found