Perform the enrichment analysis (GSEA) on the genes
Usage
RunGSEA(
srt = NULL,
group_by = NULL,
test.use = "wilcox",
DE_threshold = "p_val_adj < 0.05",
scoreType = "std",
geneID = NULL,
geneScore = NULL,
geneID_groups = NULL,
geneID_exclude = NULL,
IDtype = "symbol",
result_IDtype = "symbol",
species = "Homo_sapiens",
db = "GO_BP",
db_update = FALSE,
db_version = "latest",
db_combine = FALSE,
convert_species = TRUE,
Ensembl_version = 103,
mirror = NULL,
TERM2GENE = NULL,
TERM2NAME = NULL,
minGSSize = 10,
maxGSSize = 500,
unlimited_db = c("Chromosome", "GeneType", "TF", "Enzyme", "CSPA"),
GO_simplify = FALSE,
GO_simplify_cutoff = "p.adjust < 0.05",
simplify_method = "Wang",
simplify_similarityCutoff = 0.7,
BPPARAM = BiocParallel::bpparam(),
seed = 11
)
Arguments
- srt
A Seurat object containing the results of differential expression analysis (RunDEtest). If specified, the genes and groups will be extracted from the Seurat object automatically. If not specified, the
geneID
andgeneID_groups
arguments must be provided.- group_by
A character vector specifying the grouping variable in the Seurat object. This argument is only used if
srt
is specified.- test.use
A character vector specifying the test to be used in differential expression analysis. This argument is only used if
srt
is specified.- DE_threshold
A character vector specifying the filter condition for differential expression analysis. This argument is only used if
srt
is specified.- scoreType
This parameter defines the GSEA score type. Possible options are "std", "pos", "neg". By default ("std") the enrichment score is computed as in the original GSEA. The "pos" and "neg" score types are intended to be used for one-tailed tests (i.e. when one is interested only in positive ("pos") or negateive ("neg") enrichment).
- geneID
A character vector specifying the gene IDs.
- geneScore
A numeric vector that specifies the gene scores, for example, the log2(fold change) values of gene expression.
- geneID_groups
A factor vector specifying the group labels for each gene.
- geneID_exclude
A character vector specifying the gene IDs to be excluded from the analysis.
- IDtype
A character vector specifying the type of gene IDs in the
srt
object orgeneID
argument. This argument is used to convert the gene IDs to a different type ifIDtype
is different fromresult_IDtype
.- result_IDtype
A character vector specifying the desired type of gene ID to be used in the output. This argument is used to convert the gene IDs from
IDtype
toresult_IDtype
.- species
A character vector specifying the species for which the analysis is performed.
- db
A character vector specifying the name of the database to be used for enrichment analysis.
- db_update
A logical value indicating whether the gene annotation databases should be forcefully updated. If set to FALSE, the function will attempt to load the cached databases instead. Default is FALSE.
- db_version
A character vector specifying the version of the database to be used. This argument is ignored if
db_update
isTRUE
. Default is "latest".- db_combine
A logical value indicating whether to combine multiple databases into one. If TRUE, all database specified by
db
will be combined as one named "Combined".- convert_species
A logical value indicating whether to use a species-converted database when the annotation is missing for the specified species. The default value is TRUE.
- Ensembl_version
Ensembl database version. If NULL, use the current release version.
- mirror
Specify an Ensembl mirror to connect to. The valid options here are 'www', 'uswest', 'useast', 'asia'.
- TERM2GENE
A data frame specifying the gene-term mapping for a custom database. The first column should contain the term IDs, and the second column should contain the gene IDs.
- TERM2NAME
A data frame specifying the term-name mapping for a custom database. The first column should contain the term IDs, and the second column should contain the corresponding term names.
- minGSSize
A numeric value specifying the minimum size of a gene set to be considered in the enrichment analysis.
- maxGSSize
A numeric value specifying the maximum size of a gene set to be considered in the enrichment analysis.
- unlimited_db
A character vector specifying the names of databases that do not have size restrictions.
- GO_simplify
A logical value indicating whether to simplify the GO terms. If
TRUE
, additional results with simplified GO terms will be returned.- GO_simplify_cutoff
A character vector specifying the filter condition for simplification of GO terms. This argument is only used if
GO_simplify
isTRUE
.- simplify_method
A character vector specifying the method to be used for simplification of GO terms. This argument is only used if
GO_simplify
isTRUE
.- simplify_similarityCutoff
A numeric value specifying the similarity cutoff for simplification of GO terms. This argument is only used if
GO_simplify
isTRUE
.- BPPARAM
A BiocParallelParam object specifying the parallel back-end to be used for parallel computation. Defaults to BiocParallel::bpparam().
- seed
The random seed for reproducibility. Defaults to 11.
Value
If input is a Seurat object, returns the modified Seurat object with the enrichment result stored in the tools slot.
If input is a geneID vector with or without geneID_groups, return the enrichment result directly.
Enrichment result is a list with the following component:
enrichment
: A data.frame containing all enrichment results.results
: A list ofgseaResult
objects from the DOSE package.geneMap
: A data.frame containing the ID mapping table for input gene IDs.input
: A data.frame containing the input gene IDs and gene ID groups.DE_threshold
: A specific threshold for differential expression analysis (only returned if input is a Seurat object).
Examples
data("pancreas_sub")
pancreas_sub <- RunDEtest(pancreas_sub, group_by = "CellType", only.pos = FALSE, fc.threshold = 1)
#> Warning: Data in the 'data' slot is raw counts. Perform NormalizeData(LogNormalize) on the data.
#> [2023-11-21 07:39:33.105402] Start DEtest
#> Workers: 2
#> Find all markers(wilcox) among 5 groups...
#>
|
| | 0%
#>
#>
#>
|
|============================================ | 40%
#>
#>
#>
#>
|
|==============================================================================================================| 100%
#>
#> [2023-11-21 07:42:20.95219] DEtest done
#> Elapsed time:2.8 mins
pancreas_sub <- RunGSEA(pancreas_sub,
group_by = "CellType", DE_threshold = "p_val_adj < 0.05",
scoreType = "std", db = "GO_BP", species = "Mus_musculus"
)
#> [2023-11-21 07:42:20.956203] Start GSEA
#> Workers: 2
#> Species: Mus_musculus
#> Loading cached db: GO_BP version:3.17.0 nterm:16027 created:2023-11-21 07:14:20.545084
#> Convert ID types for the database: GO_BP
#> Connect to the Ensembl archives...
#> Using the 103 version of biomart...
#> Connecting to the biomart...
#> Searching the dataset mmusculus ...
#> Connecting to the dataset mmusculus_gene_ensembl ...
#> Converting the geneIDs...
#> Error in collect(., Inf): Failed to collect lazy table.
#> Caused by error in `db_collect()`:
#> ! Arguments in `...` must be used.
#> ✖ Problematic argument:
#> • ..1 = Inf
#> ℹ Did you misspell an argument name?
GSEAPlot(pancreas_sub, db = "GO_BP", group_by = "CellType", plot_type = "comparison")
#> Error in GSEAPlot(pancreas_sub, db = "GO_BP", group_by = "CellType", plot_type = "comparison"): No enrichment result found. You may perform RunGSEA first.
GSEAPlot(pancreas_sub, db = "GO_BP", group_by = "CellType", group_use = "Ductal", id_use = "GO:0006412")
#> Error in GSEAPlot(pancreas_sub, db = "GO_BP", group_by = "CellType", group_use = "Ductal", id_use = "GO:0006412"): No enrichment result found. You may perform RunGSEA first.
GSEAPlot(pancreas_sub, db = "GO_BP", group_by = "CellType", group_use = "Endocrine", id_use = c("GO:0046903", "GO:0015031", "GO:0007600"))
#> Error in GSEAPlot(pancreas_sub, db = "GO_BP", group_by = "CellType", group_use = "Endocrine", id_use = c("GO:0046903", "GO:0015031", "GO:0007600")): No enrichment result found. You may perform RunGSEA first.
# Remove redundant GO terms
pancreas_sub <- RunGSEA(srt = pancreas_sub, group_by = "CellType", db = "GO_BP", GO_simplify = TRUE, species = "Mus_musculus")
#> [2023-11-21 07:42:25.925353] Start GSEA
#> Workers: 2
#> Species: Mus_musculus
#> Loading cached db: GO_BP version:3.17.0 nterm:16027 created:2023-11-21 07:14:20.545084
#> Convert ID types for the database: GO_BP
#> Connect to the Ensembl archives...
#> Using the 103 version of biomart...
#> Connecting to the biomart...
#> Searching the dataset mmusculus ...
#> Connecting to the dataset mmusculus_gene_ensembl ...
#> Converting the geneIDs...
#> Error in collect(., Inf): Failed to collect lazy table.
#> Caused by error in `db_collect()`:
#> ! Arguments in `...` must be used.
#> ✖ Problematic argument:
#> • ..1 = Inf
#> ℹ Did you misspell an argument name?
GSEAPlot(pancreas_sub, db = "GO_BP_sim", group_by = "CellType", plot_type = "comparison")
#> Error in GSEAPlot(pancreas_sub, db = "GO_BP_sim", group_by = "CellType", plot_type = "comparison"): No enrichment result found. You may perform RunGSEA first.
# Use a combined database
pancreas_sub <- RunGSEA(
srt = pancreas_sub, group_by = "CellType",
db = c("KEGG", "WikiPathway", "Reactome", "PFAM", "MP"),
db_combine = TRUE,
species = "Mus_musculus"
)
#> [2023-11-21 07:42:29.70441] Start GSEA
#> Workers: 2
#> Species: Mus_musculus
#> Loading cached db: KEGG version:Release 108.0+/11-18, Nov 23 nterm:352 created:2023-11-21 07:38:42.039968
#> Loading cached db: WikiPathway version:20231110 nterm:202 created:2023-11-21 07:38:45.644427
#> Loading cached db: Reactome version:1.84.0 nterm:1715 created:2023-11-21 07:38:51.505254
#> Loading cached db: PFAM version:3.17.0 nterm:6565 created:2023-11-21 07:38:52.075182
#> Loading cached db: MP version:2023-11-20 08:08 nterm:10311 created:2023-11-21 07:14:35.012367
#> Convert ID types for the database: KEGG
#> Connect to the Ensembl archives...
#> Using the 103 version of biomart...
#> Connecting to the biomart...
#> Searching the dataset mmusculus ...
#> Connecting to the dataset mmusculus_gene_ensembl ...
#> Converting the geneIDs...
#> Error in collect(., Inf): Failed to collect lazy table.
#> Caused by error in `db_collect()`:
#> ! Arguments in `...` must be used.
#> ✖ Problematic argument:
#> • ..1 = Inf
#> ℹ Did you misspell an argument name?
GSEAPlot(pancreas_sub, db = "Combined", group_by = "CellType", plot_type = "comparison")
#> Error in GSEAPlot(pancreas_sub, db = "Combined", group_by = "CellType", plot_type = "comparison"): No enrichment result found. You may perform RunGSEA first.
# Or use "geneID", "geneScore" and "geneID_groups" as input to run GSEA
de_df <- dplyr::filter(pancreas_sub@tools$DEtest_CellType$AllMarkers_wilcox, p_val_adj < 0.05)
gsea_out <- RunGSEA(geneID = de_df[["gene"]], geneScore = de_df[["avg_log2FC"]], geneID_groups = de_df[["group1"]], db = "GO_BP", species = "Mus_musculus")
#> [2023-11-21 07:42:34.903499] Start GSEA
#> Workers: 2
#> Species: Mus_musculus
#> Loading cached db: GO_BP version:3.17.0 nterm:16027 created:2023-11-21 07:14:20.545084
#> Convert ID types for the database: GO_BP
#> Connect to the Ensembl archives...
#> Using the 103 version of biomart...
#> Connecting to the biomart...
#> Searching the dataset mmusculus ...
#> Connecting to the dataset mmusculus_gene_ensembl ...
#> Converting the geneIDs...
#> Error in collect(., Inf): Failed to collect lazy table.
#> Caused by error in `db_collect()`:
#> ! Arguments in `...` must be used.
#> ✖ Problematic argument:
#> • ..1 = Inf
#> ℹ Did you misspell an argument name?
GSEAPlot(res = gsea_out, db = "GO_BP", plot_type = "comparison")
#> Error in eval(expr, envir, enclos): object 'gsea_out' not found