AnnotateFeatures

Annotate features in a Seurat object with additional metadata from databases or a GTF file.

Usage

AnnotateFeatures(
  srt,
  species = "Homo_sapiens",
  IDtype = c("symbol", "ensembl_id", "entrez_id"),
  db = NULL,
  db_update = FALSE,
  db_version = "latest",
  convert_species = TRUE,
  Ensembl_version = 103,
  mirror = NULL,
  gtf = NULL,
  merge_gtf_by = "gene_name",
  columns = c("seqname", "feature", "start", "end", "strand", "gene_id", "gene_name",
    "gene_type"),
  assays = "RNA",
  overwrite = FALSE
)
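
In the signature above, `IDtype` defaults to a vector of choices rather than a single value. A common R convention is to resolve such an argument with `match.arg()`, which picks the first choice when the argument is left at its default. The sketch below illustrates that convention with a hypothetical stand-in function, `resolve_idtype`; whether `AnnotateFeatures` resolves `IDtype` in exactly this way is an assumption, not something this page confirms.

```r
# Hypothetical stand-in for a function with an IDtype-style argument;
# this is not the AnnotateFeatures implementation.
resolve_idtype <- function(IDtype = c("symbol", "ensembl_id", "entrez_id")) {
  # match.arg() returns the first choice when the argument is left at its
  # default, and raises an error for values outside the allowed set
  match.arg(IDtype)
}

default_type <- resolve_idtype()             # "symbol"
chosen_type <- resolve_idtype("ensembl_id")  # "ensembl_id"
```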

Arguments

srt

Seurat object to be annotated.

species

Name of the species to be used for annotation. Default is "Homo_sapiens".

IDtype

Type of identifier to use for annotation. One of "symbol", "ensembl_id", or "entrez_id". Default is "symbol".

db

Vector of database names to be used for annotation. Default is NULL.

db_update

Logical value indicating whether to update the database. Default is FALSE.

db_version

Version of the database to use. Default is "latest".

convert_species

Logical value indicating whether to use a species-converted database when the annotation is missing for the specified species. Default is TRUE.

Ensembl_version

Version of the Ensembl database to use. Default is 103.

mirror

URL of the mirror to use for Ensembl database. Default is NULL.

gtf

Path to the GTF file to be used for annotation. Default is NULL.

merge_gtf_by

Column name to merge the GTF file by. Default is "gene_name".

columns

Vector of column names to be used from the GTF file. Default is "seqname", "feature", "start", "end", "strand", "gene_id", "gene_name", "gene_type".

assays

Character vector of assay names to be annotated. Default is "RNA".

overwrite

Logical value indicating whether to overwrite existing metadata. Default is FALSE.
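
To illustrate how `merge_gtf_by` and `columns` relate, the following base R sketch joins a small GTF-like table onto feature metadata by `gene_name`. The data frames, their toy coordinates, and the helper `merge_gtf_columns` are hypothetical and are not the package's implementation. The sketch only assumes that features are matched on the `merge_gtf_by` column and that only the requested `columns` are kept.

```r
# Hypothetical feature metadata, with gene symbols as row names
meta_features <- data.frame(
  highly_variable_genes = c("False", NA, "True"),
  row.names = c("Mrpl15", "Npbwr1", "Sntg1"),
  stringsAsFactors = FALSE
)

# Hypothetical gene-level records, as they might look after parsing a GTF file
# (coordinates are toy values chosen for illustration)
gtf_genes <- data.frame(
  seqname = c("chr1", "chr1", "chr1"),
  start = c(100L, 200L, 300L),
  end = c(150L, 250L, 350L),
  strand = c("-", "-", "+"),
  gene_name = c("Sntg1", "Mrpl15", "GeneX"),
  stringsAsFactors = FALSE
)

merge_gtf_columns <- function(meta, gtf, merge_by = "gene_name",
                              columns = c("seqname", "start", "end", "strand")) {
  # Keep only the requested columns that are present in the GTF table
  columns <- intersect(columns, colnames(gtf))
  # Row of the GTF table for each feature (NA when the feature is absent)
  idx <- match(rownames(meta), gtf[[merge_by]])
  annot <- gtf[idx, columns, drop = FALSE]
  rownames(annot) <- rownames(meta)
  cbind(meta, annot)
}

annotated <- merge_gtf_columns(meta_features, gtf_genes)
```

Features that are absent from the GTF table (here `Npbwr1`) keep their existing metadata and receive `NA` in the added columns.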

Examples

data("pancreas_sub")
pancreas_sub <- AnnotateFeatures(pancreas_sub,
  species = "Mus_musculus", IDtype = "symbol",
  db = c("Chromosome", "GeneType", "Enzyme", "TF", "CSPA", "VerSeDa")
)
#> Species: Mus_musculus
#> Preparing database: Chromosome
#> Preparing database: GeneType
#> Preparing database: Enzyme
#> Preparing database: TF
#> Preparing database: CSPA
#> Preparing database: VerSeDa
#> Connect to the Ensembl archives...
#> Using the 103 version of biomart...
#> Connecting to the biomart...
#> Searching the dataset mmusculus ...
#> Connecting to the dataset mmusculus_gene_ensembl ...
#> Converting the geneIDs...
#> Error in collect(., Inf): Failed to collect lazy table.
#> Caused by error in `db_collect()`:
#> ! Arguments in `...` must be used.
#>  Problematic argument:
#>  ..1 = Inf
#>  Did you misspell an argument name?
head(pancreas_sub[["RNA"]]@meta.features)
#>               highly_variable_genes
#> Mrpl15                        False
#> Npbwr1                         <NA>
#> 4732440D04Rik                 False
#> Gm26901                       False
#> Sntg1                          True
#> Mybl1                         False

## Annotate features using a GTF file
# pancreas_sub <- AnnotateFeatures(pancreas_sub, gtf = "/data/reference/CellRanger/refdata-gex-mm10-2020-A/genes/genes.gtf")
# head(pancreas_sub[["RNA"]]@meta.features)
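
The database example above stops with an error during gene ID conversion, so `pancreas_sub` is never reassigned and its feature metadata is unchanged. In a longer script, one way to keep going after such a failure is to wrap the call in `tryCatch()`. The helper `annotate_or_keep` and the stand-in `failing_annotate` below are hypothetical names used only for this sketch; they are not part of the package.

```r
# Hypothetical helper: run an annotation function and fall back to the
# unmodified object if the call raises an error
annotate_or_keep <- function(srt, annotate_fun, ...) {
  tryCatch(
    annotate_fun(srt, ...),
    error = function(e) {
      warning("Annotation failed, returning the input unchanged: ",
              conditionMessage(e))
      srt
    }
  )
}

# Stand-in for a failing annotation call, used only to demonstrate the helper
failing_annotate <- function(srt, ...) stop("Failed to collect lazy table.")

demo_input <- list(id = "demo")
demo_result <- suppressWarnings(annotate_or_keep(demo_input, failing_annotate))
```

With the package loaded, the same helper could be called as `annotate_or_keep(pancreas_sub, AnnotateFeatures, species = "Mus_musculus", db = "Chromosome")`, which would return the original object with a warning if the annotation step fails.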