DB
- class hail.experimental.DB
An annotation database instance.
This class facilitates the annotation of genetic datasets with variant annotations. It accepts either an HTTP(S) URL to an Annotation DB configuration or a Python dict describing an Annotation DB configuration. Users must specify the region ('us' or 'eu') in which the cluster is running if connecting to the default Hail Annotation DB. Users must also specify the cloud platform they are using ('gcp' or 'aws').
- Parameters:
region (str) – Region the cluster is running in, either 'us' or 'eu' (default is 'us').
cloud (str) – Cloud platform, either 'gcp' or 'aws' (default is 'gcp').
url (str, optional) – URL of an Annotation DB configuration, if using a custom configuration (default is None).
config (dict, optional) – dict describing an Annotation DB configuration, if using a custom configuration (default is None).
Note
The 'aws' cloud platform is currently only available for the 'us' region. If region is 'eu', cloud must be set to 'gcp'.
Examples
Create an annotation database connecting to the default Hail Annotation DB:
>>> db = hl.experimental.DB(region='us', cloud='gcp')
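The region and cloud constraint described in the note above can be checked before a DB instance is constructed. The following is a minimal sketch in plain Python; `check_region_cloud` is a hypothetical helper written for illustration, not part of Hail, and it encodes only the rules stated on this page.

```python
# Hypothetical helper (not a Hail function): encodes the documented
# region/cloud rules so a bad combination fails early with a clear message.
VALID_REGIONS = {"us", "eu"}
VALID_CLOUDS = {"gcp", "aws"}


def check_region_cloud(region="us", cloud="gcp"):
    """Return (region, cloud) if the combination is documented as allowed."""
    if region not in VALID_REGIONS:
        raise ValueError(f"region must be 'us' or 'eu', got {region!r}")
    if cloud not in VALID_CLOUDS:
        raise ValueError(f"cloud must be 'gcp' or 'aws', got {cloud!r}")
    # Documented restriction: 'aws' is only available in the 'us' region.
    if cloud == "aws" and region != "us":
        raise ValueError("cloud='aws' is only available for region='us'")
    return region, cloud


region, cloud = check_region_cloud("us", "aws")
# The validated values could then be passed on, for example:
# db = hl.experimental.DB(region=region, cloud=cloud)
```

Checking the combination up front keeps the error close to the configuration that caused it, rather than surfacing later when datasets are loaded.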
Attributes
available_datasets – List of names of available annotation datasets.
Methods
annotate_rows_db(rel, *names) – Add annotations from datasets specified by name to a relational object.
- annotate_rows_db(rel, *names)
Add annotations from datasets specified by name to a relational object.
List datasets with available_datasets. An interactive query builder is available in the Hail Annotation Database documentation.
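Because dataset names are passed as plain strings, it can help to compare them against available_datasets before annotating. The sketch below uses a hard-coded list as a stand-in for `db.available_datasets`; `split_by_availability` is a hypothetical helper, not part of Hail.

```python
def split_by_availability(requested, available):
    """Partition requested names into (found, missing), preserving order."""
    available_set = set(available)
    found = [name for name in requested if name in available_set]
    missing = [name for name in requested if name not in available_set]
    return found, missing


# Stand-in for db.available_datasets (assumption: a list of dataset names).
available = ["CADD", "DANN", "gnomad_lof_metrics"]

found, missing = split_by_availability(["CADD", "not_a_dataset"], available)
if missing:
    print("Unavailable datasets:", missing)
# With a real DB instance, the found names could then be passed on:
# ht = db.annotate_rows_db(ht, *found)
```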
Examples
Annotate a MatrixTable with gnomad_lof_metrics:
>>> db = hl.experimental.DB(region='us', cloud='gcp')
>>> mt = db.annotate_rows_db(mt, 'gnomad_lof_metrics')
Annotate a Table with clinvar_gene_summary, CADD, and DANN:
>>> db = hl.experimental.DB(region='us', cloud='gcp')
>>> ht = db.annotate_rows_db(ht, 'clinvar_gene_summary', 'CADD', 'DANN')
Notes
If a dataset is gene-keyed, the annotation will be a dictionary mapping from gene name to the annotation value. There will be one entry for each gene overlapping the given locus.
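To make the shape of a gene-keyed annotation concrete, here is a plain-Python sketch that does not use Hail. The gene names, coordinates, and values are made up, and treating both interval endpoints as inclusive is an assumption made for the illustration.

```python
# Made-up gene-keyed records: (gene name, contig, start, end, value).
gene_records = [
    ("GENE_A", "1", 100, 200, 0.9),
    ("GENE_B", "1", 150, 300, 0.1),
    ("GENE_C", "2", 100, 200, 0.5),
]


def gene_keyed_annotation(contig, position, records):
    """Map each gene overlapping the locus to its annotation value."""
    return {
        gene: value
        for gene, gene_contig, start, end, value in records
        if gene_contig == contig and start <= position <= end
    }


# Locus 1:160 lies inside both GENE_A and GENE_B, so both appear as entries.
print(gene_keyed_annotation("1", 160, gene_records))
# {'GENE_A': 0.9, 'GENE_B': 0.1}
```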
If a dataset does not have unique rows for each key (consider the gencode genes, which may overlap, and clinvar_variant_summary, which contains many overlapping multiple nucleotide variants), then the result will be an array of annotation values, one for each row.
- Parameters:
rel (MatrixTable or Table) – The relational object to which to add annotations.
names (varargs of str) – The names of the datasets with which to annotate rel.
- Returns:
MatrixTable or Table – The relational object rel, with the annotations from names added.
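The array-valued result described in the Notes for datasets without unique rows per key can be pictured with a small plain-Python sketch. It only models the shape of the result, not how Hail computes it; the keys and values are made up.

```python
from collections import defaultdict

# Made-up dataset rows as (key, value); the key ("1", 100) occurs twice.
dataset_rows = [
    (("1", 100), "x"),
    (("1", 100), "y"),
    (("1", 200), "z"),
]


def collect_by_key(rows):
    """Group values by key, giving one list entry per matching row."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return dict(grouped)


annotations = collect_by_key(dataset_rows)
print(annotations[("1", 100)])  # ['x', 'y']
```

A row keyed by ("1", 100) would therefore carry an array with two values, while a row keyed by ("1", 200) would carry an array with one.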