DB
- class hail.experimental.DB
An annotation database instance.
This class facilitates the annotation of genetic datasets with variant annotations. It accepts either an HTTP(S) URL to an Annotation DB configuration or a Python dict describing an Annotation DB configuration. When connecting to the default Hail Annotation DB, users must specify the region in which the cluster is running ('us' for AWS; 'us-central1' or 'europe-west1' for GCP) and the cloud platform in use ('gcp' or 'aws').
- Parameters:
region (str) – Region the cluster is running in, one of 'us', 'us-central1', or 'europe-west1' (default is 'us-central1').
cloud (str) – Cloud platform, either 'gcp' or 'aws' (default is 'gcp').
url (str, optional) – URL of an Annotation DB configuration, if using a custom configuration (default is None).
config (dict, optional) – dict describing an Annotation DB configuration, if using a custom configuration (default is None).
Note
The 'aws' cloud platform is currently only available for the 'us' region.
Examples
Create an annotation database connecting to the default Hail Annotation DB:
>>> db = hl.experimental.DB(region='us-central1', cloud='gcp')
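The region and cloud combinations listed above can be checked before a DB is constructed. The following is a minimal sketch in plain Python, assuming only the pairs this section names ('us' for AWS; 'us-central1' or 'europe-west1' for GCP). The names VALID_REGIONS and check_region_cloud are hypothetical and are not part of Hail, which may accept or reject combinations differently.

```python
# Hypothetical helper, not part of Hail. It encodes only the
# region/cloud pairs listed in this section.
VALID_REGIONS = {
    'gcp': ('us-central1', 'europe-west1'),
    'aws': ('us',),
}


def check_region_cloud(region='us-central1', cloud='gcp'):
    """Return (region, cloud) if the pair is listed above, else raise ValueError."""
    if cloud not in VALID_REGIONS:
        raise ValueError(
            f"cloud must be one of {sorted(VALID_REGIONS)}, got {cloud!r}"
        )
    if region not in VALID_REGIONS[cloud]:
        raise ValueError(
            f"region {region!r} is not listed for cloud {cloud!r}; "
            f"expected one of {VALID_REGIONS[cloud]}"
        )
    return region, cloud


# Validate first; the pair can then be passed to hl.experimental.DB.
region, cloud = check_region_cloud('us-central1', 'gcp')
```

Failing early with a descriptive message can be easier to diagnose than an error raised later while the annotation datasets are being read.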
Attributes
available_datasets – List of names of available annotation datasets.
Methods
annotate_rows_db(rel, *names) – Add annotations from datasets specified by name to a relational object.
- annotate_rows_db(rel, *names)
Add annotations from datasets specified by name to a relational object.
List datasets with available_datasets. An interactive query builder is available in the Hail Annotation Database documentation.
Examples
Annotate a MatrixTable with gnomad_lof_metrics:
>>> db = hl.experimental.DB(region='us-central1', cloud='gcp')
>>> mt = db.annotate_rows_db(mt, 'gnomad_lof_metrics')
Annotate a Table with clinvar_gene_summary, CADD, and DANN:
>>> db = hl.experimental.DB(region='us-central1', cloud='gcp')
>>> ht = db.annotate_rows_db(ht, 'clinvar_gene_summary', 'CADD', 'DANN')
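Because datasets are selected by name, it can help to compare the requested names against the available ones before annotating. The sketch below is plain Python and does not call Hail: unknown_datasets is a hypothetical helper, and the available list is a stand-in for what db.available_datasets would return on a real DB instance. The comparison is an exact string match, which is an assumption of this sketch and not a statement about how Hail resolves names.

```python
def unknown_datasets(requested, available):
    """Return the requested names that are absent from `available`, in order."""
    available = set(available)
    return [name for name in requested if name not in available]


# Stand-in list for illustration only; with a real DB instance this
# would be db.available_datasets.
available = ['CADD', 'DANN', 'clinvar_gene_summary', 'gnomad_lof_metrics']
requested = ['clinvar_gene_summary', 'CADD', 'dann']

missing = unknown_datasets(requested, available)
print(missing)  # ['dann']
```

If the returned list is empty, the names can be passed on as db.annotate_rows_db(rel, *requested).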
Notes
If a dataset is gene-keyed, the annotation will be a dictionary mapping from gene name to the annotation value. There will be one entry for each gene overlapping the given locus.
If a dataset does not have unique rows for each key (consider the gencode genes, which may overlap, and clinvar_variant_summary, which contains many overlapping multiple nucleotide variants), then the result will be an array of annotation values, one for each row.
- Parameters:
rel (MatrixTable or Table) – The relational object to which to add annotations.
names (varargs of str) – The names of the datasets with which to annotate rel.
- Returns:
MatrixTable or Table – The relational object rel, with the annotations from names added.
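The two annotation shapes described in the Notes (an array when keys are not unique, a dictionary when the dataset is gene-keyed) can be pictured with ordinary Python containers. This is only a model of the resulting shapes, not of how Hail computes them. The keys, gene names, values, and the collect_by_key helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical dataset rows whose keys are not unique:
# two rows share the key 'chr1:100'.
rows = [
    ('chr1:100', {'id': 'A'}),
    ('chr1:100', {'id': 'B'}),
    ('chr1:200', {'id': 'C'}),
]


def collect_by_key(rows):
    """Group values by key, keeping one list element per matching row."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return dict(grouped)


# Non-unique keys: each key maps to an array of annotation values.
array_annotations = collect_by_key(rows)

# Hypothetical gene-keyed dataset, and the genes overlapping one locus.
gene_values = {'GENE_A': 0.1, 'GENE_B': 0.9, 'GENE_C': 0.5}
overlapping_genes = ['GENE_A', 'GENE_B']

# Gene-keyed: a dictionary with one entry per overlapping gene.
gene_annotation = {gene: gene_values[gene] for gene in overlapping_genes}
```

In this model, array_annotations['chr1:100'] holds two values because two rows share that key, while gene_annotation holds one entry for each gene overlapping the locus.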