DB

class hail.experimental.DB

Bases: object

An annotation database instance.

This class facilitates the annotation of genetic datasets with variant annotations. It accepts either an HTTP(S) URL to an Annotation DB configuration or a Python dict describing an Annotation DB configuration. When connecting to the default Hail Annotation DB, the user must specify the region in which the cluster is running ('us' on AWS; 'us-central1' or 'europe-west1' on GCP) and the cloud platform in use ('gcp' or 'aws').

Parameters:

region (str) – Region cluster is running in, either 'us', 'us-central1', or 'europe-west1' (default is 'us-central1').
cloud (str) – Cloud platform, either 'gcp' or 'aws' (default is 'gcp').
url (str, optional) – Optional URL to annotation DB configuration, if using custom configuration (default is None).
config (dict, optional) – Optional dict describing an annotation DB configuration, if using custom configuration (default is None).

Note

The 'aws' cloud platform is currently only available for the 'us' region.
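
The region and cloud constraints described above can be summarized as a small lookup. The following is a minimal sketch in plain Python; the helper `check_region_cloud` and the `DOCUMENTED_COMBINATIONS` table are hypothetical names built only from the combinations listed in this section, and are not Hail's own validation code.

```python
# Region/cloud pairs taken from the class description above. This table is an
# illustration of the documented constraints, not Hail's internal logic.
DOCUMENTED_COMBINATIONS = {
    ("us", "aws"),
    ("us-central1", "gcp"),
    ("europe-west1", "gcp"),
}


def check_region_cloud(region="us-central1", cloud="gcp"):
    """Hypothetical helper: True if the pair is documented as supported."""
    return (region, cloud) in DOCUMENTED_COMBINATIONS


print(check_region_cloud())                       # documented default pair
print(check_region_cloud("us", "aws"))            # the only documented AWS region
print(check_region_cloud("europe-west1", "aws"))  # not documented for AWS
```

A check like this could run before constructing the DB object, so that an unsupported pair is reported before any cluster work starts.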

Examples

Create an annotation database connecting to the default Hail Annotation DB:

>>> db = hl.experimental.DB(region='us-central1', cloud='gcp')

Attributes

available_datasets

List of names of available annotation datasets.

Methods

annotate_rows_db

Add annotations from datasets specified by name to a relational object.

annotate_rows_db(rel, *names)

Add annotations from datasets specified by name to a relational object.

List datasets with available_datasets.

An interactive query builder is available in the Hail Annotation Database documentation.

Examples

Annotate a MatrixTable with gnomad_lof_metrics:

>>> db = hl.experimental.DB(region='us-central1', cloud='gcp')
>>> mt = db.annotate_rows_db(mt, 'gnomad_lof_metrics') 

Annotate a Table with clinvar_gene_summary, CADD, and DANN:

>>> db = hl.experimental.DB(region='us-central1', cloud='gcp')
>>> ht = db.annotate_rows_db(ht, 'clinvar_gene_summary', 'CADD', 'DANN') 

Notes

If a dataset is gene-keyed, the annotation will be a dictionary mapping from gene name to the annotation value. There will be one entry for each gene overlapping the given locus.

If a dataset does not have unique rows for each key (consider the gencode genes, which may overlap; and clinvar_variant_summary, which contains many overlapping multiple nucleotide variants), then the result will be an array of annotation values, one for each row.
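
To make the two result shapes concrete, here is a plain-Python sketch that uses ordinary dicts and lists as stand-ins for the Hail dictionary and array values. The gene names, field names, and values are placeholders invented for illustration; they do not come from any real annotation dataset.

```python
# Placeholder values only: the gene names and fields below are made up to show
# the shape of the result, not real annotation data.

# Gene-keyed dataset: a mapping from gene name to annotation value, with one
# entry for each gene overlapping the locus.
gene_keyed_annotation = {
    "GENE_A": {"score": 0.1},
    "GENE_B": {"score": 0.9},
}

# Dataset without unique rows per key: an array with one element per matching row.
non_unique_annotation = [
    {"label": "row_1"},
    {"label": "row_2"},
]

# A gene-keyed annotation is looked up by gene name...
print(gene_keyed_annotation["GENE_B"]["score"])
# ...while a non-unique annotation is iterated or indexed by position.
print([row["label"] for row in non_unique_annotation])
```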

Parameters:

rel (MatrixTable or Table) – The relational object to which to add annotations.
names (varargs of str) – The names of the datasets with which to annotate rel.

Returns:

MatrixTable or Table – The relational object rel, with the annotations from names added.

property available_datasets

List of names of available annotation datasets.

Returns:

list – List of available annotation datasets.
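
Because available_datasets lists the valid names, it can be used to check a request before calling annotate_rows_db. The sketch below shows one way to do that; `annotate_checked` is a hypothetical wrapper, not part of Hail, and it assumes only the two members documented on this page.

```python
def annotate_checked(db, rel, *names):
    """Hypothetical wrapper: validate dataset names, then annotate.

    `db` is assumed to expose `available_datasets` and `annotate_rows_db`
    as documented above; `rel` is a MatrixTable or Table.
    """
    available = set(db.available_datasets)
    # Collect every requested name that the database does not list.
    unknown = [name for name in names if name not in available]
    if unknown:
        raise ValueError(f"Unknown annotation datasets: {unknown}")
    return db.annotate_rows_db(rel, *names)
```

With a DB instance as in the examples above, a call would look like `mt = annotate_checked(db, mt, 'gnomad_lof_metrics')`. Checking first gives a single error message naming every unrecognized dataset.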