DB
- class hail.experimental.DB
Bases: object
An annotation database instance.
This class facilitates the annotation of genetic datasets with variant annotations. It accepts either an HTTP(S) URL to an Annotation DB configuration or a Python dict describing an Annotation DB configuration. When connecting to the default Hail Annotation DB, users must specify the region in which the cluster is running ('us' for aws; 'us-central1' or 'europe-west1' for gcp). Users must also specify the cloud platform they are using ('gcp' or 'aws').
- Parameters:
region (str) – Region the cluster is running in, either 'us', 'us-central1', or 'europe-west1' (default is 'us-central1').
cloud (str) – Cloud platform, either 'gcp' or 'aws' (default is 'gcp').
url (str, optional) – URL of an annotation DB configuration, if using a custom configuration (default is None).
config (dict, optional) – dict describing an annotation DB configuration, if using a custom configuration (default is None).
Note
The 'aws' cloud platform is currently only available for the 'us' region.
Examples
Create an annotation database connecting to the default Hail Annotation DB:
>>> db = hl.experimental.DB(region='us-central1', cloud='gcp')
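The region and cloud pairings described above can be checked before a DB is constructed. The following is a minimal sketch in plain Python; the helper check_region_cloud and the table VALID_REGIONS are hypothetical names introduced here for illustration and are not part of Hail. Hail's own validation may differ.

```python
# Hypothetical helper, not part of Hail: it encodes only the region/cloud
# pairings listed in the class description above.
VALID_REGIONS = {
    "gcp": {"us-central1", "europe-west1"},
    "aws": {"us"},
}


def check_region_cloud(region="us-central1", cloud="gcp"):
    """Return (region, cloud) if the pair is listed above, else raise ValueError."""
    if cloud not in VALID_REGIONS:
        raise ValueError(f"unknown cloud platform: {cloud!r}")
    if region not in VALID_REGIONS[cloud]:
        allowed = sorted(VALID_REGIONS[cloud])
        raise ValueError(f"region {region!r} is not listed for {cloud!r}; expected one of {allowed}")
    return region, cloud


# The defaults mirror the documented defaults of DB.
region, cloud = check_region_cloud()
```

The returned values could then be passed on as hl.experimental.DB(region=region, cloud=cloud).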
Attributes
available_datasets – List of names of available annotation datasets.
Methods
annotate_rows_db – Add annotations from datasets specified by name to a relational object.
- annotate_rows_db(rel, *names)
Add annotations from datasets specified by name to a relational object.
List datasets with available_datasets. An interactive query builder is available in the Hail Annotation Database documentation.
Examples
Annotate a MatrixTable with gnomad_lof_metrics:
>>> db = hl.experimental.DB(region='us-central1', cloud='gcp')
>>> mt = db.annotate_rows_db(mt, 'gnomad_lof_metrics')
Annotate a Table with clinvar_gene_summary, CADD, and DANN:
>>> db = hl.experimental.DB(region='us-central1', cloud='gcp')
>>> ht = db.annotate_rows_db(ht, 'clinvar_gene_summary', 'CADD', 'DANN')
Notes
If a dataset is gene-keyed, the annotation will be a dictionary mapping from gene name to the annotation value. There will be one entry for each gene overlapping the given locus.
If a dataset does not have unique rows for each key (consider the gencode genes, which may overlap; and clinvar_variant_summary, which contains many overlapping multiple nucleotide variants), then the result will be an array of annotation values, one for each row.
- Parameters:
rel (MatrixTable or Table) – The relational object to which to add annotations.
names (varargs of str) – The names of the datasets with which to annotate rel.
- Returns:
MatrixTable or Table – The relational object rel, with the annotations from names added.
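The two result shapes described in the Notes for annotate_rows_db (a dictionary per locus for gene-keyed datasets, and an array of values when keys are not unique) can be pictured with ordinary Python containers. This is an illustrative sketch only: it does not use Hail, the helpers collect_by_key and genes_at are hypothetical, and every row below is made up.

```python
from collections import defaultdict

# Illustration only: plain Python, not Hail code. All rows below are made up.


def collect_by_key(rows):
    """Group (key, value) rows into one list of values per key, in row order."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return dict(grouped)


def genes_at(position, gene_rows):
    """Map gene name -> value for every gene whose [start, end] interval covers position."""
    return {
        name: value
        for name, start, end, value in gene_rows
        if start <= position <= end
    }


# Non-unique keys: two rows share the key "1:1000", so that key gets two values.
variant_rows = [("1:1000", "benign"), ("1:1000", "pathogenic"), ("1:2000", "benign")]
per_key = collect_by_key(variant_rows)

# Gene-keyed data: both hypothetical genes overlap position 1450.
gene_rows = [("GENE_A", 900, 1500, 0.1), ("GENE_B", 1400, 2100, 0.7)]
at_1450 = genes_at(1450, gene_rows)
```

Here per_key holds one list per key, with one element for each matching row, and at_1450 holds one entry per overlapping gene.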