.. _sec-annotationdb:

===================
Annotation Database
===================

This database contains a curated collection of variant annotations in Hail-friendly format, for use in Hail analysis pipelines.

Currently, the :py:meth:`~.VariantDataset.annotate_variants_db` VDS method associated with this database works only if you are running Hail on Google Cloud Platform.

To incorporate these annotations into your own Hail analysis pipeline, select the annotations you would like to query from the documentation_ below, then copy and paste the generated Hail code into your own analysis script.

For example, a simple Hail script that loads a VCF into a VDS, annotates the VDS with CADD raw and PHRED scores using this database, and inspects the schema could look something like this:

.. code-block:: python

    import hail
    from pprint import pprint

    hc = hail.HailContext()

    # Load the VCF, split multi-allelic sites, then add the CADD annotations.
    vds = (
        hc
        .import_vcf('gs://annotationdb/test/sample.vcf')
        .split_multi()
        .annotate_variants_db([
            'va.cadd'
        ])
    )

    # Inspect the variant annotation schema.
    pprint(vds.variant_schema)

This code would print the following schema:

.. code-block:: text

    Struct{
        rsid: String,
        qual: Double,
        filters: Set[String],
        info: Struct{
            ...
        },
        cadd: Struct{
            RawScore: Double,
            PHRED: Double
        }
    }

--------------
Database Query
--------------

Select annotations by clicking the checkboxes in the documentation_, and the appropriate Hail command will be generated in the panel below. Use the "Copy to clipboard" button to copy the generated Hail code, then paste the command into your own Hail script.

.. raw:: html
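
If you prefer to assemble the command in a script rather than copy it from the panel, the following is a minimal sketch in plain Python. It assumes the generated command has the same shape as the ``annotate_variants_db`` call in the example above. The helper name ``build_annotate_call`` is hypothetical; it is not part of Hail or of this page's query builder. The only annotation path used, ``'va.cadd'``, is the one shown on this page.

.. code-block:: python

    def build_annotate_call(annotations):
        """Return a Hail snippet that annotates ``vds`` with the given paths.

        Hypothetical helper: it only formats text and does not call Hail.
        """
        if not annotations:
            raise ValueError("select at least one annotation")

        # Drop duplicates while keeping the order in which they were selected.
        unique = []
        for name in annotations:
            if name not in unique:
                unique.append(name)

        # One quoted annotation path per line, inside a single list argument.
        lines = ["vds = vds.annotate_variants_db(["]
        lines += ["    {!r},".format(name) for name in unique]
        lines.append("])")
        return "\n".join(lines)


    print(build_annotate_call(["va.cadd", "va.cadd"]))

The printed snippet can be pasted into a script after the VDS has been loaded, in the same way as the code copied from the panel.
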

.. _documentation:

-------------
Documentation
-------------

These annotations have been collected from a variety of publications and their accompanying datasets (usually text files). Links to the relevant publications and raw data downloads are included where applicable.

.. raw:: html