Hail Query-on-Batch
Warning
Hail Query-on-Batch (the Batch backend) is currently in beta. This means some functionality is not yet working. Please contact us if you would like to use missing functionality on Query-on-Batch!
Hail Query-on-Batch uses Hail Batch instead of Apache Spark to execute jobs. Instead of a Dataproc cluster, you will need a Hail Batch cluster. For more information on using Hail Batch, see the Hail Batch docs. For more information on deploying a Hail Batch cluster, please contact the Hail Team at our discussion forum.
Getting Started
Install Hail version 0.2.93 or later:
pip install 'hail>=0.2.93'
Sign up for a Hail Batch account (currently only available to Broad affiliates).
Authenticate with Hail Batch:
hailctl auth login
Specify a bucket for Hail to use for temporary intermediate files. In Google Cloud, we recommend using a bucket with automatic deletion after a set period of time.
hailctl config set batch/remote_tmpdir gs://my-auto-delete-bucket/hail-query-temporaries
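If the bucket does not yet have automatic deletion, one way to set it up is with a Google Cloud Storage lifecycle rule. The following is a minimal sketch, assuming you have gsutil installed and permission to create buckets. The 7-day age and the lifecycle.json file name are illustrative choices, not Hail requirements.

```shell
# Assumed values: swap in your own bucket name and retention period.
BUCKET=gs://my-auto-delete-bucket

# Lifecycle policy: delete any object more than 7 days old.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 7}}
  ]
}
EOF

# Create the bucket (skip this line if it already exists), then attach the policy.
gsutil mb "$BUCKET"
gsutil lifecycle set lifecycle.json "$BUCKET"

# Print the policy back to confirm it was applied.
gsutil lifecycle get "$BUCKET"
```

Choose a retention period longer than your longest-running pipeline, so that intermediate files are not deleted while a query still needs them.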
Specify a Hail Batch billing project (these are different from Google Cloud projects). Every new user has a trial billing project loaded with 10 USD. The name is available on the Hail User account page.
hailctl config set batch/billing_project my-billing-project
Set the default Hail Query backend to batch:
hailctl config set query/backend batch
Now you are ready to try Hail! If you want to switch back to Query-on-Spark, run the previous command again with “spark” in place of “batch”.
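As a quick check that the steps above took effect, you can list your configuration and run a very small query. This is only a sketch: the table and field names are made up for illustration, and because Query-on-Batch is in beta, a given operation may not yet be supported.

```shell
# Show the current hailctl settings; the remote_tmpdir, billing_project,
# and backend values set in the steps above should appear here.
hailctl config list

# Run a tiny query through the configured backend.
python3 - <<'EOF'
import hail as hl

# Build a 100-row table (idx = 0..99), add a derived field, and aggregate it.
ht = hl.utils.range_table(100)
ht = ht.annotate(squared=ht.idx * ht.idx)

# The sum of squares of 0..99 is 328350.
print(ht.aggregate(hl.agg.sum(ht.squared)))
EOF
```

If the query fails with an authentication or billing error, re-run the corresponding hailctl step before trying again.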
Variant Effect Predictor (VEP)
More information coming very soon. If you want to use VEP with Hail Query-on-Batch, please contact the Hail Team at our discussion forum.