ServiceBackend

class hailtop.batch.backend.ServiceBackend(*args, billing_project=None, bucket=None, remote_tmpdir=None, google_project=None, token=None, regions=None, gcs_requester_pays_configuration=None, gcs_bucket_allow_list=None)

Bases: Backend[Batch]

Backend that executes batches on Hail’s Batch Service on Google Cloud.

Examples

Create and use a backend that bills to the Hail Batch billing project named “my-billing-account” and stores temporary intermediate files in “gs://my-bucket/temporary-files”.

>>> import hailtop.batch as hb
>>> service_backend = hb.ServiceBackend(
...     billing_project='my-billing-account',
...     remote_tmpdir='gs://my-bucket/temporary-files/'
... )  
>>> b = hb.Batch(backend=service_backend)  
>>> j = b.new_job()  
>>> j.command('echo hello world!')  
>>> b.run() 

Same as above, but set the billing project and temporary intermediate folders via a configuration file:

cat >my-batch-script.py <<EOF
import hailtop.batch as hb
b = hb.Batch(backend=hb.ServiceBackend())
j = b.new_job()
j.command('echo hello world!')
b.run()
EOF
hailctl config set batch/billing_project my-billing-account
hailctl config set batch/remote_tmpdir gs://my-bucket/temporary-files/
python3 my-batch-script.py

Same as above, but also specify the use of the ServiceBackend via configuration file:

cat >my-batch-script.py <<EOF
import hailtop.batch as hb
b = hb.Batch()
j = b.new_job()
j.command('echo hello world!')
b.run()
EOF
hailctl config set batch/billing_project my-billing-account
hailctl config set batch/remote_tmpdir gs://my-bucket/temporary-files/
hailctl config set batch/backend service
python3 my-batch-script.py

Create a backend which stores temporary intermediate files in “https://my-account.blob.core.windows.net/my-container/tempdir”.

>>> service_backend = hb.ServiceBackend(
...     billing_project='my-billing-account',
...     remote_tmpdir='https://my-account.blob.core.windows.net/my-container/tempdir'
... )  

Require all jobs in all batches in this backend to execute in us-central1:

>>> b = hb.Batch(backend=hb.ServiceBackend(regions=['us-central1']))

Same as above, but using a configuration file:

hailctl config set batch/regions us-central1
python3 my-batch-script.py

Same as above, but using the HAIL_BATCH_REGIONS environment variable:

export HAIL_BATCH_REGIONS=us-central1
python3 my-batch-script.py

Permit jobs to execute in either us-central1 or us-east1:

>>> b = hb.Batch(backend=hb.ServiceBackend(regions=['us-central1', 'us-east1']))

Same as above, but using a configuration file:

hailctl config set batch/regions us-central1,us-east1
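
The examples above set regions in three ways: the regions argument, the batch/regions configuration variable, and the HAIL_BATCH_REGIONS environment variable. The following is a minimal sketch of how such a lookup could be resolved; it is not Hail's implementation. The helper name resolve_regions, the order in which the environment variable and the configuration value are consulted, and the comma splitting are all assumptions made for illustration.

```python
import os
from typing import List, Mapping, Optional

# Mirrors the documented value of ServiceBackend.ANY_REGION.
ANY_REGION = ['any_region']


def resolve_regions(
    explicit: Optional[List[str]] = None,
    config_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Hypothetical helper: pick the regions in which jobs may run.

    Assumed order: explicit argument, then HAIL_BATCH_REGIONS,
    then the batch/regions configuration value, then any region.
    """
    if explicit is not None:
        return list(explicit)
    env = os.environ if env is None else env
    raw = env.get('HAIL_BATCH_REGIONS') or config_value
    if raw:
        # 'us-central1,us-east1' becomes ['us-central1', 'us-east1']
        return [r.strip() for r in raw.split(',') if r.strip()]
    return list(ANY_REGION)


print(resolve_regions(['us-central1'], env={}))
print(resolve_regions(config_value='us-central1,us-east1', env={}))
print(resolve_regions(env={}))
```

Passing env={} keeps the sketch independent of the caller's real environment; omitting it falls back to os.environ.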

Allow reading or writing to buckets even though they are “cold” storage:

>>> b = hb.Batch(
...     backend=hb.ServiceBackend(
...         gcs_bucket_allow_list=['cold-bucket', 'cold-bucket2'],
...     ),
... )

Parameters:
  • billing_project (Optional[str]) – Name of billing project to use.

  • bucket (Optional[str]) – This argument is deprecated. Use remote_tmpdir instead.

  • remote_tmpdir (Optional[str]) – Temporary data will be stored in this cloud storage folder.

  • google_project (Optional[str]) – This argument is deprecated. Use gcs_requester_pays_configuration instead.

  • gcs_requester_pays_configuration (either str or tuple of str and list of str, optional) – If a string is provided, configure the Google Cloud Storage file system to bill usage to the project identified by that string. If a tuple is provided, configure the Google Cloud Storage file system to bill usage to the specified project for buckets specified in the list.

  • token (Optional[str]) – The authorization token to pass to the batch client. Should only be set for user delegation purposes.

  • regions (Optional[List[str]]) – Cloud regions in which jobs may run. ServiceBackend.ANY_REGION indicates jobs may run in any region. If unspecified or None, the batch/regions Hail configuration variable or the HAIL_BATCH_REGIONS environment variable is consulted (see the examples above). If neither is set, jobs may run in any region. ServiceBackend.supported_regions() lists the available regions.

  • gcs_bucket_allow_list (Optional[List[str]]) – A list of buckets that the ServiceBackend should be permitted to read from or write to, even if their default policy is to use “cold” storage.
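
The two accepted forms of gcs_requester_pays_configuration described above can be illustrated with a small sketch. The helper split_requester_pays, the type alias, and the project and bucket names are hypothetical and exist only for this example; they are not part of hailtop.

```python
from typing import List, Optional, Tuple, Union

# Either a project name, or a (project, buckets) pair.
RequesterPaysConfig = Union[str, Tuple[str, List[str]]]


def split_requester_pays(
    config: RequesterPaysConfig,
) -> Tuple[str, Optional[List[str]]]:
    """Hypothetical helper: return (project, buckets).

    buckets is None when the project is billed for every bucket.
    """
    if isinstance(config, str):
        return config, None
    project, buckets = config
    return project, list(buckets)


# String form: bill all requester-pays usage to one project.
print(split_requester_pays('my-project'))
# Tuple form: bill that project only for the listed buckets.
print(split_requester_pays(('my-project', ['bucket-a', 'bucket-b'])))
```

Either value shown in the two calls could be passed as gcs_requester_pays_configuration when constructing a ServiceBackend.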

Attributes

ANY_REGION

A special value that indicates a job may run in any region.

Methods

_async_run

Execute a batch.

supported_regions

Get the supported cloud regions.

ANY_REGION: ClassVar[List[str]] = ['any_region']

A special value that indicates a job may run in any region.

async _async_run(batch, dry_run, verbose, delete_scratch_on_exit, wait=True, open=False, disable_progress_bar=False, callback=None, token=None, **backend_kwargs)

Execute a batch.

Warning

This method should not be called directly. Instead, use batch.Batch.run() and pass ServiceBackend-specific arguments as keyword arguments.

Parameters:
  • batch (Batch) – Batch to execute.

  • dry_run (bool) – If True, don’t execute code.

  • verbose (bool) – If True, print debugging output.

  • delete_scratch_on_exit (bool) – If True, delete temporary directories with intermediate files.

  • wait (bool) – If True, wait for the batch to finish executing before returning.

  • open (bool) – If True, open the UI page for the batch.

  • disable_progress_bar (bool) – If True, disable the progress bar.

  • callback (Optional[str]) – If not None, a URL that will receive at most one POST request after the entire batch completes.

  • token (Optional[str]) – If not None, a string used for idempotency of batch submission.

Return type:

Optional[Batch]
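
As the warning above says, these parameters are meant to be supplied through Batch.run() rather than by calling _async_run directly. A minimal sketch of that pattern follows. The helper name submit_without_waiting is hypothetical, and the batch argument is assumed to be a hailtop.batch.Batch whose backend is a ServiceBackend.

```python
from typing import Any, Optional


def submit_without_waiting(batch: Any, token: Optional[str] = None) -> Any:
    """Hypothetical helper: submit a batch and return without waiting.

    The keyword arguments below are ServiceBackend-specific and are
    forwarded by batch.run() to the backend.
    """
    return batch.run(
        wait=False,                 # return before the batch finishes
        disable_progress_bar=True,  # suppress the progress bar
        token=token,                # string used for submission idempotency
    )
```

With a real batch b, the call would be submit_without_waiting(b), which is equivalent to b.run(wait=False, disable_progress_bar=True, token=None).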

static supported_regions()

Get the supported cloud regions.

Examples

>>> regions = hb.ServiceBackend.supported_regions()

Returns:

A list of the supported cloud regions.
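
One possible use of this list is to check a requested set of regions before constructing a backend. The sketch below assumes the supported regions have already been fetched; the helper unsupported_regions and the region names are hypothetical placeholders.

```python
from typing import Iterable, List


def unsupported_regions(
    requested: Iterable[str],
    supported: Iterable[str],
) -> List[str]:
    """Hypothetical helper: return the requested regions that are not supported."""
    supported_set = set(supported)
    return [r for r in requested if r not in supported_set]


# In real use, `supported` would come from ServiceBackend.supported_regions().
supported = ['us-central1', 'us-east1']  # placeholder values
print(unsupported_regions(['us-central1', 'europe-west1'], supported))
```

An empty result means every requested region appears in the supported list.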