ServiceBackend

class hailtop.batch.backend.ServiceBackend(*args, billing_project=None, bucket=None, remote_tmpdir=None, google_project=None, token=None, regions=None)

Bases: Backend[Batch]

Attributes

ANY_REGION

Methods

_run

Execute a batch.

supported_regions

Get the supported cloud regions.

validate_file_scheme

ANY_REGION = ['any_region']

Backend that executes batches on Hail’s Batch Service on Google Cloud.

Examples

>>> service_backend = ServiceBackend(billing_project='my-billing-account', remote_tmpdir='gs://my-bucket/temporary-files/') 
>>> b = Batch(backend=service_backend) 
>>> b.run() 
>>> service_backend.close() 

If the Hail configuration parameters batch/billing_project and batch/remote_tmpdir were previously set with hailctl config set, then one may elide the billing_project and remote_tmpdir parameters.

>>> service_backend = ServiceBackend()
>>> b = Batch(backend=service_backend)
>>> b.run() 
>>> service_backend.close()
Parameters:
  • billing_project – Name of billing project to use.

  • bucket – Name of bucket to use. Should not include the gs:// prefix. Cannot be used with remote_tmpdir. Temporary data will be stored in the “/batch” folder of this bucket. This argument is deprecated. Use remote_tmpdir instead.

  • remote_tmpdir – Temporary data will be stored in this cloud storage folder. Cannot be used with deprecated argument bucket. Paths should match a GCS URI like gs://<BUCKET_NAME>/<PATH> or an ABS URI of the form https://<ACCOUNT_NAME>.blob.core.windows.net/<CONTAINER_NAME>/<PATH>.

  • google_project – If specified, the project to use when authenticating with Google Storage. Google Storage is used to transfer serialized values between this computer and the cloud machines that execute Python jobs.

  • token – The authorization token to pass to the batch client. Should only be set for user delegation purposes.

  • regions – Cloud region(s) to run jobs in. Use ServiceBackend.supported_regions() to list the available regions, and ServiceBackend.ANY_REGION to signify that jobs may run in any available region. By default, jobs may run in any region unless a default has been set with hailctl, for example hailctl config set batch/regions “us-central1,us-east1”. A construction sketch follows this list.
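
For illustration, the sketch below constructs a backend pinned to a single region with an explicit remote temporary directory; the billing project, bucket path, and region name are placeholder values, not defaults.

>>> from hailtop.batch import Batch, ServiceBackend
>>> service_backend = ServiceBackend(
...     billing_project='my-billing-account',
...     remote_tmpdir='gs://my-bucket/temporary-files/',
...     regions=['us-central1']) 
>>> b = Batch(backend=service_backend) 
>>> b.run() 
>>> service_backend.close() 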

_run(batch, dry_run, verbose, delete_scratch_on_exit, wait=True, open=False, disable_progress_bar=False, callback=None, token=None, **backend_kwargs)

Execute a batch.

Warning

This method should not be called directly. Instead, use Batch.run() and pass ServiceBackend-specific arguments as keyword arguments.

Parameters:
  • batch (Batch) – Batch to execute.

  • dry_run (bool) – If True, don’t execute code.

  • verbose (bool) – If True, print debugging output.

  • delete_scratch_on_exit (bool) – If True, delete temporary directories with intermediate files.

  • wait (bool) – If True, wait for the batch to finish executing before returning.

  • open (bool) – If True, open the UI page for the batch.

  • disable_progress_bar (bool) – If True, disable the progress bar.

  • callback (Optional[str]) – If not None, a URL that will receive at most one POST request after the entire batch completes.

  • token (Optional[str]) – If not None, a string used for idempotency of batch submission.

Return type:

Batch
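
For example (a sketch reusing the service_backend constructed above; the job and its command are placeholders), ServiceBackend-specific options such as wait and disable_progress_bar are passed through Batch.run() rather than to _run() directly:

>>> b = Batch(backend=service_backend) 
>>> j = b.new_job() 
>>> j.command('echo "hello"') 
>>> b.run(wait=False, disable_progress_bar=True) 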

static supported_regions()

Get the supported cloud regions.

Examples

>>> regions = ServiceBackend.supported_regions()
Returns:

A list of the supported cloud regions.

validate_file_scheme(uri)
Return type:

None
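
A hedged usage sketch (the bucket path is a placeholder; presumably a URI whose scheme is not supported by this backend would be rejected, but that behavior is an assumption rather than documented above):

>>> service_backend.validate_file_scheme('gs://my-bucket/inputs/data.txt') 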