ServiceBackend
- class hailtop.batch.backend.ServiceBackend(*args, billing_project=None, bucket=None, remote_tmpdir=None, google_project=None, token=None, regions=None, gcs_requester_pays_configuration=None, gcs_bucket_allow_list=None)
Bases: Backend[Batch]

Backend that executes batches on Hail’s Batch Service on Google Cloud.
Examples
Create and use a backend that bills to the Hail Batch billing project named “my-billing-account” and stores temporary intermediate files in “gs://my-bucket/temporary-files”.
>>> import hailtop.batch as hb
>>> service_backend = hb.ServiceBackend(
...     billing_project='my-billing-account',
...     remote_tmpdir='gs://my-bucket/temporary-files/'
... )
>>> b = hb.Batch(backend=service_backend)
>>> j = b.new_job()
>>> j.command('echo hello world!')
>>> b.run()
Same as above, but set the billing project and temporary intermediate folders via a configuration file:
cat >my-batch-script.py <<EOF
import hailtop.batch as hb
b = hb.Batch(backend=hb.ServiceBackend())
j = b.new_job()
j.command('echo hello world!')
b.run()
EOF
hailctl config set batch/billing_project my-billing-account
hailctl config set batch/remote_tmpdir gs://my-bucket/temporary-files/
python3 my-batch-script.py
Same as above, but also specify the use of the ServiceBackend via a configuration file:

cat >my-batch-script.py <<EOF
import hailtop.batch as hb
b = hb.Batch()
j = b.new_job()
j.command('echo hello world!')
b.run()
EOF
hailctl config set batch/billing_project my-billing-account
hailctl config set batch/remote_tmpdir gs://my-bucket/temporary-files/
hailctl config set batch/backend service
python3 my-batch-script.py
Create a backend which stores temporary intermediate files in “https://my-account.blob.core.windows.net/my-container/tempdir”.
>>> service_backend = hb.ServiceBackend(
...     billing_project='my-billing-account',
...     remote_tmpdir='https://my-account.blob.core.windows.net/my-container/tempdir'
... )
Require all jobs in all batches in this backend to execute in us-central1:
>>> b = hb.Batch(backend=hb.ServiceBackend(regions=['us-central1']))
Same as above, but using a configuration file:
hailctl config set batch/regions us-central1
python3 my-batch-script.py
Same as above, but using the HAIL_BATCH_REGIONS environment variable:

export HAIL_BATCH_REGIONS=us-central1
python3 my-batch-script.py
Permit jobs to execute in either us-central1 or us-east1:
>>> b = hb.Batch(backend=hb.ServiceBackend(regions=['us-central1', 'us-east1']))
Same as above, but using a configuration file:
hailctl config set batch/regions us-central1,us-east1
Allow reading or writing to buckets even though they are “cold” storage:
>>> b = hb.Batch(
...     backend=hb.ServiceBackend(
...         gcs_bucket_allow_list=['cold-bucket', 'cold-bucket2'],
...     ),
... )
- Parameters:
billing_project (Optional[str]) – Name of billing project to use.
  bucket (Optional[str]) – This argument is deprecated. Use remote_tmpdir instead.
  remote_tmpdir (Optional[str]) – Temporary data will be stored in this cloud storage folder.
  google_project (Optional[str]) – This argument is deprecated. Use gcs_requester_pays_configuration instead.
  gcs_requester_pays_configuration (either str, or tuple of str and list of str, optional) – If a string is provided, configure the Google Cloud Storage file system to bill usage to the project identified by that string. If a tuple is provided, configure the Google Cloud Storage file system to bill usage to the specified project for buckets specified in the list. See the sketch after this list.
  token (Optional[str]) – The authorization token to pass to the batch client. Should only be set for user delegation purposes.
  regions (Optional[List[str]]) – Cloud regions in which jobs may run. ServiceBackend.ANY_REGION indicates jobs may run in any region. If unspecified or None, the batch/regions Hail configuration variable is consulted. See examples above. If none of these variables are set, then jobs may run in any region. ServiceBackend.supported_regions() lists the available regions.
  gcs_bucket_allow_list (Optional[List[str]]) – A list of buckets that the ServiceBackend should be permitted to read from or write to, even if their default policy is to use “cold” storage.
Attributes

ANY_REGION – A special value that indicates a job may run in any region.

Methods

_async_run – Execute a batch.
supported_regions – Get the supported cloud regions.
- ANY_REGION: ClassVar[List[str]] = ['any_region']
  A special value that indicates a job may run in any region.
- async _async_run(batch, dry_run, verbose, delete_scratch_on_exit, wait=True, open=False, disable_progress_bar=False, callback=None, token=None, **backend_kwargs)
Execute a batch.
Warning

This method should not be called directly. Instead, use batch.Batch.run() and pass ServiceBackend-specific arguments as keyword arguments; see the sketch after this method’s entry.

- Parameters:
  batch (Batch) – Batch to execute.
  dry_run (bool) – If True, don’t execute code.
  verbose (bool) – If True, print debugging output.
  delete_scratch_on_exit (bool) – If True, delete temporary directories with intermediate files.
  wait (bool) – If True, wait for the batch to finish executing before returning.
  open (bool) – If True, open the UI page for the batch.
  disable_progress_bar (bool) – If True, disable the progress bar.
  callback (Optional[str]) – If not None, a URL that will receive at most one POST request after the entire batch completes.
  token (Optional[str]) – If not None, a string used for idempotency of batch submission.
- Return type:
  Optional[Batch]
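To illustrate the warning above, a minimal sketch of forwarding these arguments through batch.Batch.run() instead of calling this method directly; the chosen argument values are arbitrary examples:

>>> b = hb.Batch(backend=service_backend)
>>> j = b.new_job()
>>> j.command('echo hello world!')
>>> b.run(wait=False, disable_progress_bar=True)  # keyword arguments are forwarded to ServiceBackend._async_run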
- static supported_regions()
Get the supported cloud regions
Examples
>>> regions = ServiceBackend.supported_regions()
- Returns:
A list of the supported cloud regions