Job
class hailtop.batch.job.Job(batch, token, *, name=None, attributes=None, shell=None)

Bases: object

Object representing a single job to execute.
Notes

This class should never be created directly by the user. Use Batch.new_job(), Batch.new_bash_job(), or Batch.new_python_job() instead.

Methods
always_run() – Set the job to always run, even if dependencies fail.
cloudfuse() – Add a bucket to mount with gcsfuse in GCP or a storage container with blobfuse in Azure.
cpu() – Set the job’s CPU requirements.
depends_on() – Explicitly set dependencies on other jobs.
env() – Set an environment variable for the job.
gcsfuse() – Add a bucket to mount with gcsfuse (deprecated; use cloudfuse() instead).
memory() – Set the job’s memory requirements.
storage() – Set the job’s storage size.
timeout() – Set the maximum amount of time this job can run, in seconds.
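Most of these configuration methods return the job itself, so calls can be chained, as the examples below do. A minimal combined sketch, assuming the default local backend used in the cpu(), memory(), and storage() examples:

>>> b = Batch()
>>> j = b.new_job()
>>> (j.cpu('250m')
...  .memory('3Gi')
...  .storage('10Gi')
...  .command(f'echo "hello"'))
>>> b.run()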
always_run(always_run=True)

Set the job to always run, even if dependencies fail.
Notes
Can only be used with the backend.ServiceBackend.

Warning
Jobs set to always run are not cancellable!
Examples
>>> b = Batch(backend=backend.ServiceBackend('test'))
>>> j = b.new_job()
>>> (j.always_run()
...  .command(f'echo "hello"'))
cloudfuse(bucket, mount_point, *, read_only=True)

Add a bucket to mount with gcsfuse in GCP or a storage container with blobfuse in Azure.
Notes
Can only be used with the backend.ServiceBackend. This method can be called more than once.

Examples
Google Cloud Platform:
>>> b = Batch(backend=backend.ServiceBackend('test'))
>>> j = b.new_job()
>>> (j.cloudfuse('my-bucket', '/my-bucket')
...  .command(f'cat /my-bucket/my-blob-object'))
Azure:
>>> b = Batch(backend=backend.ServiceBackend('test'))
>>> j = b.new_job()
>>> (j.cloudfuse('my-account/my-container', '/dest')
...  .command(f'cat /dest/my-blob-object'))
Parameters

bucket (str) – Name of the Google Storage bucket to mount, or the path to an Azure container in the format <account>/<container>.

mount_point (str) – The path at which the cloud blob storage should be mounted in the Docker container.

read_only (bool) – If True, mount the cloud blob storage in read-only mode.
Returns

Same job object, set with a cloud storage path to mount with either gcsfuse or blobfuse.
cpu(cores)

Set the job’s CPU requirements.
Notes
The string expression must be of the form {number}{suffix}, where the optional suffix m represents millicpu. Omitting the suffix means the value is in cpus.

For the ServiceBackend, cores must be a power of two between 0.25 and 16.

Examples
Set the job’s CPU requirement to 250 millicpu:
>>> b = Batch()
>>> j = b.new_job()
>>> (j.cpu('250m')
...  .command(f'echo "hello"'))
>>> b.run()
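A minimal sketch, assuming cores also accepts numeric values (as the power-of-two note suggests), requesting half a core on a ServiceBackend:

>>> b = Batch(backend=backend.ServiceBackend('test'))
>>> j = b.new_job()
>>> (j.cpu(0.5)  # half a core; a power of two within [0.25, 16]
...  .command(f'echo "hello"'))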
depends_on(*jobs)

Explicitly set dependencies on other jobs.
Examples
Initialize the batch:
>>> b = Batch()
Create the first job:
>>> j1 = b.new_job()
>>> j1.command(f'echo "hello"')
Create the second job j2 that depends on j1:
>>> j2 = b.new_job()
>>> j2.depends_on(j1)
>>> j2.command(f'echo "world"')
Execute the batch:
>>> b.run()
Notes
Dependencies between jobs are automatically created when resources from one job are used in a subsequent job. This method is only needed when no intermediate resource exists and the dependency needs to be explicitly set.
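For contrast with the explicit form above, a minimal sketch of the implicit case described in the note, where using j1's output resource file in j2's command creates the dependency automatically:

>>> b = Batch()
>>> j1 = b.new_job()
>>> j1.command(f'echo "hello" > {j1.ofile}')
>>> j2 = b.new_job()
>>> j2.command(f'cat {j1.ofile}')  # reading j1.ofile makes j2 depend on j1
>>> b.run()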
env(variable, value)

Set an environment variable for the job.
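A minimal usage sketch, assuming variable and value are strings naming the environment variable and its value (GREETING is a hypothetical variable):

>>> b = Batch()
>>> j = b.new_job()
>>> j.env('GREETING', 'hello')  # hypothetical variable, exported into the job's container
>>> j.command('echo $GREETING')
>>> b.run()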
gcsfuse(bucket, mount_point, read_only=True)

Add a bucket to mount with gcsfuse.
Notes
Can only be used with the backend.ServiceBackend. This method can be called more than once. This method is deprecated; use Job.cloudfuse() instead.

Warning
There are performance and cost implications of using gcsfuse.
Examples
>>> b = Batch(backend=backend.ServiceBackend('test'))
>>> j = b.new_job()
>>> (j.gcsfuse('my-bucket', '/my-bucket')
...  .command(f'cat /my-bucket/my-file'))
Parameters

bucket – Name of the Google Storage bucket to mount.

mount_point – The path at which the bucket should be mounted in the Docker container.

read_only – If True, mount the bucket in read-only mode.
Returns

Same job object, set with a bucket to mount with gcsfuse.
memory(memory)

Set the job’s memory requirements.
Examples
Set the job’s memory requirement to 3Gi:
>>> b = Batch()
>>> j = b.new_job()
>>> (j.memory('3Gi')
...  .command(f'echo "hello"'))
>>> b.run()
Notes
The memory expression must be of the form {number}{suffix} where valid optional suffixes are K, Ki, M, Mi, G, Gi, T, Ti, P, and Pi. Omitting a suffix means the value is in bytes.
For the ServiceBackend, the values ‘lowmem’, ‘standard’, and ‘highmem’ are also valid arguments. ‘lowmem’ corresponds to approximately 1 Gi/core, ‘standard’ to approximately 4 Gi/core, and ‘highmem’ to approximately 7 Gi/core. The default value is ‘standard’.
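A minimal sketch using one of the named tiers instead of an explicit size, assuming a ServiceBackend:

>>> b = Batch(backend=backend.ServiceBackend('test'))
>>> j = b.new_job()
>>> (j.memory('highmem')  # approximately 7 Gi per requested core
...  .command(f'echo "hello"'))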
storage(storage)

Set the job’s storage size.
Examples
Set the job’s disk requirements to 10 Gi:
>>> b = Batch()
>>> j = b.new_job()
>>> (j.storage('10Gi')
...  .command(f'echo "hello"'))
>>> b.run()
Notes
The storage expression must be of the form {number}{suffix} where valid optional suffixes are K, Ki, M, Mi, G, Gi, T, Ti, P, and Pi. Omitting a suffix means the value is in bytes.
For the ServiceBackend, jobs requesting one or more cores receive 5 GiB of storage for the root file system /. Jobs requesting a fraction of a core receive the same fraction of 5 GiB of storage. If you need additional storage, you can explicitly request more storage using this method, and the extra storage space will be mounted at /io. Batch automatically writes all ResourceFile objects to /io.

The default storage size is 0 Gi. The minimum storage size is 0 Gi and the maximum is 64 Ti. If storage is set to a value between 0 Gi and 10 Gi, the request is rounded up to 10 Gi. All values are rounded up to the nearest Gi.
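A minimal sketch tying the note together: extra storage is requested and the job writes its output resource file, which Batch places under /io. The destination path is hypothetical:

>>> b = Batch(backend=backend.ServiceBackend('test'))
>>> j = b.new_job()
>>> j.storage('20Gi')  # extra space, mounted at /io
>>> j.command(f'head -c 100 /dev/zero > {j.ofile}')  # j.ofile lives under /io
>>> b.write_output(j.ofile, 'gs://my-bucket/output')  # hypothetical destination
>>> b.run()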
timeout(timeout)

Set the maximum amount of time this job can run, in seconds.
Notes
Can only be used with the backend.ServiceBackend.

Examples
>>> b = Batch(backend=backend.ServiceBackend('test'))
>>> j = b.new_job()
>>> (j.timeout(10)
...  .command(f'echo "hello"'))