Python API

This is the API documentation for Batch. It provides detailed information on the Python programming interface.

Use import hailtop.batch to access this functionality.

Batches

A Batch is an object that represents the set of jobs to run and the order or dependencies between the jobs. Each Job has an image in which to execute commands, as well as settings for storage, memory, and CPU. A BashJob is a subclass of Job that runs bash commands, while a PythonJob is a subclass of Job that executes Python functions.

batch.Batch

Object representing the directed acyclic graph (DAG) of jobs to run.

job.Job

Object representing a single job to execute.

job.BashJob

Object representing a single bash job to execute.

job.PythonJob

Object representing a single Python job to execute.

Resources

A Resource is an abstract class that represents files in a Batch and has two subtypes: ResourceFile and ResourceGroup.

A single file is represented by a ResourceFile, which has two subtypes: InputResourceFile and JobResourceFile. An InputResourceFile specifies a file that is an input to a Batch; such files are not generated as outputs of any Job. In contrast, a JobResourceFile is a file produced by a job. A JobResourceFile generated by one job can be used in subsequent jobs, which creates a dependency between those jobs.

A ResourceGroup represents a collection of files that should be treated as one unit. All files share a common root, but each file has its own extension.

A PythonResult stores the output from running a PythonJob.

resource.Resource

Abstract class for resources.

resource.ResourceFile

Class representing a single file resource.

resource.InputResourceFile

Class representing a resource from an input file.

resource.JobResourceFile

Class representing an intermediate file from a job.

resource.ResourceGroup

Class representing a mapping of identifiers to resource files.

resource.PythonResult

Class representing a result from a Python job.

Batch Pool Executor

A BatchPoolExecutor provides roughly the same interface as the Python standard library’s concurrent.futures.Executor. It facilitates executing arbitrary Python functions in the cloud.

batch_pool_executor.BatchPoolExecutor

An executor which executes Python functions in the cloud.

batch_pool_executor.BatchPoolFuture

Object representing the future result of a function submitted to a BatchPoolExecutor.

Backends

A Backend is an abstract class that can execute a Batch. Currently, there are two types of backends: LocalBackend and ServiceBackend. The local backend executes a batch on your local computer by running a shell script. The service backend executes a batch on Google Compute Engine VMs operated by the Hail team (Batch Service). You can access the UI for the Batch Service at https://batch.hail.is.

backend.RunningBatchType

The type of value returned by Backend._run().

backend.Backend

Abstract class for backends.

backend.LocalBackend

Backend that executes batches on a local computer.

backend.ServiceBackend

Backend that executes batches on Hail's Batch Service on Google Cloud.

Utilities

docker.build_python_image

Build a new Python image with dill and the specified pip packages installed.

utils.concatenate

Concatenate files using tree aggregation.

utils.plink_merge

Merge binary PLINK files using tree aggregation.