class hailtop.batch.job.PythonJob(batch, token, *, name=None, attributes=None)

Bases: Job

Object representing a single Python job to execute.


Create a new Python job that multiplies two numbers and then adds 5 to the result:

# Create a batch object with a default Python image

b = Batch(default_python_image='hailgenetics/python-dill:3.8-slim')

def multiply(x, y):
    return x * y

def add(x, y):
    return x + y

j = b.new_python_job()
result =, 2, 3)
result =, result, 5)

# Write out the str representation of result to a file

b.write_output(result.as_str(), 'hello.txt')


This class should never be created directly by the user. Use Batch.new_python_job() instead.



call(unapplied, *args, **kwargs)
    Execute a Python function.

image(image)
    Set the job's docker image.

call(unapplied, *args, **kwargs)

Execute a Python function.


import json

def add(x, y):
    return x + y

def multiply(x, y):
    return x * y

def format_as_csv(x, y, add_result, mult_result):
    return f'{x},{y},{add_result},{mult_result}'

def csv_to_json(path):
    data = []
    with open(path) as f:
        for line in f:
            line = line.rstrip()
            fields = line.split(',')
            d = {'x': int(fields[0]),
                 'y': int(fields[1]),
                 'add': int(fields[2]),
                 'mult': int(fields[3])}
            data.append(d)
    return json.dumps(data)

# Get all the multiplication and addition table results

b = Batch(name='add-mult-table')

formatted_results = []

for x in range(3):
    for y in range(3):
        j = b.new_python_job(name=f'{x}-{y}')
        add_result =, x, y)
        mult_result =, x, y)
        result =, x, y, add_result, mult_result)
        formatted_results.append(result.as_str())

cat_j = b.new_bash_job(name='concatenate')
cat_j.command(f'cat {" ".join(formatted_results)} > {cat_j.output}')

csv_to_json_j = b.new_python_job(name='csv-to-json')
json_output =, cat_j.output)

b.write_output(json_output.as_str(), '/output/add_mult_table.json')


Unlike a BashJob, a PythonJob returns a new PythonResult for every invocation of A PythonResult can be used as an argument in subsequent invocations of, as an argument in downstream Python jobs, or as an input to other bash jobs. Likewise, InputResourceFile, JobResourceFile, and ResourceGroup objects can be passed to Batch automatically detects dependencies between jobs, including between Python jobs and bash jobs.

When a ResourceFile is passed as an argument, it is passed to the function as a string to the local file path. When a ResourceGroup is passed as an argument, it is passed to the function as a dict where the keys are the resource identifiers in the original ResourceGroup and the values are the local file paths.
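As a local sketch of what the function sees at call time (the function and resource names here are hypothetical, and a temporary file stands in for the localized resource), a ResourceFile argument arrives as a path string and a ResourceGroup argument arrives as a dict of identifier-to-path mappings:

```python
import tempfile

def count_records(path, group):
    # `path` is a plain local file path string (how a ResourceFile is
    # presented to the function); `group` is a dict mapping resource
    # identifiers to local file paths (how a ResourceGroup is presented).
    with open(path) as f:
        n = sum(1 for _ in f)
    return {'lines': n, 'group_keys': sorted(group)}

# Local simulation of the values Batch would pass to the function:
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('a\nb\nc\n')
    tmp_path = f.name

result = count_records(tmp_path, {'bed': '/local/data.bed',
                                  'bim': '/local/data.bim'})
```

In a real job this function would be invoked as, input_file, resource_group), with Batch performing the localization shown above.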

Like JobResourceFiles, all PythonResults are stored as temporary files and must be written to a permanent location with Batch.write_output() if the output needs to be saved. A PythonResult is saved as a dill-serialized object. However, you can use one of the methods PythonResult.as_str(), PythonResult.as_repr(), or PythonResult.as_json() to convert a PythonResult to a JobResourceFile with the desired representation.
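To illustrate the difference between the three representations, here is what the corresponding plain-Python conversions produce for a hypothetical result value. This is only a local sketch: the actual methods write the converted representation of the job's result to a file, but the conversions they apply behave like str(), repr(), and json.dumps():

```python
import json

value = {'x': 2, 'y': 3}       # hypothetical value returned by a Python job

as_str = str(value)            # roughly what as_str() would write
as_repr = repr(value)          # roughly what as_repr() would write
as_json = json.dumps(value)    # roughly what as_json() would write
```

Note that only as_json() produces output that can be parsed back with a standard JSON library; str() and repr() of a dict use Python literal syntax.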


You must have any non-builtin packages that are used by unapplied installed in your image. You can use docker.build_python_image() to build a Python image with additional Python packages installed that is compatible with Python jobs.

Here are some tips to make sure your function can be used with Batch:

  • Only reference top-level modules in your functions, such as numpy or pandas.

  • If you get a serialization error, try moving your imports into your function.

  • Instead of serializing a complex class, determine what information is essential and only serialize that, perhaps as a dict or array.
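The last two tips can be sketched together in ordinary Python (the function, file, and column names here are hypothetical): the import happens inside the function body so it is resolved on the worker rather than captured at serialization time, and the function returns a plain dict of the essential values rather than a complex object.

```python
import tempfile

def mean_of_column(path, column):
    # Import inside the function so the module reference is resolved
    # on the worker, not captured when the function is serialized.
    import csv
    with open(path) as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    # Return only the essential information as a plain dict instead of
    # a complex class instance, keeping the result easy to serialize.
    return {'n': len(values), 'mean': sum(values) / len(values)}

# Hypothetical local input standing in for a file a job would receive:
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as f:
    f.write('x,y\n1,10\n2,20\n')
    path = f.name

summary = mean_of_column(path, 'y')
```

In a Batch pipeline the same function could then be passed directly to, input_file, 'y').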

Parameters:

  • unapplied (Callable) – A reference to a Python function to execute.

  • args – Positional arguments to the Python function. Must be a builtin Python object, a Resource, or a dill-serializable object.

  • kwargs – Keyword arguments to the Python function. Must be a builtin Python object, a Resource, or a dill-serializable object.

Return type:

PythonResult



image(image)

Set the job’s docker image.


image must already exist and must have the same version of Python as the machine submitting the Batch. It must also have the dill Python package installed. You can use the function docker.build_python_image() to build a new image containing dill and additional Python packages.


Set the job’s docker image to hailgenetics/python-dill:3.8-slim:

>>> b = Batch()
>>> j = b.new_python_job()
>>> (j.image('hailgenetics/python-dill:3.8-slim')
...   .call(print, 'hello'))

Parameters:

image (str) – Docker image to use.

Return type:

PythonJob

Returns:

Same job object with docker image set.