PythonJob
- class hailtop.batch.job.PythonJob(batch, token, *, name=None, attributes=None)
Bases:
Job
Object representing a single Python job to execute.
Examples
Create a new Python job that multiplies two numbers and then adds 5 to the result:
# Create a batch object with a default Python image
b = Batch(default_python_image='hailgenetics/python-dill:3.9-slim')

def multiply(x, y):
    return x * y

def add(x, y):
    return x + y

j = b.new_python_job()
result = j.call(multiply, 2, 3)
result = j.call(add, result, 5)

# Write out the str representation of result to a file
b.write_output(result.as_str(), 'hello.txt')

b.run()
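Run as ordinary Python, independent of Batch, the two chained calls compute the same value the pipeline produces (so `hello.txt` should contain the string representation of that value):

```python
def multiply(x, y):
    return x * y

def add(x, y):
    return x + y

# Chaining the calls locally mirrors passing one PythonResult
# into the next call within the job.
result = add(multiply(2, 3), 5)
print(result)  # 11
```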
Notes
This class should never be created directly by the user. Use
Batch.new_python_job()
instead.

Methods

- call(unapplied, *args, **kwargs): Execute a Python function.
- image(image): Set the job's docker image.
- call(unapplied, *args, **kwargs)
Execute a Python function.
Examples
import json

def add(x, y):
    return x + y

def multiply(x, y):
    return x * y

def format_as_csv(x, y, add_result, mult_result):
    return f'{x},{y},{add_result},{mult_result}'

def csv_to_json(path):
    data = []
    with open(path) as f:
        for line in f:
            line = line.rstrip()
            fields = line.split(',')
            d = {'x': int(fields[0]),
                 'y': int(fields[1]),
                 'add': int(fields[2]),
                 'mult': int(fields[3])}
            data.append(d)
    return json.dumps(data)

# Get all the multiplication and addition table results
b = Batch(name='add-mult-table')
formatted_results = []
for x in range(3):
    for y in range(3):
        j = b.new_python_job(name=f'{x}-{y}')
        add_result = j.call(add, x, y)
        mult_result = j.call(multiply, x, y)
        result = j.call(format_as_csv, x, y, add_result, mult_result)
        formatted_results.append(result.as_str())

cat_j = b.new_bash_job(name='concatenate')
cat_j.command(f'cat {" ".join(formatted_results)} > {cat_j.output}')

csv_to_json_j = b.new_python_job(name='csv-to-json')
json_output = csv_to_json_j.call(csv_to_json, cat_j.output)

b.write_output(json_output.as_str(), '/output/add_mult_table.json')

b.run()
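Because call() receives an ordinary Python function, helpers like csv_to_json above can be tested locally before submitting a Batch. Inside the pipeline, cat_j.output arrives as a local file path; a temporary file stands in for it in this sketch:

```python
import json
import tempfile

# The csv_to_json helper from the example above, runnable outside Batch.
def csv_to_json(path):
    data = []
    with open(path) as f:
        for line in f:
            fields = line.rstrip().split(',')
            data.append({'x': int(fields[0]), 'y': int(fields[1]),
                         'add': int(fields[2]), 'mult': int(fields[3])})
    return json.dumps(data)

# A temporary CSV stands in for the concatenated job output.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as f:
    f.write('0,0,0,0\n1,2,3,2\n')
    path = f.name

parsed = json.loads(csv_to_json(path))
print(parsed)
```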
Notes
Unlike a BashJob, a PythonJob returns a new PythonResult for every invocation of PythonJob.call(). A PythonResult can be used as an argument in subsequent invocations of PythonJob.call(), as an argument in downstream Python jobs, or as an input to other bash jobs. Likewise, InputResourceFile, JobResourceFile, and ResourceGroup can be passed to PythonJob.call(). Batch automatically detects dependencies between jobs, including between Python jobs and bash jobs.

When a ResourceFile is passed as an argument, it is passed to the function as a string containing the local file path. When a ResourceGroup is passed as an argument, it is passed to the function as a dict where the keys are the resource identifiers in the original ResourceGroup and the values are the local file paths.

Like JobResourceFile, all PythonResults are stored as temporary files and must be written to a permanent location using Batch.write_output() if the output needs to be saved. A PythonResult is saved as a dill-serialized object. However, you can use one of the methods PythonResult.as_str(), PythonResult.as_repr(), or PythonResult.as_json() to convert a PythonResult to a JobResourceFile with the desired output.

Warning

You must have any non-builtin packages that are used by unapplied installed in your image. You can use docker.build_python_image() to build a Python image with additional Python packages installed that is compatible with Python jobs.

Here are some tips to make sure your function can be used with Batch:

- Only reference top-level modules in your functions, like numpy or pandas.
- If you get a serialization error, try moving your imports into the function.
- Instead of serializing a complex class, determine what information is essential and only serialize that, perhaps as a dict or array.
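The second tip above can be sketched as follows. Importing inside the function body means the import is resolved in the job's container rather than serialized from the submitting machine, which avoids many dill serialization errors; statistics is stdlib here, but the same shape applies to numpy or pandas:

```python
def mean_of(values):
    # Import inside the function so dill serializes only the function
    # body, not module-level state from the submitting process.
    import statistics
    return statistics.mean(values)

# The function still works when called directly:
print(mean_of([1, 2, 3, 6]))  # 3
```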
- Parameters:
  - unapplied (Callable) – A reference to a Python function to execute.
  - args (UnpreparedArg) – Positional arguments to the Python function. Must be either a builtin Python object, a Resource, or a dill-serializable object. Here UnpreparedArg = Union[PythonResult, ResourceFile, ResourceGroup, List[UnpreparedArg], Tuple[UnpreparedArg, ...], Dict[str, UnpreparedArg], Any].
  - kwargs (UnpreparedArg) – Keyword arguments to the Python function. Must be either a builtin Python object, a Resource, or a dill-serializable object.
- Return type:
  PythonResult
- Returns:
  A new PythonResult referencing the output of the function call.
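A plain-Python sketch (not the Hail API) of what the called function receives per the Notes above: a ResourceGroup arrives as a dict of identifier to local path, and a list of resources arrives as a list of paths. The 'bed'/'bim' keys and /io paths below are hypothetical, for illustration only:

```python
def describe_inputs(group, paths):
    # group mimics a ResourceGroup argument: dict of identifier -> local path.
    # paths mimics a List argument, which call() also accepts.
    names = {key: path.rsplit('/', 1)[-1] for key, path in group.items()}
    return names, len(paths)

info, n = describe_inputs({'bed': '/io/data.bed', 'bim': '/io/data.bim'},
                          ['/io/a.txt', '/io/b.txt'])
print(info, n)
```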
- image(image)
Set the job’s docker image.
Notes
image must already exist and have the same version of Python as what is being used on the computer submitting the Batch. It also must have the dill Python package installed. You can use the function docker.build_python_image() to build a new image containing dill and additional Python packages.

Examples
Set the job’s docker image to hailgenetics/python-dill:3.9-slim:
>>> b = Batch()
>>> j = b.new_python_job()
>>> (j.image('hailgenetics/python-dill:3.9-slim')
...   .call(print, 'hello'))
>>> b.run()
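Since the image's Python version must match the submitting interpreter, one way to pick a matching tag is to derive it from sys.version_info. This is a sketch, not part of the Hail API, and assumes a hailgenetics/python-dill tag exists for your version (check the registry before relying on it):

```python
import sys

# Derive an image tag matching the submitting interpreter's Python
# version, e.g. 'hailgenetics/python-dill:3.9-slim' under Python 3.9.
tag = (f'hailgenetics/python-dill:'
       f'{sys.version_info.major}.{sys.version_info.minor}-slim')
print(tag)
```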