BashJob
- class hailtop.batch.job.BashJob(batch, token, *, name=None, attributes=None, shell=None)
Bases: Job
Object representing a single bash job to execute.
Examples
Create a batch object:
>>> b = Batch()
Create a new bash job that prints hello to a temporary file j.ofile:
>>> j = b.new_job()
>>> j.command(f'echo "hello" > {j.ofile}')
Write the temporary file j.ofile to a permanent location:
>>> b.write_output(j.ofile, 'hello.txt')
Execute the DAG:
>>> b.run()
Notes
This class should never be created directly by the user. Use Batch.new_job() or Batch.new_bash_job() instead.
Methods
- command(): Set the job's command to execute.
- declare_resource_group(): Declare a resource group for a job.
- image(): Set the job's docker image.
- command(command)
Set the job’s command to execute.
Examples
Simple job with no output files:
>>> b = Batch()
>>> j = b.new_job()
>>> j.command(f'echo "hello"')
>>> b.run()
Simple job with one temporary file j.ofile that is written to a permanent location:
>>> b = Batch()
>>> j = b.new_job()
>>> j.command(f'echo "hello world" > {j.ofile}')
>>> b.write_output(j.ofile, 'output/hello.txt')
>>> b.run()
Two jobs with a file interdependency:
>>> b = Batch()
>>> j1 = b.new_job()
>>> j1.command(f'echo "hello" > {j1.ofile}')
>>> j2 = b.new_bash_job()
>>> j2.command(f'cat {j1.ofile} > {j2.ofile}')
>>> b.write_output(j2.ofile, 'output/cat_output.txt')
>>> b.run()
Specify multiple commands in the same job:
>>> b = Batch()
>>> j = b.new_job()
>>> j.command(f'echo "hello" > {j.tmp1}')
>>> j.command(f'echo "world" > {j.tmp2}')
>>> j.command(f'echo "!" > {j.tmp3}')
>>> j.command(f'cat {j.tmp1} {j.tmp2} {j.tmp3} > {j.ofile}')
>>> b.write_output(j.ofile, 'output/concatenated.txt')
>>> b.run()
Notes
This method can be called more than once. Each call appends its command to the previously defined commands rather than overriding them.
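The append behavior can be pictured with a toy model. This is only a sketch: ToyJob is a hypothetical stand-in written for illustration, not a hailtop class, and returning self from command() is an assumption made so that calls can be chained.

```python
class ToyJob:
    """Hypothetical stand-in for a job; not part of hailtop."""

    def __init__(self):
        # Commands accumulate in the order they were declared.
        self.commands = []

    def command(self, command):
        # Append instead of overwrite, mirroring the behavior described above.
        self.commands.append(command)
        # Assumption: return the job so that calls can be chained.
        return self


toy = ToyJob()
toy.command('echo "hello"').command('echo "world"')

# Both commands are kept, in declaration order.
script = "\n".join(toy.commands)
print(script)
```

Under this model, a second call never discards the first command; the job ends up with both lines in order.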
To declare a resource file of type JobResourceFile, use either the get attribute syntax of job.{identifier} or the get item syntax of job['identifier']. If an object for that identifier doesn't exist, then one will be created automatically (only allowed in the command() method). The identifier name can be any valid Python identifier such as ofile5000.
All JobResourceFile objects are temporary files and must be written to a permanent location using Batch.write_output() if the output needs to be saved.
Only resources can be referred to in commands. Referencing a batch.Batch or Job will result in an error.
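The two access syntaxes and the automatic creation of resource files can be illustrated with a small model. The names ToyJob and ToyResourceFile and the /tmp/toy path are hypothetical, chosen only for this sketch; the real library's internals may differ.

```python
class ToyResourceFile:
    """Hypothetical temporary file handle; not the real JobResourceFile."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Assumption: the resource renders as a path when placed in an f-string.
        return f"/tmp/toy/{self.name}"


class ToyJob:
    """Hypothetical job that creates resource files on first access."""

    def __init__(self):
        self._resources = {}

    def _get_resource(self, name):
        # Create the resource the first time an identifier is seen.
        if name not in self._resources:
            self._resources[name] = ToyResourceFile(name)
        return self._resources[name]

    def __getitem__(self, name):
        # Get item syntax: job['identifier']
        return self._get_resource(name)

    def __getattr__(self, name):
        # Get attribute syntax: job.identifier
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        return self._get_resource(name)


toy = ToyJob()
same_object = toy.ofile is toy["ofile"]
rendered = f"echo hi > {toy.ofile5000}"
print(same_object, rendered)
```

Both syntaxes resolve to the same object for a given identifier, which is why a file written by one command can be read by a later command through the same name.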
- declare_resource_group(**mappings)
Declare a resource group for a job.
Examples
Declare a resource group:
>>> b = Batch()
>>> input = b.read_input_group(bed='data/example.bed',
...                            bim='data/example.bim',
...                            fam='data/example.fam')
>>> j = b.new_job()
>>> j.declare_resource_group(tmp1={'bed': '{root}.bed',
...                                'bim': '{root}.bim',
...                                'fam': '{root}.fam',
...                                'log': '{root}.log'})
>>> j.command(f'plink --bfile {input} --make-bed --out {j.tmp1}')
>>> b.run()
Warning
Be careful when specifying the expressions for each file as this is Python code that is executed with eval!
- Parameters:
mappings (Dict[str, Any]) – Keywords (in the above example tmp1) are the name(s) of the resource group(s). File names may contain arbitrary Python expressions, which will be evaluated by Python eval. To use the keyword as the file name, use {root} (in the above example {root} will be replaced with tmp1).
- Return type:
- Returns:
Same job object with resource groups set.
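To see why the eval warning matters, the following sketch evaluates a file-name template as an f-string with root in scope. The helper expand_template is hypothetical and is not the library's implementation; it only assumes that templates behave like f-string bodies, as the {root} placeholder suggests.

```python
def expand_template(template: str, root: str) -> str:
    """Hypothetical helper: evaluate `template` as an f-string body with `root` in scope."""
    # eval executes whatever expression sits inside the braces, so a template
    # from an untrusted source could run arbitrary Python code.
    return eval(f'f"""{template}"""', {}, {"root": root})


plain = expand_template("{root}.bed", "tmp1")
computed = expand_template("{root.upper()}.log", "tmp1")

print(plain)     # tmp1.bed
print(computed)  # TMP1.log
```

The second call shows that the braces accept any expression, not just the bare name root, so templates should only ever come from trusted code.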
- image(image)
Set the job’s docker image.
Examples
Set the job’s docker image to ubuntu:20.04:
>>> b = Batch()
>>> j = b.new_job()
>>> (j.image('ubuntu:20.04')
...   .command(f'echo "hello"'))
>>> b.run()