Python Version Compatibility Policy
Hail complies with NumPy’s compatibility policy on Python versions. In particular, Hail officially supports:
All minor versions of Python released 42 months prior to the project, and at minimum the two latest minor versions.
All minor versions of numpy released in the 24 months prior to the project, and at minimum the last three minor versions.
(#13681) Fix hailctl batch init and hailctl auth login for new users who have never set up a configuration before.
(#13643) Python jobs in Hail Batch that use the default image now work on all supported Python versions and include the hail Python package.
(#13614) Fixed a bug that broke the LocalBackend when run inside a Jupyter notebook.
(#13200) hailtop.batch will now raise an error by default if a pipeline attempts to read or write files from or to cold storage buckets in GCP.
(#13565) Users can now use VEP images from the hailgenetics DockerHub in Hail Batch.
(#13007) Memory and storage request strings may now be optionally terminated with a B for bytes.
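For example, a minimal sketch (the job object j is assumed to come from Batch.new_job):

```python
j.memory('4Gi')    # 4 gibibytes
j.memory('4GiB')   # now equivalent: a trailing B for bytes is accepted
j.storage('10GB')  # now equivalent to '10G'
```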
(#13051) Azure Blob Storage https URLs are now supported.
(#12731) Introduced hailtop.fs, which makes public a filesystem module that works for the local filesystem, gs, s3, and abs. It can be used via import hailtop.fs as hfs.
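A minimal sketch of the interface; the bucket and file names are hypothetical:

```python
import hailtop.fs as hfs

# The same calls work for local paths and for gs://, s3://, and ABS URLs.
if hfs.exists('gs://my-bucket/data.tsv'):
    with hfs.open('gs://my-bucket/data.tsv', 'r') as f:
        print(f.read())
```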
(#12917) ABS blob URIs in the form of https://<ACCOUNT_NAME>.blob.core.windows.net/<CONTAINER_NAME>/<PATH> are now supported when running in Azure. The hail-az scheme for referencing ABS blobs is now deprecated and will be removed in a future release.
(#12780) PythonJobs now handle arguments with resources nested inside dicts and lists.
(#12900) Reading data from public blobs is now supported in Azure.
(#12780) The LocalBackend now supports always_run jobs. The LocalBackend no longer errors immediately when a job fails; instead, it matches the ServiceBackend by running all jobs whose parents have succeeded.
(#12845) The LocalBackend now sets the working directory for dockerized jobs to the root directory instead of the temp directory. This behavior now matches ServiceBackend jobs.
(#12530) Added the ability to update an existing batch with additional jobs by calling Batch.run() more than once. The method Batch.from_batch_id() can be used to construct a Batch from a previously submitted batch.
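A minimal sketch of the updating workflow, assuming a configured ServiceBackend:

```python
import hailtop.batch as hb

b = hb.Batch(backend=hb.ServiceBackend(), name='update-example')
j1 = b.new_job('first')
j1.command('echo hello')
b.run()  # submits the batch

# Add another job and submit again to update the existing batch.
j2 = b.new_job('second')
j2.command('echo world')
b.run()
```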
Added a new method Job.regions() as well as a configurable parameter to the ServiceBackend to specify which cloud regions a job can run in. By default, a job can run in any available region.
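A sketch of both forms; the region name is an example and the backend parameter name is an assumption:

```python
import hailtop.batch as hb

# Restrict all jobs in the batch via the backend (assumed parameter name).
backend = hb.ServiceBackend(regions=['us-central1'])
b = hb.Batch(backend=backend)

j = b.new_job()
j.regions(['us-central1'])  # or restrict an individual job
j.command('echo hello')
b.run()
```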
Support passing an authorization token to the ServiceBackend.
The bucket parameter in the ServiceBackend has been deprecated. Use remote_tmpdir instead.
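For example (the bucket name is hypothetical):

```python
import hailtop.batch as hb

# Before (deprecated): hb.ServiceBackend(bucket='my-bucket')
backend = hb.ServiceBackend(remote_tmpdir='gs://my-bucket/batch/tmp')
```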
Fixed a bug introduced in 0.2.74 where large commands were not interpolated correctly
Made resource files be represented as an explicit path in the command rather than using environment variables
Fixed Backend.close to be idempotent
Fixed BatchPoolExecutor to always cancel all batches on errors
Large job commands are now written to GCS to avoid Linux argument length and number limitations.
Made failed Python Jobs have non-zero exit codes.
Added the ability to set values for Job.cpu, Job.memory, Job.storage, and Job.timeout to None
Made submitting a PythonJob faster when using the ServiceBackend
Added the option to specify either remote_tmpdir or bucket when using the ServiceBackend
Fixed copying a directory from GCS when using the LocalBackend
Fixed writing files to GCS when the bucket name starts with a “g” or an “s”
Fixed the error “Argument list too long” when using the LocalBackend
Fixed an error where memory is set to None when using the LocalBackend
Removed the need for the project argument in Batch() unless you are creating a PythonJob
Set the default for Job.memory to be ‘standard’
Added the cancel_after_n_failures option to Batch()
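A sketch, assuming a configured ServiceBackend:

```python
import hailtop.batch as hb

# Cancel the remaining jobs in the batch as soon as one job fails.
b = hb.Batch(backend=hb.ServiceBackend(), cancel_after_n_failures=1)
```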
Fixed executing a job with Job.memory set to ‘lowmem’, ‘standard’, and ‘highmem’ when using the LocalBackend
Fixed executing a PythonJob when using the LocalBackend
Job.memory now accepts the inputs ‘lowmem’, ‘standard’, and ‘highmem’, corresponding to ~1Gi/core, ~4Gi/core, and ~7Gi/core respectively.
Job.storage is now interpreted as the desired extra storage mounted at /io in addition to the default root filesystem / when using the ServiceBackend. The root filesystem is allocated 5Gi for all jobs except 1.25Gi for 0.25-core jobs and 2.5Gi for 0.5-core jobs.
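A sketch combining the two settings, assuming a configured ServiceBackend:

```python
import hailtop.batch as hb

b = hb.Batch(backend=hb.ServiceBackend())
j = b.new_job()
j.cpu(1)
j.memory('highmem')  # ~7Gi per core
j.storage('10Gi')    # extra storage mounted at /io, beyond the root filesystem
j.command('echo hello')
b.run()
```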
Changed how we bill for storage when using the ServiceBackend by decoupling storage requests from CPU and memory requests.
Added new worker types when using the ServiceBackend and automatically select the cheapest worker type based on a job’s CPU and memory requests.
Added concatenate and plink_merge functions that use tree aggregation when merging.
BatchPoolExecutor now raises an informative error message for a variety of “system” errors, such as missing container images.
Fixed LocalBackend.run() succeeding when an intermediate command fails
Attempts are now sorted by attempt time in the Batch Service UI.
Implement and document requester_pays_project as a new parameter on batches.
Add support for a user-specified, at-most-once HTTP POST callback when a Batch completes.
Fixed the documentation for job memory and storage requests to have default units in bytes.