Change Log And Version Policy
=============================

Python Version Compatibility Policy
-----------------------------------

Hail complies with `NumPy’s compatibility policy `__ on Python versions. In particular, Hail officially supports:

- All minor versions of Python released in the 42 months prior to the project, and at minimum the two latest minor versions.
- All minor versions of NumPy released in the 24 months prior to the project, and at minimum the last three minor versions.

Frequently Asked Questions
--------------------------

With a version like 0.x, is Hail ready for use in publications?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes. The `semantic versioning standard `__ uses 0.x (development) versions to refer to software that is either “buggy” or “partial”. While we don’t view Hail as particularly buggy (especially compared to one-off untested scripts pervasive in bioinformatics!), Hail 0.2 is a partial realization of a larger vision.

What is the difference between the Hail Python library version and the native file format version?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Hail Python library version, the version you see on `PyPI `__, in ``pip``, or in ``hl.version()`` changes every time we release the Python library. The Hail native file format version only changes when we change the format of Hail Table and MatrixTable files. If a version of the Python library introduces a new native file format version, we note that in the change log. All subsequent versions of the Python library can read the new file format version.

The native file format changes much more slowly than the Python library version.

It is not currently possible to view the file format version of a Hail Table or MatrixTable.

What stability is guaranteed?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Hail file formats and Python API are backwards compatible.
This means that a script developed to run on Hail 0.2.5 should continue to work in every subsequent release within the 0.2 major version. This also means any file written by Python library versions 0.2.1 through 0.2.5 can be read by 0.2.5.

Forward compatibility of file formats and the Python API is not guaranteed. In particular, a new file format version is only readable by the library version that introduced it and by later library versions. For example, Python library version 0.2.119 introduces a new file format version: 1.7.0. All library versions before 0.2.119, for example 0.2.118, *cannot* read file format version 1.7.0. All library versions after and including 0.2.119 *can* read file format version 1.7.0.

Each version of the Hail Python library can only write files using the latest file format version it supports.

**The hl.experimental package and other methods marked experimental in the docs are exempt from this policy. Their functionality or even existence may change without notice. Please contact us if you critically depend on experimental functionality.**

Version 0.2.130
---------------

Released 2024-10-02

0.2.129 contained test configuration artifacts that prevented users from starting dataproc clusters with ``hailctl``. Please upgrade to 0.2.130 if you use dataproc.

New Features
~~~~~~~~~~~~

- (`#14447 `__) Added a ``copy_spark_log_on_error`` initialization flag that, when set, copies the Hail driver log to the remote ``tmpdir`` if query execution raises an exception.

Bug Fixes
~~~~~~~~~

- (`#14452 `__) Fixes a bug that prevents users from starting dataproc clusters with ``hailctl``.

Version 0.2.129
---------------

Released 2024-04-02

Documentation
~~~~~~~~~~~~~

- (`#14321 `__) Removed ``GOOGLE_APPLICATION_CREDENTIALS`` from batch docs. Metadata server introduction means users no longer need to explicitly activate service accounts with the ``gcloud`` command line tool.
- (`#14339 `__) Added citations since 2021.

.. _new-features-1:

New Features
~~~~~~~~~~~~

- (`#14406 `__) Performance improvements for reading structured data from (Matrix)Tables.
- (`#14255 `__) Added Cochran-Mantel-Haenszel test for association (``cochran_mantel_haenszel_test``). Our thanks to @Will-Tyler for generously contributing this feature.
- (`#14393 `__) ``hail`` no longer depends on ``protobuf``; users may choose their own version of ``protobuf``.
- (`#14360 `__) Exposed the previously internal ``_num_allele_type`` as ``numeric_allele_type`` and deprecated the old underscore-prefixed name. Added a new ``AlleleType`` enumeration so that users can easily interpret the values returned by ``numeric_allele_type``.
- (`#14297 `__) ``vds.sample_qc`` now uses independent aggregators. Users may now import these functions and use them directly.
- (`#14405 `__) ``VariantDataset.validate`` now checks that all ref blocks are no longer than the ``ref_block_max_length`` field, if it exists.

.. _bug-fixes-1:

Bug Fixes
~~~~~~~~~

- (`#14420 `__) Fixes a serious, but likely rare, bug in the Table/MatrixTable reader, which has been present since Sep 2020. It manifests as many (around half or more) of the rows being dropped. This could only happen when 1) reading a (matrix)table whose partitioning metadata allows rows with the same key to be split across neighboring partitions, and 2) reading it with a different partitioning than it was written. 1) would likely only happen by reading data keyed by locus and alleles, and rekeying it to only locus before writing. 2) would likely only happen by using the ``_intervals`` or ``_n_partitions`` arguments to ``read_(matrix)_table``, or possibly ``repartition``. Please reach out to us if you’re concerned you may have been affected by this.
- (`#14330 `__) Fixes a spurious error in ``export_vcf`` with unphased haploid Calls.
- (`#14303 `__) Fix missingness error when sampling entries from a MatrixTable.
- (`#14288 `__) Contigs may now be compared for inequality while filtering rows.

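
For readers unfamiliar with the Cochran-Mantel-Haenszel test listed under New Features for this release, the following is a minimal pure-Python sketch of the statistic for a set of stratified 2x2 tables. It is not Hail's implementation and does not call Hail: the helper name ``cmh_statistic``, the argument layout (four parallel lists of cell counts, one entry per stratum), and the choice to omit the continuity correction are all assumptions made for illustration, so values may differ from what ``cochran_mantel_haenszel_test`` reports.

```python
import math


def cmh_statistic(a, b, c, d):
    """Return (statistic, p_value) for K stratified 2x2 tables.

    Stratum i is the table [[a[i], b[i]], [c[i], d[i]]].
    Hypothetical helper: uses the uncorrected CMH statistic.
    """
    observed = expected = variance = 0.0
    for ai, bi, ci, di in zip(a, b, c, d):
        n = ai + bi + ci + di
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = ai + bi, ci + di
        col1, col2 = ai + ci, bi + di
        observed += ai
        expected += row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("all strata are degenerate")
    stat = (observed - expected) ** 2 / variance
    # Survival function of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Two strata with the same association in each.
stat, p = cmh_statistic([10, 10], [5, 5], [5, 5], [10, 10])
print(f"statistic={stat:.3f}, p={p:.4f}")
```

Pooling the strata through the summed observed, expected, and variance terms is what lets the test detect an association that is consistent across strata without collapsing them into a single table.
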

Deprecations
~~~~~~~~~~~~

- (`#14386 `__) ``MatrixTable.make_table`` is deprecated. Use ``.localize_entries`` instead.

Version 0.2.128
---------------

Released 2024-02-16

In GCP, the Hail Annotation DB and Datasets API have moved from multi-regional US and EU buckets to regional US-CENTRAL1 and EUROPE-WEST1 buckets. These buckets are requester pays, which means that unless your cluster is in the US-CENTRAL1 or EUROPE-WEST1 region, you will pay a per-gigabyte rate to read from the Annotation DB or Datasets API. We must make this change because `reading from a multi-regional bucket into a regional VM is no longer free `__. Unfortunately, cost constraints require us to choose only one region per continent and we have chosen US-CENTRAL1 and EUROPE-WEST1.

.. _documentation-1:

Documentation
~~~~~~~~~~~~~

- (`#14113 `__) Add examples to ``Table.parallelize``, ``Table.key_by``, ``Table.annotate_globals``, ``Table.select_globals``, ``Table.transmute_globals``, ``Table.transmute``, ``Table.annotate``, and ``Table.filter``.
- (`#14242 `__) Add examples to ``Table.sample``, ``Table.head``, and ``Table.semi_join``.

.. _new-features-2:

New Features
~~~~~~~~~~~~

- (`#14206 `__) Introduce ``hailctl config set http/timeout_in_seconds`` which Batch and QoB users can use to increase the timeout on their laptops. Laptops tend to have flaky internet connections and a timeout of 300 seconds produces a more robust experience.
- (`#14178 `__) Reduce VDS Combiner runtime slightly by computing the maximum ref block length without executing the combination pipeline twice.
- (`#14207 `__) VDS Combiner now verifies that every GVCF path and sample name is unique.

.. _bug-fixes-2:

Bug Fixes
~~~~~~~~~

- (`#14300 `__) Require orjson<3.9.12 to avoid a segfault introduced in orjson 3.9.12.
- (`#14071 `__) Use indexed VEP cache files for GRCh38 on both dataproc and QoB.
- (`#14232 `__) Allow use of large numbers of fields on a table without triggering ``ClassTooLargeException: Class too large:``.
- (`#14246 `__)(`#14245 `__) Fix a bug, introduced in 0.2.114, in which ``Table.multi_way_zip_join`` and ``Table.aggregate_by_key`` could throw “NoSuchElementException: Ref with name ``__iruid_...``” when one or more of the tables had a number of partitions substantially different from the desired number of output partitions.
- (`#14202 `__) Support coercing ``{}`` (the empty dictionary) into any Struct type (with all missing fields).
- (`#14239 `__) Remove an erroneous statement from the MatrixTable tutorial.
- (`#14176 `__) ``hailtop.fs.ls`` can now list a bucket, e.g. ``hailtop.fs.ls("gs://my-bucket")``.
- (`#14258 `__) Fix ``import_avro`` to not raise ``NullPointerException`` in certain rare cases (e.g. when using ``_key_by_assert_sorted``).
- (`#14285 `__) Fix a broken link in the MatrixTable tutorial.

.. _deprecations-1:

Deprecations
~~~~~~~~~~~~

- (`#14293 `__) Support for the ``hail-az://`` scheme, deprecated in 0.2.116, is now gone. Please use the standard ``https://ACCOUNT.blob.core.windows.net/CONTAINER/PATH``.

Version 0.2.127
---------------

Released 2024-01-12

If you have an Apple M1 laptop, verify that

::

   file $JAVA_HOME/bin/java

returns a message including the phrase “arm64”. If it instead includes the phrase “x86_64” then you must upgrade to a new version of Java. You may find such a version of Java `here `__.

.. _new-features-3:

New Features
~~~~~~~~~~~~

- (`#14093 `__) ``hailctl dataproc`` now creates clusters using Dataproc version 2.1.33. It previously used version 2.1.2.
- (`#13617 `__) Query-on-Batch now supports joining two tables keyed by intervals.
- (`#13795 `__)(`#13567 `__) Enable passing a requester pays configuration to ``hailtop.fs.open``.

.. _bug-fixes-3:

Bug Fixes
~~~~~~~~~

- (`#14110 `__) Fix ``hailctl hdinsight start``, which has been broken since 0.2.118.
- (`#14098 `__)(`#14090 `__)(`#14118 `__) Fix (`#14089 `__), which makes ``hailctl dataproc connect`` work in Windows Subsystem for Linux.
- (`#14048 `__) Fix (`#13979 `__), affecting Query-on-Batch and manifesting most frequently as “com.github.luben.zstd.ZstdException: Corrupted block detected”.
- (`#14066 `__) Since 0.2.110, ``hailctl dataproc`` set the heap size of the driver JVM dangerously high. It is now set to an appropriate level. This issue manifests in a variety of inscrutable ways including RemoteDisconnectedError and socket closed. See issue (`#13960 `__) for details.
- (`#14057 `__) Fix (`#13998 `__) which appeared in 0.2.58 and prevented reading from a networked filesystem mounted within the filesystem of the worker node for certain pipelines (those that did not trigger “lowering”).
- (`#14006 `__) Fix (`#14000 `__). Hail now supports identity_by_descent on Apple M1 and M2 chips; however, your Java installation must be an arm64 installation. Using x86_64 Java with Hail on Apple M1 or M2 will cause SIGILL errors. If you have an Apple M1 or Apple M2 and ``/usr/libexec/java_home -V`` does not include ``(arm64)``, you must switch to an arm64 version of the JVM.
- (`#14022 `__) Fix (`#13937 `__) caused by faulty library code in the Google Cloud Storage API Java client library.
- (`#13812 `__) Permit ``hailctl batch submit`` to accept relative paths. Fix (`#13785 `__).
- (`#13885 `__) Hail Query-on-Batch previously used Class A Operations for all interaction with blobs. This change ensures that QoB only uses Class A Operations when necessary.
- (`#14127 `__) ``hailctl dataproc start ... --dry-run`` now uses shell escapes such that, when copied and pasted into a shell, the ``gcloud`` command works as expected.
- (`#14062 `__) Fix (`#14052 `__) which caused incorrect results for identity by descent in Query-on-Batch.
- (`#14122 `__) Ensure that stack traces are transmitted from workers to the driver to the client.
- (`#14105 `__) When a VCF contains missing values in array fields, Hail now suggests using ``array_elements_required=False``.

.. _deprecations-2:

Deprecations
~~~~~~~~~~~~

- (`#13987 `__) Deprecate the ``default_reference`` parameter to ``hl.init``. Users should instead call ``hl.default_reference`` with an argument to set a new default reference, usually shortly after ``hl.init``.

Version 0.2.126
---------------

Released 2023-10-30

.. _bug-fixes-4:

Bug Fixes
~~~~~~~~~

- (`#13939 `__) Fix a bug introduced in 0.2.125 which could cause dict literals created in Python to be decoded incorrectly, causing runtime errors or, potentially, incorrect results.
- (`#13751 `__) Correct the broadcasting of ndarrays containing at least one dimension of length zero. This previously produced incorrect results.

Version 0.2.125
---------------

Released 2023-10-26

.. _new-features-4:

New Features
~~~~~~~~~~~~

- (`#13682 `__) ``hl.export_vcf`` now clearly reports all Table or MatrixTable fields which cannot be represented in a VCF.
- (`#13355 `__) Improve the Hail compiler to more reliably rewrite ``Table.filter`` and ``MatrixTable.filter_rows`` to use ``hl.filter_intervals``. Before this change some queries required reading all partitions even though only a small number of partitions match the filter.
- (`#13787 `__) Improve speed of reading Hail format datasets from disk. Simple pipelines may see as much as a halving in latency.
- (`#13849 `__) Fix (`#13788 `__), improving the error message when ``hl.logistic_regression_rows`` is provided row or entry annotations for the dependent variable.
- (`#13888 `__) ``hl.default_reference`` can now be passed an argument to change the default reference genome.

.. _bug-fixes-5:

Bug Fixes
~~~~~~~~~

- (`#13702 `__) Fix (`#13699 `__) and (`#13693 `__). Since 0.2.96, pipelines that combined random functions (e.g. ``hl.rand_unif``) with ``index(..., all_matches=True)`` could fail with a ``ClassCastException``.
- (`#13707 `__) Fix (`#13633 `__). ``hl.maximum_independent_set`` now accepts strings as the names of individuals. It has always accepted structures containing a single string field.
- (`#13713 `__) Fix (`#13704 `__), in which Hail could encounter an IllegalArgumentException if there are too many transient errors.
- (`#13730 `__) Fix (`#13356 `__) and (`#13409 `__). In QoB pipelines with 10K or more partitions, transient “Corrupted block detected” errors were common. This was caused by incorrect retry logic. That logic has been fixed.
- (`#13732 `__) Fix (`#13721 `__) which manifested with the message “Missing Range header in response”. The root cause was a bug in the Google Cloud Storage SDK on which we rely. The fix is to update to a version without this bug. The buggy version of GCS SDK was introduced in 0.2.123.
- (`#13759 `__) Since Hail 0.2.123, Hail would hang in Dataproc Notebooks due to (`#13690 `__).
- (`#13755 `__) Ndarray concatenation now works with arrays with size zero dimensions.
- (`#13817 `__) Mitigate new transient error from Google Cloud Storage which manifests as ``aiohttp.client_exceptions.ClientOSError: [Errno 1] [SSL: SSLV3_ALERT_BAD_RECORD_MAC] sslv3 alert bad record mac (_ssl.c:2548)``.
- (`#13715 `__) Fix (`#13697 `__), a long standing issue with QoB. When a QoB driver or worker fails, the corresponding Batch Job will also appear as failed.
- (`#13829 `__) Fix (`#13828 `__). The Hail combiner now properly imports ``PGT`` fields from GVCFs.
- (`#13805 `__) Fix (`#13767 `__). ``hailctl dataproc submit`` now expands ``~`` in the ``--files`` and ``--pyfiles`` arguments.
- (`#13797 `__) Fix (`#13756 `__). Operations that collect large results such as ``to_pandas`` may require up to 3x less memory.
- (`#13826 `__) Fix (`#13793 `__). Ensure ``hailctl describe -u`` overrides the ``gcs_requester_pays/project`` config variable.
- (`#13814 `__) Fix (`#13757 `__). Pipelines that are memory-bound by copious use of ``hl.literal``, such as ``hl.vds.filter_intervals``, require substantially less memory.
- (`#13894 `__) Fix (`#13837 `__) in which Hail could break a Spark installation if the Hail JAR appears on the classpath before the Scala JARs.
- (`#13919 `__) Fix (`#13915 `__) which prevented using a glob pattern in ``hl.import_vcf``.

Version 0.2.124
---------------

Released 2023-09-21

.. _new-features-5:

New Features
~~~~~~~~~~~~

- (`#13608 `__) Change default behavior of ``hl.ggplot.geom_density`` to use a new method. The old method is still available using the flag ``smoothed=True``. The new method is typically a much more accurate representation, and works well for any distribution, not just smooth ones.

Version 0.2.123
---------------

Released 2023-09-19

.. _new-features-6:

New Features
~~~~~~~~~~~~

- (`#13610 `__) Additional setup is no longer required when using ``hail.plot`` or ``hail.ggplot`` in a Jupyter notebook (calling ``bokeh.io.output_notebook`` or ``hail.plot.output_notebook`` and/or setting ``plotly.io.renderers.default = 'iframe'`` is no longer necessary).

.. _bug-fixes-6:

Bug Fixes
~~~~~~~~~

- (`#13634 `__) Fix a bug which caused Query-on-Batch pipelines with a large number of partitions (close to 100k) to run out of memory on the driver after all partitions finish.
- (`#13619 `__) Fix an optimization bug that, on some pipelines, since at least 0.2.58 (commit 23813af), resulted in Hail using essentially unbounded amounts of memory.
- (`#13609 `__) Fix a bug in ``hail.ggplot.scale_color_continuous`` that sometimes caused errors by generating invalid colors.

Version 0.2.122
---------------

Released 2023-09-07

.. _new-features-7:

New Features
~~~~~~~~~~~~

- (`#13508 `__) The ``n`` parameter of ``MatrixTable.tail`` is deprecated in favor of a new ``n_rows`` parameter.

.. _bug-fixes-7:

Bug Fixes
~~~~~~~~~

- (`#13498 `__) Fix a bug where field names can shadow methods on the ``StructExpression`` class, e.g. “items”, “keys”, “values”. Now the only way to access such fields is through the getitem syntax, e.g. ``some_struct['items']``. It’s possible this could break existing code that uses such field names.
- (`#13585 `__) Fix bug introduced in 0.2.121 where Query-on-Batch users could not make requests to ``batch.hail.is`` without a domain configuration set.

Version 0.2.121
---------------

Released 2023-09-06

.. _new-features-8:

New Features
~~~~~~~~~~~~

- (`#13385 `__) The VDS combiner now supports arbitrary custom call fields via the ``call_fields`` parameter.
- (`#13224 `__) ``hailctl config get``, ``set``, and ``unset`` now support shell auto-completion. Run ``hailctl --install-completion zsh`` to install the auto-completion for ``zsh``. You must already have completion enabled for ``zsh``.
- (`#13279 `__) Add ``hailctl batch init`` which helps new users interactively set up ``hailctl`` for Query-on-Batch and Batch use.

.. _bug-fixes-8:

Bug Fixes
~~~~~~~~~

- (`#13573 `__) Fix (`#12936 `__) in which VEP frequently failed (due to Docker not starting up) on clusters with a non-trivial number of workers.
- (`#13485 `__) Fix (`#13479 `__) in which ``hl.vds.local_to_global`` could produce invalid values when the LA field is too short. There were and are no issues when the LA field has the correct length.
- (`#13340 `__) Fix ``copy_log`` to correctly copy relative file paths.
- (`#13364 `__) ``hl.import_gvcf_interval`` now treats ``PGT`` as a call field.
- (`#13333 `__) Fix interval filtering regression: ``filter_rows`` or ``filter`` mentioning the same field twice or using two fields incorrectly read the entire dataset. In 0.2.121, these filters will correctly read only the relevant subset of the data.
- (`#13368 `__) In Azure, Hail now uses fewer “list blobs” operations. This should reduce cost on pipelines that import many files, export many files, or use file glob expressions.
- (`#13414 `__) Resolves (`#13407 `__) in which uses of ``union_rows`` could reduce parallelism to one partition resulting in severely degraded performance.
- (`#13405 `__) ``MatrixTable.aggregate_cols`` no longer forces a distributed computation. This should be what you want in the majority of cases. In case you know the aggregation is very slow and should be parallelized, use ``mt.cols().aggregate`` instead.
- (`#13460 `__) In Query-on-Spark, restore ``hl.read_table`` optimization that avoids reading unnecessary data in pipelines that do not reference row fields.
- (`#13447 `__) Fix (`#13446 `__). In all three submit commands (``batch``, ``dataproc``, and ``hdinsight``), Hail now allows and encourages the use of ``--`` to separate arguments meant for the user script from those meant for ``hailctl``. In ``hailctl batch submit``, option-like arguments, for example ``--foo``, are now supported before ``--`` if and only if they do not conflict with a ``hailctl`` option.
- (`#13422 `__) ``hailtop.hail_frozenlist.frozenlist`` now has an eval-able ``repr``.
- (`#13523 `__) ``hl.Struct`` is now pickle-able.
- (`#13505 `__) Fix bug introduced in 0.2.117 by commit ``c9de81108`` which prevented the passing of keyword arguments to Python jobs. This manifested as “ValueError: too many values to unpack”.
- (`#13536 `__) Fixed (`#13535 `__) which prevented the use of Python jobs when the client (e.g. your laptop) Python version is 3.11 or later.
- (`#13434 `__) In QoB, Hail’s file systems now correctly list all files in a directory, not just the first 1000. This could manifest in an ``import_table`` or ``import_vcf`` which used a glob expression. In such a case, only the first 1000 files would have been included in the resulting Table or MatrixTable.
- (`#13550 `__) ``hl.utils.range_table(n)`` now supports all valid 32-bit signed integer values of ``n``.
- (`#13500 `__) In Query-on-Batch, the client-side Python code will not try to list every job when a QoB batch fails. This could take hours for long-running pipelines or pipelines with many partitions.

.. _deprecations-3:

Deprecations
~~~~~~~~~~~~

- (`#13275 `__) Hail no longer officially supports Python 3.8.
- (`#13508 `__) The ``n`` parameter of ``MatrixTable.tail`` is deprecated in favor of a new ``n_rows`` parameter.

Version 0.2.120
---------------

Released 2023-07-27

.. _new-features-9:

New Features
~~~~~~~~~~~~

- (`#13206 `__) The VDS Combiner now works in Query-on-Batch.

.. _bug-fixes-9:

Bug Fixes
~~~~~~~~~

- (`#13313 `__) Fix bug introduced in 0.2.119 which causes a serialization error when using Query-on-Spark to read a VCF which is sorted by locus, with split multi-allelics, in which the records sharing a single locus do not appear in the dictionary ordering of their alternate alleles.
- (`#13264 `__) Fix bug which ignored the ``partition_hint`` of a Table group-by-and-aggregate.
- (`#13239 `__) Fix bug which ignored the ``HAIL_BATCH_REGIONS`` argument when determining in which regions to schedule jobs when using Query-on-Batch.
- (`#13253 `__) Improve ``hadoop_ls`` and ``hfs.ls`` to quickly list globbed files in a directory. The speed improvement is proportional to the number of files in the directory.
- (`#13226 `__) Fix the comparison of an ``hl.Struct`` to an ``hl.struct`` or field of type ``tstruct``. Resolves (`#13045 `__) and (`#13046 `__).
- (`#12995 `__) Fixed bug causing poor performance and memory leaks for ``MatrixTable.annotate_rows`` aggregations.

Version 0.2.119
---------------

Released 2023-06-28

.. _new-features-10:

New Features
~~~~~~~~~~~~

- (`#12081 `__) Hail now uses `Zstandard `__ as the default compression algorithm for table and matrix table storage. This reduces file size by around 20% in most cases.
- (`#12988 `__) Arbitrary aggregations can now be used on arrays via ``ArrayExpression.aggregate``. This method is useful for accessing functionality that exists in the aggregator library but not the basic expression library, for instance, ``call_stats``.
- (`#13166 `__) Add an ``eigh`` ndarray method, for finding eigenvalues of symmetric matrices (“h” is for Hermitian, the complex analogue of symmetric).

.. _bug-fixes-10:

Bug Fixes
~~~~~~~~~

- (`#13184 `__) ``vds.to_dense_mt`` no longer densifies past the end of contig boundaries. A logic bug in ``to_dense_mt`` could lead to reference data toward the end of one contig being applied to the following contig up until the first reference block of the contig.
- (`#13173 `__) Fix globbing in Scala blob storage filesystem implementations.

File Format
~~~~~~~~~~~

- The native file format version is now 1.7.0. Older versions of Hail will not be able to read tables or matrix tables written by this version of Hail.

Version 0.2.118
---------------

Released 2023-06-13

.. _new-features-11:

New Features
~~~~~~~~~~~~

- (`#13140 `__) Enable ``hail-az`` and Azure Blob Storage ``https`` URLs to contain SAS tokens to enable bearer-auth style file access to Azure storage.
- (`#13129 `__) Allow subnet to be passed through to ``gcloud`` in ``hailctl``.

.. _bug-fixes-11:

Bug Fixes
~~~~~~~~~

- (`#13126 `__) Query-on-Batch pipelines with one partition are now retried when they encounter transient errors.
- (`#13113 `__) ``hail.ggplot.geom_point`` now displays a legend group for a column even when it has only one value in it.
- (`#13075 `__) (`#13074 `__) Add a new transient error plaguing pipelines in Query-on-Batch in Google: ``java.net.SocketTimeoutException: connect timed out``.
- (`#12569 `__) The documentation for ``hail.ggplot.facets`` is now correctly included in the API reference.

--------------

Version 0.2.117
---------------

Released 2023-05-22

.. _new-features-12:

New Features
~~~~~~~~~~~~

- (`#12875 `__) Parallel export modes now write a manifest file. These manifest files are text files with one filename per line, containing the name of each shard written successfully to the directory. These filenames are relative to the export directory.
- (`#13007 `__) In Query-on-Batch and ``hailtop.batch``, memory and storage request strings may now be optionally terminated with a ``B`` for bytes.

.. _bug-fixes-12:

Bug Fixes
~~~~~~~~~

- (`#13065 `__) In Azure Query-on-Batch, fix a resource leak that prevented running pipelines with >500 partitions and created flakiness with >250 partitions.
- (`#13067 `__) In Query-on-Batch, driver and worker logs no longer buffer so messages should arrive in the UI after a fixed delay rather than proportional to the frequency of log messages.
- (`#13028 `__) Fix crash in ``hl.vds.filter_intervals`` when using a table to filter a VDS that stores the max ref block length.
- (`#13060 `__) Prevent 500 Internal Server Error in Jupyter Notebooks of Dataproc clusters started by ``hailctl dataproc``.
- (`#13051 `__) In Query-on-Batch and ``hailtop.batch``, Azure Blob Storage ``https`` URLs are now supported.
- (`#13042 `__) In Query-on-Batch, ``naive_coalesce`` no longer performs a full write/read of the dataset. It now operates identically to the Query-on-Spark implementation.
- (`#13031 `__) In ``hl.ld_prune``, an informative error message is raised when a dataset does not contain diploid calls instead of an assertion error.
- (`#13032 `__) In Query-on-Batch, in Azure, Hail now uses a newer version of the Azure blob storage libraries to reduce the frequency of “Stream is already closed” errors.
- (`#13011 `__) In Query-on-Batch, the driver will use ~1/2 as much memory to read results as it did in 0.2.115.
- (`#13013 `__) In Query-on-Batch, transient errors while streaming from Google Storage are now automatically retried.

--------------

Version 0.2.116
---------------

Released 2023-05-08

.. _new-features-13:

New Features
~~~~~~~~~~~~

- (`#12917 `__) ABS blob URIs in the format of ``https://.blob.core.windows.net//`` are now supported.
- (`#12731 `__) Introduced ``hailtop.fs`` that makes public a filesystem module that works for local fs, gs, s3 and abs. This is now used as the ``Backend.fs`` for Hail Query but can be used standalone by Hail Batch users via ``import hailtop.fs as hfs``.

.. _deprecations-4:

Deprecations
~~~~~~~~~~~~

- (`#12929 `__) Hail no longer officially supports Python 3.7.
- (`#12917 `__) The ``hail-az`` scheme for referencing blobs in ABS is now deprecated and will be removed in an upcoming release.

.. _bug-fixes-13:

Bug Fixes
~~~~~~~~~

- (`#12913 `__) Fixed bug in ``hail.ggplot`` where all legend entries would have the same text if one column had exactly one value for all rows and was mapped to either the ``shape`` or the ``color`` aesthetic for ``geom_point``.
- (`#12901 `__) ``hl.Struct`` now has a correct and useful implementation of ``pprint``.

--------------

Version 0.2.115
---------------

Released 2023-04-25

.. _new-features-14:

New Features
~~~~~~~~~~~~

- (`#12731 `__) Introduced ``hailtop.fs`` that makes public a filesystem module that works for local fs, gs, s3 and abs. This can be used by ``import hailtop.fs as hfs`` but has also replaced the underlying implementation of the ``hl.hadoop_*`` methods. This means that the ``hl.hadoop_*`` methods now support these additional blob storage providers.
- (`#12917 `__) ABS blob URIs in the form of ``https://.blob.core.windows.net//`` are now supported when running in Azure.

.. _deprecations-5:

Deprecations
~~~~~~~~~~~~

- (`#12917 `__) The ``hail-az`` scheme for referencing ABS blobs in Azure is deprecated in favor of the ``https`` scheme and will be removed in a future release.

.. _bug-fixes-14:

Bug Fixes
~~~~~~~~~

- (`#12919 `__) An interactive Hail session is no longer unusable after hitting CTRL-C during a batch execution in Query-on-Batch.
- (`#12913 `__) Fixed bug in ``hail.ggplot`` where all legend entries would have the same text if one column had exactly one value for all rows and was mapped to either the ``shape`` or the ``color`` aesthetic for ``geom_point``.

--------------

Version 0.2.114
---------------

Released 2023-04-19

.. _new-features-15:

New Features
~~~~~~~~~~~~

- (`#12880 `__) Added ``hl.vds.store_ref_block_max_len`` to patch old VDSes to make interval filtering faster.

.. _bug-fixes-15:

Bug Fixes
~~~~~~~~~

- (`#12860 `__) Fixed memory leak in shuffles in Query-on-Batch.

--------------

Version 0.2.113
---------------

Released 2023-04-07

.. _new-features-16:

New Features
~~~~~~~~~~~~

- (`#12798 `__) Query-on-Batch now supports ``BlockMatrix.write(..., stage_locally=True)``.
- (`#12793 `__) Query-on-Batch now supports ``hl.poisson_regression_rows``.
- (`#12801 `__) Hitting CTRL-C while interactively using Query-on-Batch cancels the underlying batch.
- (`#12810 `__) ``hl.array`` can now convert 1-d ndarrays into the equivalent list.
- (`#12851 `__) ``hl.variant_qc`` no longer requires a locus field.
- (`#12816 `__) In Query-on-Batch, ``hl.logistic_regression('firth', ...)`` is now supported.
- (`#12854 `__) In Query-on-Batch, simple pipelines with large numbers of partitions should be substantially faster.

.. _bug-fixes-16:

Bug Fixes
~~~~~~~~~

- (`#12783 `__) Fixed bug where logs were not properly transmitted to Python.
- (`#12812 `__) Fixed bug where ``Table/MT._calculate_new_partitions`` returned unbalanced intervals with whole-stage code generation runtime.
- (`#12839 `__) Fixed ``hailctl dataproc`` Jupyter notebooks to be compatible with Spark 3.3; they had been broken since 0.2.110.
- (`#12855 `__) In Query-on-Batch, allow writing to requester pays buckets, which was broken before this release.

--------------

Version 0.2.112
---------------

Released 2023-03-15

.. _bug-fixes-17:

Bug Fixes
~~~~~~~~~

- (`#12784 `__) Removed an internal caching mechanism in Query on Batch that caused stalls in pipelines with large intermediates.

--------------

Version 0.2.111
---------------

Released 2023-03-13

.. _new-features-17:

New Features
~~~~~~~~~~~~

- (`#12581 `__) In Query on Batch, users can specify which regions to have jobs run in.

.. _bug-fixes-18:

Bug Fixes
~~~~~~~~~

- (`#12772 `__) Fix ``hailctl hdinsight submit`` to pass args to the files.

--------------

Version 0.2.110
---------------

Released 2023-03-08

.. _new-features-18:

New Features
~~~~~~~~~~~~

- (`#12643 `__) In Query on Batch, ``hl.skat(..., logistic=True)`` is now supported.
- (`#12643 `__) In Query on Batch, ``hl.liftover`` is now supported.
- (`#12629 `__) In Query on Batch, ``hl.ibd`` is now supported.
- (`#12722 `__) Add ``hl.simulate_random_mating`` to generate a population from founders under the assumption of random mating.
- (`#12701 `__) Query on Spark now officially supports Spark 3.3.0 and Dataproc 2.1.x.

Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- (`#12679 `__) In Query on Batch, ``hl.balding_nichols_model`` is slightly faster. Also added ``hl.utils.genomic_range_table`` to quickly create a table keyed by locus.

.. _bug-fixes-19:

Bug Fixes
~~~~~~~~~

- (`#12711 `__) In Query on Batch, fix null pointer exception (manifesting as ``scala.MatchError: null``) when reading data from requester pays buckets.
- (`#12739 `__) Fix ``hl.plot.cdf``, ``hl.plot.pdf``, and ``hl.plot.joint_plot`` which were broken by changes in Hail and changes in bokeh.
- (`#12735 `__) Fix (`#11738 `__) by allowing user to override default types in ``to_pandas``.
- (`#12760 `__) Mitigate some JVM bytecode generation errors, particularly those related to too many method parameters.
- (`#12766 `__) Fix (`#12759 `__) by loosening ``parsimonious`` dependency pin.
- (`#12732 `__) In Query on Batch, fix bug that sometimes prevented terminating a pipeline using Control-C.
- (`#12771 `__) Use a version of ``jgscm`` whose version complies with PEP 440.

--------------

Version 0.2.109
---------------

Released 2023-02-08

.. _new-features-19:

New Features
~~~~~~~~~~~~

- (`#12605 `__) Add ``hl.pgenchisq``, the cumulative distribution function of the generalized chi-squared distribution.
- (`#12637 `__) Query-on-Batch now supports ``hl.skat(..., logistic=False)``. - (`#12645 `__) Added ``hl.vds.truncate_reference_blocks`` to transform a VDS to checkpoint reference blocks in order to drastically improve interval filtering performance. Also added ``hl.vds.merge_reference_blocks`` to merge adjacent reference blocks according to user criteria to better compress reference data. .. _bug-fixes-20: Bug Fixes ~~~~~~~~~ - (`#12650 `__) Hail will now throw an exception on ``hl.export_bgen`` when there is no GP field, instead of exporting null records. - (`#12635 `__) Fix bug where ``hl.skat`` did not work on Apple M1 machines. - (`#12571 `__) When using Query-on-Batch, the ``hl.hadoop_*`` methods now properly support creation and modification time. - (`#12566 `__) Improve error message when combining incompatibly indexed fields in certain operations including array indexing. -------------- Version 0.2.108 --------------- Released 2023-01-12 .. _new-features-20: New Features ~~~~~~~~~~~~ - (`#12576 `__) ``hl.import_bgen`` and ``hl.export_bgen`` now support compression with Zstd. .. _bug-fixes-21: Bug fixes ~~~~~~~~~ - (`#12585 `__) ``hail.ggplot``\ s that have more than one legend group or facet are now interactive. If such a plot has enough legend entries that the legend would be taller than the plot, the legend will now be scrollable. Legend entries for such plots are clickable, which is intended to show or hide traces on the plot; this does not currently work and is a known issue that will only be addressed if ``hail.ggplot`` is migrated off of plotly. - (`#12584 `__) Fixed bug which arose as an assertion error about type mismatches. This was usually triggered when working with tuples. - (`#12583 `__) Fixed bug which showed an empty table for ``ht.col_key.show()``. - (`#12582 `__) Fixed bug where matrix tables with duplicate col keys did not display properly. Also fixed bug where tables and matrix tables with HTML unsafe column headers were rendered incorrectly in Jupyter.
- (`#12574 `__) Fixed a memory leak when processing tables. Could trigger unnecessarily high memory use and out of memory errors when there are many rows per partition or large key fields. - (`#12565 `__) Fixed a bug that prevented exploding on a field of a Table whose value is a random value. -------------- Version 0.2.107 --------------- Released 2022-12-14 .. _bug-fixes-22: Bug fixes ~~~~~~~~~ - (`#12543 `__) Fixed ``hl.vds.local_to_global`` error when LA array contains non-ascending allele indices. -------------- Version 0.2.106 --------------- Released 2022-12-13 .. _new-features-21: New Features ~~~~~~~~~~~~ - (`#12522 `__) Added ``hailctl`` config setting ``'batch/backend'`` to specify the default backend to use in batch scripts when not specified in code. - (`#12497 `__) Added support for ``scales``, ``nrow``, and ``ncol`` arguments, as well as grouped legends, to ``hail.ggplot.facet_wrap``. - (`#12471 `__) Added ``hailctl batch submit`` command to run local scripts inside batch jobs. - (`#12525 `__) Add support for passing arguments to ``hailctl batch submit``. - (`#12465 `__) Batch jobs’ status now contains the region the job ran in. The job itself can access which region it is in through the ``HAIL_REGION`` environment variable. - (`#12464 `__) When using Query-on-Batch, all jobs for a single hail session are inserted into the same batch instead of one batch per action. - (`#12457 `__) ``pca`` and ``hwe_normalized_pca`` are now supported in Query-on-Batch. - (`#12376 `__) Added ``hail.query_table`` function for reading tables with indices from Python. - (`#12139 `__) Random number generation has been updated, but shouldn’t affect most users. If you need to manually set seeds, see https://hail.is/docs/0.2/functions/random.html for details. - (`#11884 `__) Added ``Job.always_copy_output`` when using the ``ServiceBackend``. 
The default is ``False``, which is a breaking change: previously, output files were always copied regardless of the job’s completion state. .. _bug-fixes-23: Bug Fixes ~~~~~~~~~ - (`#12487 `__) Fixed a bug causing rare but deterministic job failures deserializing data in Query-on-Batch. - (`#12535 `__) QoB will now error if the user reads from and writes to the same path. QoB also now respects the user’s configuration of ``disable_progress_bar``. When ``disable_progress_bar`` is unspecified, QoB only disables the progress bar for non-interactive sessions. - (`#12517 `__) Fix a performance regression that appears when using ``hl.split_multi_hts`` among other methods. -------------- Version 0.2.105 --------------- Released 2022-10-31 🎃 .. _new-features-22: New Features ~~~~~~~~~~~~ - (`#12293 `__) Added support for ``hail.MatrixTable``\ s to ``hail.ggplot``. .. _bug-fixes-24: Bug Fixes ~~~~~~~~~ - (`#12384 `__) Fixed a critical bug that disabled tree aggregation and scan executions in 0.2.104, leading to out-of-memory errors. - (`#12265 `__) Fix long-standing bug wherein ``hl.agg.collect_as_set`` and ``hl.agg.counter`` error when applied to types which, in Python, are unhashable. For example, ``hl.agg.counter(t.list_of_genes)`` will no longer error when ``t.list_of_genes`` is a list. Instead, the counter dictionary will use ``FrozenList`` keys from the ``frozenlist`` package. -------------- Version 0.2.104 --------------- Released 2022-10-19 .. _new-features-23: New Features ~~~~~~~~~~~~ - (`#12346 `__) Introduced new progress bars which include total time elapsed and look cool. -------------- Version 0.2.103 --------------- Released 2022-10-18 ..
_bug-fixes-25: Bug Fixes ~~~~~~~~~ - (`#12305 `__): Fixed a rare crash reading tables/matrixtables with \_intervals -------------- Version 0.2.102 --------------- Released 2022-10-06 .. _new-features-24: New Features ~~~~~~~~~~~~ - (`#12218 `__) Missing values are now supported in primitive columns in ``Table.to_pandas``. - (`#12254 `__) Cross-product-style legends for data groups have been replaced with factored ones (consistent with ``ggplot2``\ ’s implementation) for ``hail.ggplot.geom_point``, and support has been added for custom legend group labels. - (`#12268 `__) ``VariantDataset`` now implements ``union_rows`` for combining datasets with the same samples but disjoint variants. .. _bug-fixes-26: Bug Fixes ~~~~~~~~~ - (`#12278 `__) Fixed bug made more likely by 0.2.101 in which Hail errors when interacting with a NumPy integer or floating point type. - (`#12277 `__) Fixed bug in reading tables/matrixtables with partition intervals that led to error or segfault. -------------- Version 0.2.101 --------------- Released 2022-10-04 .. _new-features-25: New Features ~~~~~~~~~~~~ - (`#12218 `__) Support missing values in primitive columns in ``Table.to_pandas``. - (`#12195 `__) Add a ``impute_sex_chr_ploidy_from_interval_coverage`` to impute sex ploidy directly from a coverage MT. - (`#12222 `__) Query-on-Batch pipelines now add worker jobs to the same batch as the driver job instead of producing a new batch per stage. - (`#12244 `__) Added support for custom labels for per-group legends to ``hail.ggplot.geom_point`` via the ``legend_format`` keyword argument .. _deprecations-6: Deprecations ~~~~~~~~~~~~ - (`#12230 `__) The python-dill Batch images in ``gcr.io/hail-vdc`` are no longer supported. Use ``hailgenetics/python-dill`` instead. .. _bug-fixes-27: Bug fixes ~~~~~~~~~ - (`#12215 `__) Fix search bar in the Hail Batch documentation. -------------- Version 0.2.100 --------------- Released 2022-09-23 .. 
_new-features-26: New Features ~~~~~~~~~~~~ - (`#12207 `__) Add support for the ``shape`` aesthetic to ``hail.ggplot.geom_point``. .. _deprecations-7: Deprecations ~~~~~~~~~~~~ - (`#12213 `__) The ``batch_size`` parameter of ``vds.new_combiner`` is deprecated in favor of ``gvcf_batch_size``. .. _bug-fixes-28: Bug fixes ~~~~~~~~~ - (`#12216 `__) Fix bug that caused ``make install-on-cluster`` to fail with a message about ``sys_platform``. - (`#12164 `__) Fix bug that caused Query on Batch pipelines to fail on datasets with indexes greater than 2GiB. -------------- Version 0.2.99 -------------- Released 2022-09-13 .. _new-features-27: New Features ~~~~~~~~~~~~ - (`#12091 `__) Teach ``Table`` to ``write_many``, which writes one table per provided field. - (`#12067 `__) Add ``rand_int32`` and ``rand_int64`` for generating random 32-bit and 64-bit integers, respectively. .. _performance-improvements-1: Performance Improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#12159 `__) Improve performance of MatrixTable reads when using ``_intervals`` argument .. _bug-fixes-29: Bug fixes ~~~~~~~~~ - (`#12179 `__) Fix incorrect composition of interval filters with unordered interval lists that could lead to over- or under-filtering. - (`#12162 `__) Fixed crash in ``collect_cols_by_key`` with preceding random functions. -------------- Version 0.2.98 -------------- Released 2022-08-22 .. _new-features-28: New Features ~~~~~~~~~~~~ - (`#12062 `__) ``hl.balding_nichols_model`` now supports an optional boolean parameter, ``phased``, to control the phasedness of the generated genotypes. .. _performance-improvements-2: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#12099 `__) Make repeated VCF/PLINK queries much faster by caching compiler data structures. - (`#12038 `__) Speed up ``hl.import_matrix_table`` by caching header line computation. .. 
_bug-fixes-30: Bug fixes ~~~~~~~~~ - (`#12115 `__) When using ``use_new_shuffle=True``, fix a bug when there are more than 2^31 rows - (`#12074 `__) Fix bug where ``hl.init`` could silently overwrite the global random seed. - (`#12079 `__) Fix bug in handling of missing (aka NA) fields in grouped aggregation and distinct by key. - (`#12056 `__) Fix ``hl.export_vcf`` to actually create tabix files when requested. - (`#12020 `__) Fix bug in ``hl.experimental.densify`` which manifested as an ``AssertionError`` about dtypes. -------------- Version 0.2.97 -------------- Released 2022-06-30 .. _new-features-29: New Features ~~~~~~~~~~~~ - (`#11756 `__) ``hb.BatchPoolExecutor`` and Python jobs both now also support async functions. .. _bug-fixes-31: Bug fixes ~~~~~~~~~ - (`#11962 `__) Fix error (logged as (`#11891 `__)) in VCF combiner when exactly 10 or 100 files are combined. - (`#11969 `__) Fix ``import_table`` and ``import_lines`` to use multiple partitions when ``force_bgz`` is used. - (`#11964 `__) Fix erroneous “Bucket is a requester pays bucket but no user project provided.” errors in Google Dataproc by updating to the latest Dataproc image version. -------------- Version 0.2.96 -------------- Released 2022-06-21 .. _new-features-30: New Features ~~~~~~~~~~~~ - (`#11833 `__) ``hl.rand_unif`` now has default arguments of 0.0 and 1.0 .. _bug-fixes-32: Bug fixes ~~~~~~~~~ - (`#11905 `__) Fix erroneous FileNotFoundError in glob patterns - (`#11921 `__) and (`#11910 `__) Fix file clobbering during text export with speculative execution. - (`#11920 `__) Fix array out of bounds error when tree aggregating a multiple of 50 partitions. - (`#11937 `__) Fixed correctness bug in scan order for ``Table.annotate`` and ``MatrixTable.annotate_rows`` in certain circumstances. - (`#11887 `__) Escape VCF description strings when exporting. - (`#11886 `__) Fix an error in an example in the docs for ``hl.split_multi``. 
-------------- Version 0.2.95 -------------- Released 2022-05-13 .. _new-features-31: New features ~~~~~~~~~~~~ - (`#11809 `__) Export ``dtypes_from_pandas`` in ``expr.types`` - (`#11807 `__) Teach smoothed_pdf to add a plot to an existing figure. - (`#11746 `__) The ServiceBackend, in interactive mode, will print a link to the currently executing driver batch. - (`#11759 `__) ``hl.logistic_regression_rows``, ``hl.poisson_regression_rows``, and ``hl.skat`` all now support configuration of the maximum number of iterations and the tolerance. - (`#11835 `__) Add ``hl.ggplot.geom_density`` which renders a plot of an approximation of the probability density function of its argument. .. _bug-fixes-33: Bug fixes ~~~~~~~~~ - (`#11815 `__) Fix incorrectly missing entries in to_dense_mt at the position of ref block END. - (`#11828 `__) Fix ``hl.init`` to not ignore its ``sc`` argument. This bug was introduced in 0.2.94. - (`#11830 `__) Fix an error and relax a timeout which caused ``hailtop.aiotools.copy`` to hang. - (`#11778 `__) Fix a (different) error which could cause hangs in ``hailtop.aiotools.copy``. -------------- Version 0.2.94 -------------- Released 2022-04-26 Deprecation ~~~~~~~~~~~ - (`#11765 `__) Deprecated and removed linear mixed model functionality. Beta features ~~~~~~~~~~~~~ - (`#11782 `__) ``hl.import_table`` is up to twice as fast for small tables. .. _new-features-32: New features ~~~~~~~~~~~~ - (`#11428 `__) ``hailtop.batch.build_python_image`` now accepts a ``show_docker_output`` argument to toggle printing docker’s output to the terminal while building container images - (`#11725 `__) ``hl.ggplot`` now supports ``facet_wrap`` - (`#11776 `__) ``hailtop.aiotools.copy`` will always show a progress bar when ``--verbose`` is passed. ``hailctl dataproc`` ~~~~~~~~~~~~~~~~~~~~ - (`#11710 `__) support pass-through arguments to ``connect`` .. 
_bug-fixes-34: Bug fixes ~~~~~~~~~ - (`#11792 `__) Resolved issue where corrupted tables could be created with whole-stage code generation enabled. -------------- Version 0.2.93 -------------- Release 2022-03-27 .. _beta-features-1: Beta features ~~~~~~~~~~~~~ - Several issues with the beta version of Hail Query on Hail Batch are addressed in this release. -------------- Version 0.2.92 -------------- Release 2022-03-25 .. _new-features-33: New features ~~~~~~~~~~~~ - (`#11613 `__) Add ``hl.ggplot`` support for ``scale_fill_hue``, ``scale_color_hue``, and ``scale_fill_manual``, ``scale_color_manual``. This allows for an infinite number of discrete colors. - (`#11608 `__) Add all remaining and all versions of extant public gnomAD datasets to the Hail Annotation Database and Datasets API. Current as of March 23rd 2022. - (`#11662 `__) Add the ``weight`` aesthetic ``geom_bar``. .. _beta-features-2: Beta features ~~~~~~~~~~~~~ - This version of Hail includes all the necessary client-side infrastructure to execute Hail Query pipelines on a Hail Batch cluster. This effectively enables a “serverless” version of Hail Query which is independent of Apache Spark. Broad affiliated users should contact the Hail team for help using Hail Query on Hail Batch. Unaffiliated users should also contact the Hail team to discuss the feasibility of running your own Hail Batch cluster. The Hail team is accessible at both https://hail.zulipchat.com and https://discuss.hail.is . -------------- Version 0.2.91 -------------- Release 2022-03-18 .. _bug-fixes-35: Bug fixes ~~~~~~~~~ - (`#11614 `__) Update ``hail.utils.tutorial.get_movie_lens`` to use ``https`` instead of ``http``. Movie Lens has stopped serving data over insecure HTTP. - (`#11563 `__) Fix issue `hail-is/hail#11562 `__. - (`#11611 `__) Fix a bug that prevents the display of ``hl.ggplot.geom_hline`` and ``hl.ggplot.geom_vline``. 
-------------- Version 0.2.90 -------------- Released 2022-03-11 Critical BlockMatrix from_numpy correctness bug ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - (`#11555 `__) ``BlockMatrix.from_numpy`` did not work correctly. Version 1.0 of org.scalanlp.breeze, a dependency of Apache Spark that Hail also depends on, has a correctness bug that results in BlockMatrices that repeat the top left block of the block matrix for every block. This affected anyone running Spark 3.0.x or 3.1.x. .. _bug-fixes-36: Bug fixes ~~~~~~~~~ - (`#11556 `__) Fixed assertion error occasionally being thrown by valid joins where the join key was a prefix of the left key. Versioning ~~~~~~~~~~ - (`#11551 `__) Support Python 3.10. -------------- Version 0.2.89 -------------- Released 2022-03-04 - (`#11452 `__) Fix ``impute_sex_chromosome_ploidy`` docs. -------------- Version 0.2.88 -------------- Released 2022-03-01 This release addresses the deploy issues in the 0.2.87 release of Hail. -------------- Version 0.2.87 -------------- Released 2022-02-28 An error in the deploy process required us to yank this release from PyPI. Please do not use this release. .. _bug-fixes-37: Bug fixes ~~~~~~~~~ - (`#11401 `__) Fixed bug where ``from_pandas`` didn’t support missing strings. -------------- Version 0.2.86 -------------- Released 2022-02-25 .. _bug-fixes-38: Bug fixes ~~~~~~~~~ - (`#11374 `__) Fixed bug where certain pipelines that read in PLINK files would raise an assertion error. - (`#11401 `__) Fixed bug where ``from_pandas`` didn’t support missing ints. .. _performance-improvements-3: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#11306 `__) Newly written tables that have no duplicate keys will be faster to join against. -------------- Version 0.2.85 -------------- Released 2022-02-14 .. _bug-fixes-39: Bug fixes ~~~~~~~~~ - (`#11355 `__) Fixed assertion errors related to ``RVDPartitioner``.
- (`#11344 `__) Fix error where ``hail.ggplot`` would mislabel points after more than 10 distinct colors were used. .. _new-features-34: New features ~~~~~~~~~~~~ - (`#11332 `__) Added ``geom_ribbon`` and ``geom_area`` to ``hail.ggplot``. -------------- Version 0.2.84 -------------- Released 2022-02-10 .. _bug-fixes-40: Bug fixes ~~~~~~~~~ - (`#11328 `__) Fix bug where occasionally files written to disk would be unreadable. - (`#11331 `__) Fix bug that potentially caused files written to disk to be unreadable. - (`#11312 `__) Fix aggregator memory leak. - (`#11340 `__) Fix bug where repeatedly annotating the same field name could cause a compilation failure. - (`#11342 `__) Fix possible issues caused by having too many open file handles. .. _new-features-35: New features ~~~~~~~~~~~~ - (`#11300 `__) ``geom_histogram`` infers min and max values automatically. - (`#11317 `__) Add support for ``alpha`` aesthetic and ``identity`` position to ``geom_histogram``. -------------- Version 0.2.83 -------------- Released 2022-02-01 .. _bug-fixes-41: Bug fixes ~~~~~~~~~ - (`#11268 `__) Fixed ``log`` argument in ``hail.plot.histogram``. - (`#11276 `__) Fixed ``log`` argument in ``hail.plot.pdf``. - (`#11256 `__) Fixed memory leak in LD Prune. .. _new-features-36: New features ~~~~~~~~~~~~ - (`#11274 `__) Added ``geom_col`` to ``hail.ggplot``. .. _hailctl-dataproc-1: hailctl dataproc ~~~~~~~~~~~~~~~~ - (`#11280 `__) Updated Dataproc image version to one not affected by Log4j vulnerabilities. -------------- Version 0.2.82 -------------- Released 2022-01-24 .. _bug-fixes-42: Bug fixes ~~~~~~~~~ - (`#11209 `__) Significantly improved usefulness and speed of ``Table.to_pandas`` and resolved several bugs with its output. .. _new-features-37: New features ~~~~~~~~~~~~ - (`#11247 `__) Introduces a new experimental plotting interface ``hail.ggplot``, based on R’s ggplot library. - (`#11173 `__) Many math functions like ``hail.sqrt`` now automatically broadcast over ndarrays. ..
_performance-improvements-4: Performance Improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#11216 `__) Significantly improve performance of ``parse_locus_interval`` Python and Java Support ~~~~~~~~~~~~~~~~~~~~~~~ - (`#11219 `__) We no longer officially support Python 3.6, though it may continue to work in the short term. - (`#11220 `__) We support building hail with Java 11. .. _file-format-1: File Format ~~~~~~~~~~~ - The native file format version is now 1.6.0. Older versions of Hail will not be able to read tables or matrix tables written by this version of Hail. -------------- Version 0.2.81 -------------- Release 2021-12-20 .. _hailctl-dataproc-2: hailctl dataproc ~~~~~~~~~~~~~~~~ - (`#11182 `__) Updated Dataproc image version to mitigate yet more Log4j vulnerabilities. -------------- Version 0.2.80 -------------- Release 2021-12-15 .. _new-features-38: New features ~~~~~~~~~~~~ - (`#11077 `__) ``hl.experimental.write_matrix_tables`` now returns the paths of the written matrix tables. .. _hailctl-dataproc-3: hailctl dataproc ~~~~~~~~~~~~~~~~ - (`#11157 `__) Updated Dataproc image version to mitigate the Log4j vulnerability. - (`#10900 `__) Added ``--region`` parameter to ``hailctl dataproc submit``. - (`#11090 `__) Teach ``hailctl dataproc describe`` how to read URLs with the protocols ``s3`` (Amazon S3), ``hail-az`` (Azure Blob Storage), and ``file`` (local file system) in addition to ``gs`` (Google Cloud Storage). -------------- Version 0.2.79 -------------- Release 2021-11-17 .. _bug-fixes-43: Bug fixes ~~~~~~~~~ - (`#11023 `__) Fixed bug in call decoding that was introduced in version 0.2.78. .. _new-features-39: New features ~~~~~~~~~~~~ - (`#10993 `__) New function ``p_value_excess_het``. -------------- Version 0.2.78 -------------- Release 2021-10-19 .. _bug-fixes-44: Bug fixes ~~~~~~~~~ - (`#10766 `__) Don’t throw out of memory error when broadcasting more than 2^(31) - 1 bytes. 
- (`#10910 `__) Filters on key field won’t be slowed down by uses of ``MatrixTable.localize_entries`` or ``Table.rename``. - (`#10959 `__) Don’t throw an error in certain situations where some key fields are optimized away. .. _new-features-40: New features ~~~~~~~~~~~~ - (`#10855 `__) Arbitrary aggregations can be implemented using ``hl.agg.fold``. .. _performance-improvements-5: Performance Improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#10971 `__) Substantially improve the speed of ``Table.collect`` when collecting large amounts of data. -------------- Version 0.2.77 -------------- Release 2021-09-21 .. _bug-fixes-45: Bug fixes ~~~~~~~~~ - (`#10888 `__) Fix crash when calling ``hl.liftover``. - (`#10883 `__) Fix crash / long compilation times writing matrix tables with many partitions. -------------- Version 0.2.76 -------------- Released 2021-09-15 .. _bug-fixes-46: Bug fixes ~~~~~~~~~ - (`#10872 `__) Fix long compile times or method size errors when writing tables with many partitions - (`#10878 `__) Fix crash importing or sorting tables with empty data partitions -------------- Version 0.2.75 -------------- Released 2021-09-10 .. _bug-fixes-47: Bug fixes ~~~~~~~~~ - (`#10733 `__) Fix a bug in tabix parsing when the size of the list of all sequences is large. - (`#10765 `__) Fix rare bug where valid pipelines would fail to compile if intervals were created conditionally. - (`#10746 `__) Various compiler improvements, decrease likelihood of ``ClassTooLarge`` errors. - (`#10829 `__) Fix a bug where ``hl.missing`` and ``CaseBuilder.or_error`` failed if their type was a struct containing a field starting with a number. .. _new-features-41: New features ~~~~~~~~~~~~ - (`#10768 `__) Support multiplying ``StringExpression``\ s to repeat them, as with normal python strings. .. _performance-improvements-6: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#10625 `__) Reduced need to copy strings around, pipelines with many string operations should get faster. 
- (`#10775 `__) Improved performance of ``to_matrix_table_row_major`` on both ``BlockMatrix`` and ``Table``. -------------- Version 0.2.74 -------------- Released 2021-07-26 .. _bug-fixes-48: Bug fixes ~~~~~~~~~ - (`#10697 `__) Fixed bug in ``read_table`` when the table has missing keys and ``_n_partitions`` is specified. - (`#10695 `__) Fixed bug in hl.experimental.loop causing incorrect results when loop state contained pointers. -------------- Version 0.2.73 -------------- Released 2021-07-22 .. _bug-fixes-49: Bug fixes ~~~~~~~~~ - (`#10684 `__) Fixed a rare bug reading arrays from disk where short arrays would have their first elements corrupted and long arrays would cause segfaults. - (`#10523 `__) Fixed bug where liftover would fail with “Could not initialize class” errors. -------------- Version 0.2.72 -------------- Released 2021-07-19 .. _new-features-42: New Features ~~~~~~~~~~~~ - (`#10655 `__) Revamped many hail error messages to give useful python stack traces. - (`#10663 `__) Added ``DictExpression.items()`` to mirror python’s ``dict.items()``. - (`#10657 `__) ``hl.map`` now supports mapping over multiple lists like Python’s built-in ``map``. .. _bug-fixes-50: Bug fixes ~~~~~~~~~ - (`#10662 `__) Fixed partitioning logic in ``hl.import_plink``. - (`#10669 `__) ``NDArrayNumericExpression.sum()`` now works correctly on ndarrays of booleans. -------------- Version 0.2.71 -------------- Released 2021-07-08 .. _new-features-43: New Features ~~~~~~~~~~~~ - (`#10632 `__) Added support for weighted linear regression to ``hl.linear_regression_rows``. - (`#10635 `__) Added ``hl.nd.maximum`` and ``hl.nd.minimum``. - (`#10602 `__) Added ``hl.starmap``. .. _bug-fixes-51: Bug fixes ~~~~~~~~~ - (`#10038 `__) Fixed crashes when writing/reading matrix tables with 0 partitions. - (`#10624 `__) Fixed out of bounds bug with ``_quantile_from_cdf``. .. 
_hailctl-dataproc-4: hailctl dataproc ~~~~~~~~~~~~~~~~ - (`#10633 `__) Added ``--scopes`` parameter to ``hailctl dataproc start``. -------------- Version 0.2.70 -------------- Released 2021-06-21 -------------- Version 0.2.69 -------------- Released 2021-06-14 .. _new-features-44: New Features ~~~~~~~~~~~~ - (`#10592 `__) Added ``hl.get_hgdp`` function. - (`#10555 `__) Added ``hl.hadoop_scheme_supported`` function. - (`#10551 `__) Indexing ndarrays now supports ellipses. .. _bug-fixes-52: Bug fixes ~~~~~~~~~ - (`#10553 `__) Dividing two integers now returns a ``float64``, not a ``float32``. - (`#10595 `__) Don’t include NaNs in ``lambda_gc_agg``. .. _hailctl-dataproc-5: hailctl dataproc ~~~~~~~~~~~~~~~~ - (`#10574 `__) Hail logs will now be stored in ``/home/hail`` by default. -------------- Version 0.2.68 -------------- Released 2021-05-27 -------------- Version 0.2.67 -------------- Released 2021-05-06 Critical performance fix ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#10451 `__) Fixed a memory leak / performance bug triggered by ``hl.literal(...).contains(...)``. -------------- Version 0.2.66 -------------- Released 2021-05-03 .. _new-features-45: New features ~~~~~~~~~~~~ - (`#10398 `__) Added new method ``BlockMatrix.to_ndarray``. - (`#10251 `__) Added support for haploid GT calls to VCF combiner. -------------- Version 0.2.65 -------------- Released 2021-04-14 Default Spark Version Change ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Starting from version 0.2.65, Hail uses Spark 3.1.1 by default. This will also allow the use of all Python versions >= 3.6. By building Hail from source, it is still possible to use older versions of Spark. .. _new-features-46: New features ~~~~~~~~~~~~ - (`#10290 `__) Added ``hl.nd.solve``. - (`#10187 `__) Added ``NDArrayNumericExpression.sum``. .. _performance-improvements-7: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#10233 `__) Loops created with ``hl.experimental.loop`` will now clean up unneeded memory between iterations. ..
_bug-fixes-53: Bug fixes ~~~~~~~~~ - (`#10227 `__) ``hl.nd.qr`` now supports ndarrays that have 0 rows or columns. -------------- Version 0.2.64 -------------- Released 2021-03-11 .. _new-features-47: New features ~~~~~~~~~~~~ - (`#10164 `__) Add source_file_field parameter to hl.import_table to allow lines to be associated with their original source file. .. _bug-fixes-54: Bug fixes ~~~~~~~~~ - (`#10182 `__) Fixed serious memory leak in certain uses of ``filter_intervals``. - (`#10133 `__) Fix bug where some pipelines incorrectly infer missingness, leading to a type error. - (`#10134 `__) Teach ``hl.king`` to treat filtered entries as missing values. - (`#10158 `__) Fixes hail usage in latest versions of jupyter that rely on ``asyncio``. - (`#10174 `__) Fixed bad error message when incorrect return type specified with ``hl.loop``. -------------- Version 0.2.63 -------------- Released 2021-03-01 - (`#10105 `__) Hail will now return ``frozenset`` and ``hail.utils.frozendict`` instead of normal sets and dicts. .. _bug-fixes-55: Bug fixes ~~~~~~~~~ - (`#10035 `__) Fix mishandling of NaN values in ``hl.agg.hist``, where they were unintentionally included in the first bin. - (`#10007 `__) Improve error message from hadoop_ls when file does not exist. .. _performance-improvements-8: Performance Improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#10068 `__) Make certain array copies faster. - (`#10061 `__) Improve code generation of ``hl.if_else`` and ``hl.coalesce``. -------------- Version 0.2.62 -------------- Released 2021-02-03 .. _new-features-48: New features ~~~~~~~~~~~~ - (`#9936 `__) Deprecated ``hl.null`` in favor of ``hl.missing`` for naming consistency. - (`#9973 `__) ``hl.vep`` now includes a ``vep_proc_id`` field to aid in debugging unexpected output. - (`#9839 `__) Hail now eagerly deletes temporary files produced by some BlockMatrix operations. 
- (`#9835 `__) ``hl.any`` and ``hl.all`` now also support a single collection argument and a varargs of Boolean expressions. - (`#9816 `__) ``hl.pc_relate`` now includes values on the diagonal of kinship, IBD-0, IBD-1, and IBD-2 - (`#9736 `__) Let NDArrayExpression.reshape take varargs instead of mandating a tuple. - (`#9766 `__) ``hl.export_vcf`` now warns if INFO field names are invalid according to the VCF 4.3 spec. .. _bug-fixes-56: Bug fixes ~~~~~~~~~ - (`#9976 `__) Fixed ``show()`` representation of Hail dictionaries. .. _performance-improvements-9: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#9909 `__) Improved performance of ``hl.experimental.densify`` by approximately 35%. -------------- Version 0.2.61 -------------- Released 2020-12-03 .. _new-features-49: New features ~~~~~~~~~~~~ - (`#9749 `__) Add or_error method to SwitchBuilder (``hl.switch``) .. _bug-fixes-57: Bug fixes ~~~~~~~~~ - (`#9775 `__) Fixed race condition leading to invalid intermediate files in VCF combiner. - (`#9751 `__) Fix bug where constructing an array of empty structs causes type error. - (`#9731 `__) Fix error and incorrect behavior when using ``hl.import_matrix_table`` with int64 data types. -------------- Version 0.2.60 -------------- Released 2020-11-16 .. _new-features-50: New features ~~~~~~~~~~~~ - (`#9696 `__) ``hl.experimental.export_elasticsearch`` will now support Elasticsearch versions 6.8 - 7.x by default. .. _bug-fixes-58: Bug fixes ~~~~~~~~~ - (`#9641 `__) Showing hail ndarray data now always prints in correct order. .. _hailctl-dataproc-6: hailctl dataproc ~~~~~~~~~~~~~~~~ - (`#9610 `__) Support interval fields in ``hailctl dataproc describe`` -------------- Version 0.2.59 -------------- Released 2020-10-22 Datasets / Annotation DB ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#9605 `__) The Datasets API and the Annotation Database now support AWS, and users are required to specify what cloud platform they’re using. .. 
_hailctl-dataproc-7: hailctl dataproc ~~~~~~~~~~~~~~~~ - (`#9609 `__) Fixed bug where ``hailctl dataproc modify`` did not correctly print corresponding ``gcloud`` command. -------------- Version 0.2.58 -------------- Released 2020-10-08 .. _new-features-51: New features ~~~~~~~~~~~~ - (`#9524 `__) Hail should now be buildable using Spark 3.0. - (`#9549 `__) Add ``ignore_in_sample_frequency`` flag to ``hl.de_novo``. - (`#9501 `__) Configurable cache size for ``BlockMatrix.to_matrix_table_row_major`` and ``BlockMatrix.to_table_row_major``. - (`#9474 `__) Add ``ArrayExpression.first`` and ``ArrayExpression.last``. - (`#9459 `__) Add ``StringExpression.join``, an analogue to Python’s ``str.join``. - (`#9398 `__) Hail will now throw ``HailUserError``\ s if the ``or_error`` branch of a ``CaseBuilder`` is hit. .. _bug-fixes-59: Bug fixes ~~~~~~~~~ - (`#9503 `__) NDArrays can now hold arbitrary data types, though only ndarrays of primitives can be collected to Python. - (`#9501 `__) Remove memory leak in ``BlockMatrix.to_matrix_table_row_major`` and ``BlockMatrix.to_table_row_major``. - (`#9424 `__) ``hl.experimental.writeBlockMatrices`` didn’t correctly support ``overwrite`` flag. .. _performance-improvements-10: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#9506 `__) ``hl.agg.ndarray_sum`` will now do a tree aggregation. .. _hailctl-dataproc-8: hailctl dataproc ~~~~~~~~~~~~~~~~ - (`#9502 `__) Fix hailctl dataproc modify to install dependencies of the wheel file. - (`#9420 `__) Add ``--debug-mode`` flag to ``hailctl dataproc start``. This will enable heap dumps on OOM errors. - (`#9520 `__) Add support for requester pays buckets to ``hailctl dataproc describe``. .. _deprecations-8: Deprecations ~~~~~~~~~~~~ - (`#9482 `__) ``ArrayExpression.head`` has been deprecated in favor of ``ArrayExpression.first``. -------------- Version 0.2.57 -------------- Released 2020-09-03 .. 
_new-features-52: New features ~~~~~~~~~~~~ - (`#9343 `__) Implement the KING method for relationship inference as ``hl.methods.king``. -------------- Version 0.2.56 -------------- Released 2020-08-31 .. _new-features-53: New features ~~~~~~~~~~~~ - (`#9308 `__) Add ``hl.enumerate`` in favor of ``hl.zip_with_index``, which is now deprecated. - (`#9278 `__) Add ``ArrayExpression.grouped``, a function that groups hail arrays into fixed-size subarrays. Performance ~~~~~~~~~~~ - (`#9373 `__)(`#9374 `__) Decrease amount of memory used when slicing or filtering along a single BlockMatrix dimension. .. _bug-fixes-60: Bug fixes ~~~~~~~~~ - (`#9304 `__) Fix crash in ``run_combiner`` caused by inputs where VCF lines and BGZ blocks align. .. _hailctl-dataproc-9: hailctl dataproc ~~~~~~~~~~~~~~~~ - (`#9263 `__) Add support for ``--expiration-time`` argument to ``hailctl dataproc start``. - (`#9263 `__) Add support for ``--no-max-idle``, ``--no-max-age``, ``--max-age``, and ``--expiration-time`` to ``hailctl dataproc modify``. -------------- Version 0.2.55 -------------- Released 2020-08-19 .. _performance-1: Performance ~~~~~~~~~~~ - (`#9264 `__) ``Table.checkpoint`` now uses a faster LZ4 compression scheme. .. _bug-fixes-61: Bug fixes ~~~~~~~~~ - (`#9250 `__) ``hailctl dataproc`` no longer uses deprecated ``gcloud`` flags. Consequently, users must update to a recent version of ``gcloud``. - (`#9294 `__) The “Python 3” kernel in notebooks in clusters started by ``hailctl dataproc`` now features the same Spark monitoring widget found in the “Hail” kernel. There is now no reason to use the “Hail” kernel. .. _file-format-2: File Format ~~~~~~~~~~~ - The native file format version is now 1.5.0. Older versions of Hail will not be able to read tables or matrix tables written by this version of Hail.
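The file format notes in this change log follow one rule: a library version can read a given native file format only if it was released at or after the version that introduced that format. The sketch below encodes that rule in plain Python. It is an illustration only: ``FORMAT_INTRODUCED_IN``, ``parse_version``, and ``can_read`` are hypothetical names that are not part of Hail, and the table holds only the format versions announced in the 0.2.17 through 0.2.55 entries of this change log.

```python
# Hypothetical helper, not part of the Hail API.
# Maps a native file format version to the Hail library version whose
# change-log entry announced it (taken from the "File Format" notes).
FORMAT_INTRODUCED_IN = {
    (1, 1, 0): (0, 2, 17),
    (1, 2, 0): (0, 2, 25),
    (1, 3, 0): (0, 2, 26),
    (1, 4, 0): (0, 2, 34),
    (1, 5, 0): (0, 2, 55),
}


def parse_version(text):
    """Turn a dotted version string such as '0.2.55' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def can_read(library_version, file_format_version):
    """Return True if the library is at least as new as the release that
    introduced the given file format version."""
    introduced_in = FORMAT_INTRODUCED_IN[parse_version(file_format_version)]
    # Tuples of ints compare numerically, so 0.2.119 sorts after 0.2.55.
    return parse_version(library_version) >= introduced_in


print(can_read("0.2.55", "1.5.0"))
print(can_read("0.2.54", "1.5.0"))
```

Comparing integer tuples rather than raw strings matters here, because a string comparison would order ``0.2.119`` before ``0.2.55``.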
-------------- Version 0.2.54 -------------- Released 2020-08-07 VCF Combiner ~~~~~~~~~~~~ - (`#9224 `__)(`#9237 `__) **Breaking change**: Users are now required to pass a partitioning argument to the command-line interface or ``run_combiner`` method. See documentation for details. - (`#8963 `__) Improved performance of VCF combiner by ~4x. .. _new-features-54: New features ~~~~~~~~~~~~ - (`#9209 `__) Add ``hl.agg.ndarray_sum`` aggregator. .. _bug-fixes-62: Bug fixes ~~~~~~~~~ - (`#9206 `__)(`#9207 `__) Improved error messages from invalid usages of Hail expressions. - (`#9223 `__) Fixed error in bounds checking for NDArray slicing. -------------- Version 0.2.53 -------------- Released 2020-07-30 .. _bug-fixes-63: Bug fixes ~~~~~~~~~ - (`#9173 `__) Use less confusing column key behavior in MT.show. - (`#9172 `__) Add a missing Python dependency to Hail: google-cloud-storage. - (`#9170 `__) Change Hail tree aggregate depth logic to correctly respect the branching factor set in ``hl.init``. -------------- Version 0.2.52 -------------- Released 2020-07-29 .. _bug-fixes-64: Bug fixes ~~~~~~~~~ - (`#8944 `__)(`#9169 `__) Fixed crash (error 134 or SIGSEGV) in ``MatrixTable.annotate_cols``, ``hl.sample_qc``, and more. -------------- Version 0.2.51 -------------- Released 2020-07-28 .. _bug-fixes-65: Bug fixes ~~~~~~~~~ - (`#9161 `__) Fix bug that prevented concatenating ndarrays that are fields of a table. - (`#9152 `__) Fix bounds in NDArray slicing. - (`#9161 `__) Fix bugs calculating *row_id* in ``hl.import_matrix_table``. -------------- Version 0.2.50 -------------- Released 2020-07-23 .. _bug-fixes-66: Bug fixes ~~~~~~~~~ - (`#9114 `__) Fixed crash when using repeated calls to ``hl.filter_intervals``. .. _new-features-55: New features ~~~~~~~~~~~~ - (`#9101 `__) Add ``hl.nd.{concat, hstack, vstack}`` to concatenate ndarrays. - (`#9105 `__) Add ``hl.nd.{eye, identity}`` to create identity matrix ndarrays. - (`#9093 `__) Add ``hl.nd.inv`` to invert ndarrays.
- (`#9063 `__) Add ``BlockMatrix.tree_matmul`` to improve matrix multiply performance with a large inner dimension. -------------- Version 0.2.49 -------------- Released 2020-07-08 .. _bug-fixes-67: Bug fixes ~~~~~~~~~ - (`#9058 `__) Fixed memory leak affecting ``Table.aggregate``, ``MatrixTable.annotate_cols`` aggregations, and ``hl.sample_qc``. -------------- Version 0.2.48 -------------- Released 2020-07-07 .. _bug-fixes-68: Bug fixes ~~~~~~~~~ - (`#9029 `__) Fix crash when using ``hl.agg.linreg`` with no aggregated data records. - (`#9028 `__) Fixed memory leak affecting ``Table.annotate`` with scans, ``hl.experimental.densify``, and ``Table.group_by`` / ``aggregate``. - (`#8978 `__) Fixed aggregation behavior of ``MatrixTable.{group_rows_by, group_cols_by}`` to skip filtered entries. -------------- Version 0.2.47 -------------- Released 2020-06-23 .. _bug-fixes-69: Bug fixes ~~~~~~~~~ - (`#9009 `__) Fix memory leak when counting per-partition. This caused excessive memory use in ``BlockMatrix.write_from_entry_expr``, and likely in many other places. - (`#9006 `__) Fix memory leak in ``hl.export_bgen``. - (`#9001 `__) Fix double close error that showed up on Azure Cloud. Version 0.2.46 -------------- Released 2020-06-17 Site ~~~~ - (`#8955 `__) Natural language documentation search .. _bug-fixes-70: Bug fixes ~~~~~~~~~ - (`#8981 `__) Fix BlockMatrix OOM triggered by the MatrixWriteBlockMatrix WriteBlocksRDD method -------------- Version 0.2.45 -------------- Released 2020-06-15 .. _bug-fixes-71: Bug fixes ~~~~~~~~~ - (`#8948 `__) Fix integer overflow error when reading files >2G with ``hl.import_plink``. - (`#8903 `__) Fix Python type annotations for empty collection constructors and ``hl.shuffle``. - (`#8942 `__) Refactored VCF combiner to support other GVCF schemas. - (`#8941 `__) Fixed ``hl.import_plink`` with multiple data partitions. ..
_hailctl-dataproc-10: hailctl dataproc ~~~~~~~~~~~~~~~~ - (`#8946 `__) Fix bug when a user specifies packages in ``hailctl dataproc start`` that are also dependencies of the Hail package. - (`#8939 `__) Support tuples in ``hailctl dataproc describe``. -------------- Version 0.2.44 -------------- Released 2020-06-06 .. _new-features-56: New Features ~~~~~~~~~~~~ - (`#8914 `__) ``hl.export_vcf`` can now export tables as sites-only VCFs. - (`#8894 `__) Added ``hl.shuffle`` function to randomly permute arrays. - (`#8854 `__) Add ``composable`` option to parallel text export for use with ``gsutil compose``. .. _bug-fixes-72: Bug fixes ~~~~~~~~~ - (`#8883 `__) Fix an issue related to failures in pipelines with ``force_bgz=True``. .. _performance-2: Performance ~~~~~~~~~~~ - (`#8887 `__) Substantially improve the performance of ``hl.experimental.import_gtf``. -------------- Version 0.2.43 -------------- Released 2020-05-28 .. _bug-fixes-73: Bug fixes ~~~~~~~~~ - (`#8867 `__) Fix a major correctness bug occurring when calling ``BlockMatrix.transpose`` on sparse, non-symmetric BlockMatrices. - (`#8876 `__) Fixed “ChannelClosedException: null” in ``{Table, MatrixTable}.write``. -------------- Version 0.2.42 -------------- Released 2020-05-27 .. _new-features-57: New Features ~~~~~~~~~~~~ - (`#8822 `__) Add optional non-centrality parameter to ``hl.pchisqtail``. - (`#8861 `__) Add ``contig_recoding`` option to ``hl.experimental.run_combiner``. .. _bug-fixes-74: Bug fixes ~~~~~~~~~ - (`#8863 `__) Fixes VCF combiner to successfully import GVCFs with alleles called as . - (`#8845 `__) Fixed issue where accessing an element of an ndarray in a call to Table.transmute would fail. - (`#8855 `__) Fix crash in ``filter_intervals``. -------------- Version 0.2.41 -------------- Released 2020-05-15 .. _bug-fixes-75: Bug fixes ~~~~~~~~~ - (`#8799 `__)(`#8786 `__) Fix ArrayIndexOutOfBoundsException seen in pipelines that reuse a tuple value. ..
_hailctl-dataproc-11: hailctl dataproc ~~~~~~~~~~~~~~~~ - (`#8790 `__) Use configured compute zone as default for ``hailctl dataproc connect`` and ``hailctl dataproc modify``. -------------- Version 0.2.40 -------------- Released 2020-05-12 .. _vcf-combiner-1: VCF Combiner ~~~~~~~~~~~~ - (`#8706 `__) Add option to key by both locus and alleles for final output. .. _bug-fixes-76: Bug fixes ~~~~~~~~~ - (`#8729 `__) Fix assertion error in ``Table.group_by(...).aggregate(...)`` - (`#8708 `__) Fix assertion error in reading tables and matrix tables with ``_intervals`` option. - (`#8756 `__) Fix return type of ``LocusExpression.window`` to use locus’s reference genome instead of default RG. -------------- Version 0.2.39 -------------- Released 2020-04-29 .. _bug-fixes-77: Bug fixes ~~~~~~~~~ - (`#8615 `__) Fix contig ordering in the CanFam3 (dog) reference genome. - (`#8622 `__) Fix bug that causes inscrutable JVM Bytecode errors. - (`#8645 `__) Ease unnecessarily strict assertion that caused errors when aggregating by key (e.g. ``hl.experimental.spread``). - (`#8621 `__) ``hl.nd.array`` now supports arrays with no elements (e.g. ``hl.nd.array([]).reshape((0, 5))``) and, consequently, matmul with an inner dimension of zero. .. _new-features-58: New features ~~~~~~~~~~~~ - (`#8571 `__) ``hl.init(skip_logging_configuration=True)`` will skip configuration of Log4j. Users may use this to configure their own logging. - (`#8588 `__) Users who manually build Python wheels will experience less unnecessary output when doing so. - (`#8572 `__) Add ``hl.parse_json`` which converts a string containing JSON into a Hail object. .. _performance-improvements-11: Performance Improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#8535 `__) Increase speed of ``import_vcf``. - (`#8618 `__) Increase speed of Jupyter Notebook file listing and Notebook creation when buckets contain many objects. - (`#8613 `__) ``hl.experimental.export_entries_by_col`` stages files for improved reliability and performance. 
.. _documentation-2: Documentation ~~~~~~~~~~~~~ - (`#8619 `__) Improve installation documentation to suggest better performing LAPACK and BLAS libraries. - (`#8647 `__) Clarify that a LAPACK or BLAS library is a *requirement* for a complete Hail installation. - (`#8654 `__) Add link to document describing the creation of a Microsoft Azure HDInsight Hail cluster. -------------- Version 0.2.38 -------------- Released 2020-04-21 Critical Linreg Aggregator Correctness Bug ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - (`#8575 `__) Fixed a correctness bug in the linear regression aggregator. This was introduced in version 0.2.29. See https://discuss.hail.is/t/possible-incorrect-linreg-aggregator-results-in-0-2-29-0-2-37/1375 for more details. .. _performance-improvements-12: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#8558 `__) Make ``hl.experimental.export_entries_by_col`` more fault tolerant. -------------- Version 0.2.37 -------------- Released 2020-04-14 .. _bug-fixes-78: Bug fixes ~~~~~~~~~ - (`#8487 `__) Fix incorrect handling of badly formatted data for ``hl.gp_dosage``. - (`#8497 `__) Fix handling of missingness for ``hl.hamming``. - (`#8537 `__) Fix compile-time error. - (`#8539 `__) Fix compiler error in ``Table.multi_way_zip_join``. - (`#8488 `__) Fix ``hl.agg.call_stats`` to appropriately throw an error for badly-formatted calls. .. _new-features-59: New features ~~~~~~~~~~~~ - (`#8327 `__) Attempting to write to the same file being read from in a pipeline will now throw an error instead of corrupting data. -------------- Version 0.2.36 -------------- Released 2020-04-06 Critical Memory Management Bug Fix ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - (`#8463 `__) Reverted a change (separate from the bug in 0.2.34) that led to a memory leak in version 0.2.35. .. _bug-fixes-79: Bug fixes ~~~~~~~~~ - (`#8371 `__) Fix runtime error in joins leading to “Cannot set required field missing” error message.
- (`#8436 `__) Fix compiler bug leading to possibly-invalid generated code. -------------- Version 0.2.35 -------------- Released 2020-04-02 .. _critical-memory-management-bug-fix-1: Critical Memory Management Bug Fix ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - (`#8412 `__) Fixed a serious per-partition memory leak that causes certain pipelines to run out of memory unexpectedly. Please update from 0.2.34. .. _new-features-60: New features ~~~~~~~~~~~~ - (`#8404 `__) Added “CanFam3” (a reference genome for dogs) as a bundled reference genome. .. _bug-fixes-80: Bug fixes ~~~~~~~~~ - (`#8420 `__) Fixed a bug where ``hl.binom_test``\ ’s ``"lower"`` and ``"upper"`` alternative options were reversed. - (`#8377 `__) Fixed “inconsistent agg or scan environments” error. - (`#8322 `__) Fixed bug where ``aggregate_rows`` did not interact with ``hl.agg.array_agg`` correctly. .. _performance-improvements-13: Performance Improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#8413 `__) Improves internal region memory management, decreasing JVM overhead. - (`#8383 `__) Significantly improve GVCF import speed. - (`#8358 `__) Fixed memory leak in ``hl.experimental.export_entries_by_col``. - (`#8326 `__) Codegen infrastructure improvement resulting in ~3% overall speedup. .. _hailctl-dataproc-12: hailctl dataproc ~~~~~~~~~~~~~~~~ - (`#8399 `__) Enable spark speculation by default. - (`#8340 `__) Add new Australia region to ``--vep``. - (`#8347 `__) Support all GCP machine types as potential master machines. -------------- Version 0.2.34 -------------- Released 2020-03-12 .. _new-features-61: New features ~~~~~~~~~~~~ - (`#8233 `__) ``StringExpression.matches`` can now take a hail ``StringExpression``, as opposed to only regular python strings. - (`#8198 `__) Improved matrix multiplication interoperation between hail ``NDArrayExpression`` and numpy. .. _bug-fixes-81: Bug fixes ~~~~~~~~~ - (`#8279 `__) Fix a bug where ``hl.agg.approx_cdf`` failed inside of a ``group_cols_by``. 
- (`#8275 `__) Fix bad error message coming from ``mt.make_table()`` when keys are missing. - (`#8274 `__) Fix memory leak in ``hl.export_bgen``. - (`#8273 `__) Fix segfault caused by ``hl.agg.downsample`` inside of an ``array_agg`` or ``group_by``. .. _hailctl-dataproc-13: hailctl dataproc ~~~~~~~~~~~~~~~~ - (`#8253 `__) ``hailctl dataproc`` now supports new flags ``--requester-pays-allow-all`` and ``--requester-pays-allow-buckets``. This will configure your hail installation to be able to read from requester pays buckets. The charges for reading from these buckets will be billed to the project that the cluster is created in. - (`#8268 `__) The data sources for VEP have been moved to ``gs://hail-us-vep``, ``gs://hail-eu-vep``, and ``gs://hail-uk-vep``, which are requester-pays buckets in Google Cloud. ``hailctl dataproc`` will automatically infer which of these buckets you should pull data from based on the region your cluster is spun up in. If you are in none of those regions, please contact us on discuss.hail.is. .. _file-format-3: File Format ~~~~~~~~~~~ - The native file format version is now 1.4.0. Older versions of Hail will not be able to read tables or matrix tables written by this version of Hail. -------------- Version 0.2.33 -------------- Released 2020-02-27 .. _new-features-62: New features ~~~~~~~~~~~~ - (`#8173 `__) Added new method ``hl.zeros``. .. _bug-fixes-82: Bug fixes ~~~~~~~~~ - (`#8153 `__) Fixed compiler bug causing ``MatchError`` in ``import_bgen``. - (`#8123 `__) Fixed an issue with multiple Python HailContexts running on the same cluster. - (`#8150 `__) Fixed an issue where output from VEP about failures was not reported in error message. - (`#8152 `__) Fixed an issue where the row count of a MatrixTable coming from ``import_matrix_table`` was incorrect. - (`#8175 `__) Fixed a bug where ``persist`` did not actually do anything. ..
_hailctl-dataproc-14: ``hailctl dataproc`` ~~~~~~~~~~~~~~~~~~~~ - (`#8079 `__) Using ``connect`` to open the Jupyter notebook browser will no longer crash if your project contains requester-pays buckets. -------------- Version 0.2.32 -------------- Released 2020-02-07 Critical performance regression fix ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - (`#7989 `__) Fixed performance regression leading to a large slowdown when ``hl.variant_qc`` was run after filtering columns. .. _performance-3: Performance ~~~~~~~~~~~ - (`#7962 `__) Improved performance of ``hl.pc_relate``. - (`#8032 `__) Drastically improve performance of pipelines calling ``hl.variant_qc`` and ``hl.sample_qc`` iteratively. - (`#8037 `__) Improve performance of NDArray matrix multiply by using native linear algebra libraries. .. _bug-fixes-83: Bug fixes ~~~~~~~~~ - (`#7976 `__) Fixed divide-by-zero error in ``hl.concordance`` with no overlapping rows or cols. - (`#7965 `__) Fixed optimizer error leading to crashes caused by ``MatrixTable.union_rows``. - (`#8035 `__) Fix compiler bug in ``Table.multi_way_zip_join``. - (`#8021 `__) Fix bug in computing shape after ``BlockMatrix.filter``. - (`#7986 `__) Fix error in NDArray matrix/vector multiply. .. _new-features-63: New features ~~~~~~~~~~~~ - (`#8007 `__) Add ``hl.nd.diagonal`` function. Cheat sheets ~~~~~~~~~~~~ - (`#7940 `__) Added cheat sheet for MatrixTables. - (`#7963 `__) Improved Table cheat sheet. -------------- Version 0.2.31 -------------- Released 2020-01-22 .. _new-features-64: New features ~~~~~~~~~~~~ - (`#7787 `__) Added transition/transversion information to ``hl.summarize_variants``. - (`#7792 `__) Add Python stack trace to array index out of bounds errors in Hail pipelines. - (`#7832 `__) Add ``spark_conf`` argument to ``hl.init``, permitting configuration of Spark runtime for a Hail session. - (`#7823 `__) Added datetime functions ``hl.experimental.strptime`` and ``hl.experimental.strftime``.
- (`#7888 `__) Added ``hl.nd.array`` constructor from nested standard arrays. File size ~~~~~~~~~ - (`#7923 `__) Fixed compression problem since 0.2.23 resulting in larger-than-expected matrix table files for datasets with few entry fields (e.g. GT-only datasets). .. _performance-4: Performance ~~~~~~~~~~~ - (`#7867 `__) Fix performance regression leading to extra scans of data when ``order_by`` and ``key_by`` appeared close together. - (`#7901 `__) Fix performance regression leading to extra scans of data when ``group_by/aggregate`` and ``key_by`` appeared close together. - (`#7830 `__) Improve performance of array arithmetic. .. _bug-fixes-84: Bug fixes ~~~~~~~~~ - (`#7922 `__) Fix still-not-well-understood serialization error about ApproxCDFCombiner. - (`#7906 `__) Fix optimizer error by relaxing unnecessary assertion. - (`#7788 `__) Fix possible memory leak in ``ht.tail`` and ``ht.head``. - (`#7796 `__) Fix bug in ingesting numpy arrays not in row-major orientation. -------------- Version 0.2.30 -------------- Released 2019-12-20 .. _performance-5: Performance ~~~~~~~~~~~ - (`#7771 `__) Fixed extreme performance regression in scans. - (`#7764 `__) Fixed ``mt.entry_field.take`` performance regression. .. _new-features-65: New features ~~~~~~~~~~~~ - (`#7614 `__) Added experimental support for loops with ``hl.experimental.loop``. Miscellaneous ~~~~~~~~~~~~~ - (`#7745 `__) Changed ``export_vcf`` to only use scientific notation when necessary. -------------- Version 0.2.29 -------------- Released 2019-12-17 .. _bug-fixes-85: Bug fixes ~~~~~~~~~ - (`#7229 `__) Fixed ``hl.maximal_independent_set`` tie breaker functionality. - (`#7732 `__) Fixed incompatibility with old files leading to incorrect data read when filtering intervals after ``read_matrix_table``. - (`#7642 `__) Fixed crash when constant-folding functions that throw errors. - (`#7611 `__) Fixed ``hl.hadoop_ls`` to handle glob patterns correctly. 
- (`#7653 `__) Fixed crash in ``ld_prune`` by unfiltering missing GTs. .. _performance-improvements-14: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#7719 `__) Generate more efficient IR for ``Table.flatten``. - (`#7740 `__) Method wrapping large let bindings to keep method size down. .. _new-features-66: New features ~~~~~~~~~~~~ - (`#7686 `__) Added ``comment`` argument to ``import_matrix_table``, allowing lines with certain prefixes to be ignored. - (`#7688 `__) Added experimental support for ``NDArrayExpression``\ s in new ``hl.nd`` module. - (`#7608 `__) ``hl.grep`` now has a ``show`` argument that allows users to either print the results (default) or return a dictionary of the results. .. _hailctl-dataproc-15: ``hailctl dataproc`` ~~~~~~~~~~~~~~~~~~~~ - (`#7717 `__) Throw error when misspelling arguments instead of silently quitting. -------------- Version 0.2.28 -------------- Released 2019-11-22 Critical correctness bug fix ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - (`#7588 `__) Fixes a bug where filtering old matrix tables in newer versions of hail did not work as expected. Please update from 0.2.27. .. _bug-fixes-86: Bug fixes ~~~~~~~~~ - (`#7571 `__) Don’t set GQ to missing if PL is missing in ``split_multi_hts``. - (`#7577 `__) Fixed an optimizer bug. .. _new-features-67: New Features ~~~~~~~~~~~~ - (`#7561 `__) Added ``hl.plot.visualize_missingness()`` to plot missingness patterns for MatrixTables. - (`#7575 `__) Added ``hl.version()`` to quickly check hail version. .. _hailctl-dataproc-16: ``hailctl dataproc`` ~~~~~~~~~~~~~~~~~~~~ - (`#7586 `__) ``hailctl dataproc`` now supports ``--gcloud_configuration`` option. .. _documentation-3: Documentation ~~~~~~~~~~~~~ - (`#7570 `__) Hail has a cheatsheet for Tables now. -------------- Version 0.2.27 -------------- Released 2019-11-15 ..
_new-features-68: New Features ~~~~~~~~~~~~ - (`#7379 `__) Add ``delimiter`` argument to ``hl.import_matrix_table`` - (`#7389 `__) Add ``force`` and ``force_bgz`` arguments to ``hl.experimental.import_gtf`` - (`#7386 `__)(`#7394 `__) Add ``{Table, MatrixTable}.tail``. - (`#7467 `__) Added ``hl.if_else`` as an alias for ``hl.cond``; deprecated ``hl.cond``. - (`#7453 `__) Add ``hl.parse_int{32, 64}`` and ``hl.parse_float{32, 64}``, which can parse strings to numbers and return missing on failure. - (`#7475 `__) Add ``row_join_type`` argument to ``MatrixTable.union_cols`` to support outer joins on rows. .. _bug-fixes-87: Bug fixes ~~~~~~~~~ - (`#7479 `__)(`#7368 `__)(`#7402 `__) Fix optimizer bugs. - (`#7506 `__) Updated to latest htsjdk to resolve VCF parsing problems. .. _hailctl-dataproc-17: ``hailctl dataproc`` ~~~~~~~~~~~~~~~~~~~~ - (`#7460 `__) The Spark monitor widget now automatically collapses after a job completes. -------------- Version 0.2.26 -------------- Released 2019-10-24 .. _new-features-69: New Features ~~~~~~~~~~~~ - (`#7325 `__) Add ``string.reverse`` function. - (`#7328 `__) Add ``string.translate`` function. - (`#7344 `__) Add ``hl.reverse_complement`` function. - (`#7306 `__) Teach the VCF combiner to handle allele specific (``AS_*``) fields. - (`#7346 `__) Add ``hl.agg.approx_median`` function. .. _bug-fixes-88: Bug Fixes ~~~~~~~~~ - (`#7361 `__) Fix ``AD`` calculation in ``sparse_split_multi``. .. _performance-improvements-15: Performance Improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#7355 `__) Improve performance of IR copying. .. _file-format-4: File Format ~~~~~~~~~~~ - The native file format version is now 1.3.0. Older versions of Hail will not be able to read tables or matrix tables written by this version of Hail. Version 0.2.25 -------------- Released 2019-10-14 .. _new-features-70: New features ~~~~~~~~~~~~ - (`#7240 `__) Add interactive schema widget to ``{MatrixTable, Table}.describe``. Use this by passing the argument ``widget=True``. 
- (`#7250 `__) ``{Table, MatrixTable, Expression}.summarize()`` now summarizes elements of collections (arrays, sets, dicts). - (`#7271 `__) Improve ``hl.plot.qq`` by increasing point size, adding the unscaled p-value to hover data, and printing lambda-GC on the plot. - (`#7280 `__) Add HTML output for ``{Table, MatrixTable, Expression}.summarize()``. - (`#7294 `__) Add HTML output for ``hl.summarize_variants()``. .. _bug-fixes-89: Bug fixes ~~~~~~~~~ - (`#7200 `__) Fix VCF parsing with missingness inside arrays of floating-point values in the FORMAT field. - (`#7219 `__) Fix crash due to invalid optimizer rule. .. _performance-improvements-16: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#7187 `__) Dramatically improve performance of chained ``BlockMatrix`` multiplies without checkpoints in between. - (`#7195 `__)(`#7194 `__) Improve performance of ``group[_rows]_by`` / ``aggregate``. - (`#7201 `__) Permit code generation of larger aggregation pipelines. .. _file-format-5: File Format ~~~~~~~~~~~ - The native file format version is now 1.2.0. Older versions of Hail will not be able to read tables or matrix tables written by this version of Hail. -------------- Version 0.2.24 -------------- Released 2019-10-03 .. _hailctl-dataproc-18: ``hailctl dataproc`` ~~~~~~~~~~~~~~~~~~~~ - (`#7185 `__) Resolve issue in dependencies that led to a Jupyter update breaking cluster creation. .. _new-features-71: New features ~~~~~~~~~~~~ - (`#7071 `__) Add ``permit_shuffle`` flag to ``hl.{split_multi, split_multi_hts}`` to allow processing of datasets with both multiallelics and duplicate loci. - (`#7121 `__) Add ``hl.contig_length`` function. - (`#7130 `__) Add ``window`` method on ``LocusExpression``, which creates an interval around a locus. - (`#7172 `__) Permit ``hl.init(sc=sc)`` with pip-installed packages, given the right configuration options. .. _bug-fixes-90: Bug fixes ~~~~~~~~~ - (`#7070 `__) Fix unintentionally strict type error in ``MatrixTable.union_rows``.
- (`#7170 `__) Fix issues created downstream of ``BlockMatrix.T``. - (`#7146 `__) Fix bad handling of edge cases in ``BlockMatrix.filter``. - (`#7182 `__) Fix problem parsing VCFs where lines end in an INFO field of type flag. -------------- Version 0.2.23 -------------- Released 2019-09-23 .. _hailctl-dataproc-19: ``hailctl dataproc`` ~~~~~~~~~~~~~~~~~~~~ - (`#7087 `__) Added back progress bar to notebooks, with links to the correct Spark UI url. - (`#7104 `__) Increased disk requested when using ``--vep`` to address the “colony collapse” cluster error mode. .. _bug-fixes-91: Bug fixes ~~~~~~~~~ - (`#7066 `__) Fixed generated code when methods from multiple reference genomes appear together. - (`#7077 `__) Fixed crash in ``hl.agg.group_by``. .. _new-features-72: New features ~~~~~~~~~~~~ - (`#7009 `__) Introduced analysis pass in Python that mostly obviates the ``hl.bind`` and ``hl.rbind`` operators; idiomatic Python that generates Hail expressions will perform much better. - (`#7076 `__) Improved memory management in generated code, add additional log statements about allocated memory to improve debugging. - (`#7085 `__) Warn only once about schema mismatches during JSON import (used in VEP, Nirvana, and sometimes ``import_table``). - (`#7106 `__) ``hl.agg.call_stats`` can now accept a number of alleles for its ``alleles`` parameter, useful when dealing with biallelic calls without the alleles array at hand. .. _performance-6: Performance ~~~~~~~~~~~ - (`#7086 `__) Improved performance of JSON import. - (`#6981 `__) Improved performance of Hail min/max/mean operators. Improved performance of ``split_multi_hts`` by an additional 33%. - (`#7082 `__)(`#7096 `__)(`#7098 `__) Improved performance of large pipelines involving many ``annotate`` calls. -------------- Version 0.2.22 -------------- Released 2019-09-12 .. _new-features-73: New features ~~~~~~~~~~~~ - (`#7013 `__) Added ``contig_recoding`` to ``import_bed`` and ``import_locus_intervals``. ..
_performance-7: Performance ~~~~~~~~~~~ - (`#6969 `__) Improved performance of ``hl.agg.mean``, ``hl.agg.stats``, and ``hl.agg.corr``. - (`#6987 `__) Improved performance of ``import_matrix_table``. - (`#7033 `__)(`#7049 `__) Various improvements leading to overall 10-15% improvement. .. _hailctl-dataproc-20: ``hailctl dataproc`` ~~~~~~~~~~~~~~~~~~~~ - (`#7003 `__) Pass through extra arguments for ``hailctl dataproc list`` and ``hailctl dataproc stop``. -------------- Version 0.2.21 -------------- Released 2019-09-03 .. _bug-fixes-92: Bug fixes ~~~~~~~~~ - (`#6945 `__) Fixed ``expand_types`` to preserve ordering by key, also affects ``to_pandas`` and ``to_spark``. - (`#6958 `__) Fixed stack overflow errors when counting the result of a ``Table.union``. .. _new-features-74: New features ~~~~~~~~~~~~ - (`#6856 `__) Teach ``hl.agg.counter`` to weigh each value differently. - (`#6903 `__) Teach ``hl.range`` to treat a single argument as ``0..N``. - (`#6903 `__) Teach ``BlockMatrix`` how to ``checkpoint``. .. _performance-8: Performance ~~~~~~~~~~~ - (`#6895 `__) Improved performance of ``hl.import_bgen(...).count()``. - (`#6948 `__) Fixed performance bug in ``BlockMatrix`` filtering functions. - (`#6943 `__) Improved scaling of ``Table.union``. - (`#6980 `__) Reduced compute time for ``split_multi_hts`` by as much as 40%. .. _hailctl-dataproc-21: ``hailctl dataproc`` ~~~~~~~~~~~~~~~~~~~~ - (`#6904 `__) Added ``--dry-run`` option to ``submit``. - (`#6951 `__) Fixed ``--max-idle`` and ``--max-age`` arguments to ``start``. - (`#6919 `__) Added ``--update-hail-version`` to ``modify``. -------------- Version 0.2.20 -------------- Released 2019-08-19 Critical memory management fix ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - (`#6824 `__) Fixed memory management inside ``annotate_cols`` with aggregations. This was causing memory leaks and segfaults. .. _bug-fixes-93: Bug fixes ~~~~~~~~~ - (`#6769 `__) Fixed non-functional ``hl.lambda_gc`` method. 
- (`#6847 `__) Fixed bug in handling of NaN in ``hl.agg.min`` and ``hl.agg.max``. These will now properly ignore NaN (the intended semantics). Note that ``hl.min`` and ``hl.max`` propagate NaN; use ``hl.nanmin`` and ``hl.nanmax`` to ignore NaN. .. _new-features-75: New features ~~~~~~~~~~~~ - (`#6847 `__) Added ``hl.nanmin`` and ``hl.nanmax`` functions. -------------- Version 0.2.19 -------------- Released 2019-08-01 Critical performance bug fix ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - (`#6629 `__) Fixed a critical performance bug introduced in (`#6266 `__). This bug led to long hang times when reading in Hail tables and matrix tables **written in version 0.2.18**. .. _bug-fixes-94: Bug fixes ~~~~~~~~~ - (`#6757 `__) Fixed correctness bug in optimizations applied to the combination of ``Table.order_by`` with ``hl.desc`` arguments and ``show()``, leading to tables sorted in ascending, not descending order. - (`#6770 `__) Fixed assertion error caused by ``Table.expand_types()``, which was used by ``Table.to_spark`` and ``Table.to_pandas``. .. _performance-improvements-17: Performance Improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#6666 `__) Slightly improve performance of ``hl.pca`` and ``hl.hwe_normalized_pca``. - (`#6669 `__) Improve performance of ``hl.split_multi`` and ``hl.split_multi_hts``. - (`#6644 `__) Optimize core code generation primitives, leading to across-the-board performance improvements. - (`#6775 `__) Fixed a major performance problem related to reading block matrices. .. _hailctl-dataproc-22: ``hailctl dataproc`` ~~~~~~~~~~~~~~~~~~~~ - (`#6760 `__) Fixed the address pointed at by ``ui`` in ``connect``, after Google changed proxy settings that rendered the UI URL incorrect. Also added new address ``hist/spark-history``. -------------- Version 0.2.18 -------------- Released 2019-07-12 .. 
_critical-performance-bug-fix-1: Critical performance bug fix ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - (`#6605 `__) Resolved code generation issue leading to a performance regression of 1-3 orders of magnitude in Hail pipelines using constant strings or literals. This includes almost every pipeline! **This issue exists in versions 0.2.15, 0.2.16, and 0.2.17, and any users on those versions should update as soon as possible.** .. _bug-fixes-95: Bug fixes ~~~~~~~~~ - (`#6598 `__) Fixed code generated by ``MatrixTable.unfilter_entries`` to improve performance. This will slightly improve the performance of ``hwe_normalized_pca`` and relatedness computation methods, which use ``unfilter_entries`` internally. -------------- Version 0.2.17 -------------- Released 2019-07-10 .. _new-features-76: New features ~~~~~~~~~~~~ - (`#6349 `__) Added ``compression`` parameter to ``export_block_matrices``, which can be ``'gz'`` or ``'bgz'``. - (`#6405 `__) When a matrix table has string column-keys, ``matrixtable.show`` uses the column key as the column name. - (`#6345 `__) Added an improved scan implementation, which reduces the memory load on master. - (`#6462 `__) Added ``export_bgen`` method. - (`#6473 `__) Improved performance of ``hl.agg.array_sum`` by about 50%. - (`#6498 `__) Added method ``hl.lambda_gc`` to calculate the genomic control inflation factor. - (`#6456 `__) Dramatically improved performance of pipelines containing long chains of calls to ``Table.annotate``, or ``MatrixTable`` equivalents. - (`#6506 `__) Improved the performance of the generated code for the ``Table.annotate(**thing)`` pattern. .. _bug-fixes-96: Bug fixes ~~~~~~~~~ - (`#6404 `__) Added ``n_rows`` and ``n_cols`` parameters to ``Expression.show`` for consistency with other ``show`` methods. - (`#6408 `__)(`#6419 `__) Fixed an issue where the ``filter_intervals`` optimization could make scans return incorrect results.
- (`#6459 `__)(`#6458 `__) Fixed rare correctness bug in the ``filter_intervals`` optimization which could result in too many rows being kept. - (`#6496 `__) Fixed HTML output of ``show`` methods to truncate long field contents. - (`#6478 `__) Fixed the broken documentation for the experimental ``approx_cdf`` and ``approx_quantiles`` aggregators. - (`#6504 `__) Fix ``Table.show`` collecting data twice while running in Jupyter notebooks. - (`#6571 `__) Fixed the message printed in ``hl.concordance`` to print the number of overlapping samples, not the full list of overlapping sample IDs. - (`#6583 `__) Fixed ``hl.plot.manhattan`` for non-default reference genomes. Experimental ~~~~~~~~~~~~ - (`#6488 `__) Exposed ``table.multi_way_zip_join``. This takes a list of tables of identical types, and zips them together into one table. .. _file-format-6: File Format ~~~~~~~~~~~ - The native file format version is now 1.1.0. Older versions of Hail will not be able to read tables or matrix tables written by this version of Hail. -------------- Version 0.2.16 -------------- Released 2019-06-19 ``hailctl`` ~~~~~~~~~~~ - (`#6357 `__) Accommodated Google Dataproc bug causing cluster creation failures. .. _bug-fixes-97: Bug fixes ~~~~~~~~~ - (`#6378 `__) Fixed problem in how ``entry_float_type`` was being handled in ``import_vcf``. -------------- Version 0.2.15 -------------- Released 2019-06-14 After some infrastructural changes to our development process, we should be getting back to frequent releases. .. _hailctl-1: ``hailctl`` ~~~~~~~~~~~ Starting in 0.2.15, ``pip`` installations of Hail come bundled with a command-line tool, ``hailctl``. This tool subsumes the functionality of ``cloudtools``, which is now deprecated. See the `release thread on the forum `__ for more information. ..
_new-features-77: New features ~~~~~~~~~~~~ - (`#5932 `__)(`#6115 `__) ``hl.import_bed`` and ``hl.import_locus_intervals`` now accept keyword arguments to pass through to ``hl.import_table``, which is used internally. This permits parameters like ``min_partitions`` to be set. - (`#5980 `__) Added ``log`` option to ``hl.plot.histogram2d``. - (`#5937 `__) Added ``all_matches`` parameter to ``Table.index`` and ``MatrixTable.index_{rows, cols, entries}``, which produces an array of all rows in the indexed object matching the index key. This makes it possible to, for example, annotate all intervals overlapping a locus. - (`#5913 `__) Added functionality that makes arrays of structs easier to work with. - (`#6089 `__) Added HTML output to ``Expression.show`` when running in a notebook. - (`#6172 `__) ``hl.split_multi_hts`` now uses the original ``GQ`` value if the ``PL`` is missing. - (`#6123 `__) Added ``hl.binary_search`` to search sorted numeric arrays. - (`#6224 `__) Moved implementation of ``hl.concordance`` from backend to Python. Performance directly from ``read()`` is slightly worse, but inside larger pipelines this function will be optimized much better than before, and it will benefit from improvements to general infrastructure. - (`#6214 `__) Updated Hail Python dependencies. - (`#5979 `__) Added optimizer pass to rewrite filter expressions on keys as interval filters where possible, leading to massive speedups for point queries. See the `blog post `__ for examples. .. _bug-fixes-98: Bug fixes ~~~~~~~~~ - (`#5895 `__) Fixed crash caused by ``-0.0`` floating-point values in ``hl.agg.hist``. - (`#6013 `__) Turned off feature in HTSJDK that caused crashes in ``hl.import_vcf`` due to header fields being overwritten with different types, if the field had a different type than the type in the VCF 4.2 spec. - (`#6117 `__) Fixed problem causing ``Table.flatten()`` to be quadratic in the size of the schema.
- (`#6228 `__)(`#5993 `__) Fixed ``MatrixTable.union_rows()`` to join distinct keys on the right, preventing an unintentional cartesian product. - (`#6235 `__) Fixed an issue related to aggregation inside ``MatrixTable.filter_cols``. - (`#6226 `__) Restored lost behavior where ``Table.show(x < 0)`` shows the entire table. - (`#6267 `__) Fixed cryptic crashes related to ``hl.split_multi`` and ``MatrixTable.entries()`` with duplicate row keys. -------------- Version 0.2.14 -------------- Released 2019-04-24 A back-incompatible patch update to PySpark, 2.4.2, has broken fresh pip installs of Hail 0.2.13. To fix this, either *downgrade* PySpark to 2.4.1 or upgrade to the latest version of Hail. .. _new-features-78: New features ~~~~~~~~~~~~ - (`#5915 `__) Added ``hl.cite_hail`` and ``hl.cite_hail_bibtex`` functions to generate appropriate citations. - (`#5872 `__) Fixed ``hl.init`` when the ``idempotent`` parameter is ``True``. -------------- Version 0.2.13 -------------- Released 2019-04-18 Hail is now using Spark 2.4.x by default. If you build Hail from source, you will need to acquire this version of Spark and update your build invocations accordingly. .. _new-features-79: New features ~~~~~~~~~~~~ - (`#5828 `__) Remove dependency on htsjdk for VCF INFO parsing, enabling faster import of some VCFs. - (`#5860 `__) Improve performance of some column annotation pipelines. - (`#5858 `__) Add ``unify`` option to ``Table.union`` which allows unification of tables with different fields or field orderings. - (`#5799 `__) ``mt.entries()`` is four times faster. - (`#5756 `__) Hail now uses Spark 2.4.x by default. - (`#5677 `__) ``MatrixTable`` now also supports ``show``. - (`#5793 `__)(`#5701 `__) Add ``array.index(x)`` which finds the first index of ``array`` whose value is equal to ``x``. - (`#5790 `__) Add ``array.head()`` which returns the first element of the array, or missing if the array is empty. - (`#5690 `__) Improve performance of ``ld_matrix``.
- (`#5743 `__) ``mt.compute_entry_filter_stats`` computes statistics about the number of filtered entries in a matrix table. - (`#5758 `__) Failure to parse an interval will now produce a much more detailed error message. - (`#5723 `__) ``hl.import_matrix_table`` can now import a matrix table with no columns. - (`#5724 `__) ``hl.rand_norm2d`` samples from a two-dimensional random normal. .. _bug-fixes-99: Bug fixes ~~~~~~~~~ - (`#5885 `__) Fix ``Table.to_spark`` in the presence of fields of tuples. - (`#5882 `__)(`#5886 `__) Fix ``BlockMatrix`` conversion methods to correctly handle filtered entries. - (`#5884 `__)(`#4874 `__) Fix longstanding crash when reading Hail data files under certain conditions. - (`#5855 `__)(`#5786 `__) Fix ``hl.mendel_errors`` incorrectly reporting children counts in the presence of entry filtering. - (`#5830 `__)(`#5835 `__) Fix Nirvana support. - (`#5773 `__) Fix ``hl.sample_qc`` to use correct number of total rows when calculating call rate. - (`#5763 `__)(`#5764 `__) Fix ``hl.agg.array_agg`` to work inside ``mt.annotate_rows`` and similar functions. - (`#5770 `__) Hail now uses the correct unicode string encoding which resolves a number of issues when a Table or MatrixTable has a key field containing unicode characters. - (`#5692 `__) When ``keyed`` is ``True``, ``hl.maximal_independent_set`` now does not produce duplicates. - (`#5725 `__) Docs now consistently refer to ``hl.agg`` not ``agg``. - (`#5730 `__)(`#5782 `__) Taught ``import_bgen`` to optimize its ``variants`` argument. .. _experimental-1: Experimental ~~~~~~~~~~~~ - (`#5732 `__) The ``hl.agg.approx_quantiles`` aggregator computes an approximation of the quantiles of an expression. - (`#5693 `__)(`#5396 `__) ``Table._multi_way_zip_join`` now correctly handles keys that have been truncated. -------------- Version 0.2.12 -------------- Released 2019-03-28 ..
_new-features-80: New features ~~~~~~~~~~~~ - (`#5614 `__) Add support for multiple missing values in ``hl.import_table``. - (`#5666 `__) Produce HTML table output for ``Table.show()`` when running in Jupyter notebook. .. _bug-fixes-100: Bug fixes ~~~~~~~~~ - (`#5603 `__)(`#5697 `__) Fixed issue where ``min_partitions`` on ``hl.import_table`` was non-functional. - (`#5611 `__) Fix ``hl.nirvana`` crash. .. _experimental-2: Experimental ~~~~~~~~~~~~ - (`#5524 `__) Add ``summarize`` functions to Table, MatrixTable, and Expression. - (`#5570 `__) Add ``hl.agg.approx_cdf`` aggregator for approximate density calculation. - (`#5571 `__) Add ``log`` parameter to ``hl.plot.histogram``. - (`#5601 `__) Add ``hl.plot.joint_plot``, extend functionality of ``hl.plot.scatter``. - (`#5608 `__) Add LD score simulation framework. - (`#5628 `__) Add ``hl.experimental.full_outer_join_mt`` for full outer joins on ``MatrixTable``\ s. -------------- Version 0.2.11 -------------- Released 2019-03-06 .. _new-features-81: New features ~~~~~~~~~~~~ - (`#5374 `__) Add default arguments to ``hl.add_sequence`` for running on GCP. - (`#5481 `__) Added ``sample_cols`` method to ``MatrixTable``. - (`#5501 `__) Exposed ``MatrixTable.unfilter_entries``. See ``filter_entries`` documentation for more information. - (`#5480 `__) Added ``n_cols`` argument to ``MatrixTable.head``. - (`#5529 `__) Added ``Table.{semi_join, anti_join}`` and ``MatrixTable.{semi_join_rows, semi_join_cols, anti_join_rows, anti_join_cols}``. - (`#5528 `__) Added ``{MatrixTable, Table}.checkpoint`` methods as wrappers around ``write`` / ``read_{matrix_table, table}``. .. _bug-fixes-101: Bug fixes ~~~~~~~~~ - (`#5416 `__) Resolved issue wherein VEP and certain regressions were recomputed on each use, rather than once. - (`#5419 `__) Resolved issue with ``import_vcf`` ``force_bgz`` and file size checks. - (`#5427 `__) Resolved issue with ``Table.show`` and dictionary field types. 
- (`#5468 `__) Resolved ordering problem with ``Expression.show`` on key fields that are not the first key. - (`#5492 `__) Fixed ``hl.agg.collect`` crashing when collecting ``float32`` values. - (`#5525 `__) Fixed ``hl.trio_matrix`` crashing when ``complete_trios`` is ``False``. -------------- Version 0.2.10 -------------- Released 2019-02-15 .. _new-features-82: New features ~~~~~~~~~~~~ - (`#5272 `__) Added a new ‘delimiter’ option to Table.export. - (`#5251 `__) Add utility aliases to ``hl.plot`` for ``output_notebook`` and ``show``. - (`#5249 `__) Add ``histogram2d`` function to ``hl.plot`` module. - (`#5247 `__) Expose ``MatrixTable.localize_entries`` method for converting to a Table with an entries array. - (`#5300 `__) Add new ``filter`` and ``find_replace`` arguments to ``hl.import_table`` and ``hl.import_vcf`` to apply regex and substitutions to text input. .. _performance-improvements-18: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#5298 `__) Reduce size of exported VCF files by exporting missing genotypes without trailing fields. .. _bug-fixes-102: Bug fixes ~~~~~~~~~ - (`#5306 `__) Fix ``ReferenceGenome.add_sequence`` causing a crash. - (`#5268 `__) Fix ``Table.export`` writing a file called ‘None’ in the current directory. - (`#5265 `__) Fix ``hl.get_reference`` raising an exception when called before ``hl.init()``. - (`#5250 `__) Fix crash in ``pc_relate`` when called on a MatrixTable field other than ‘GT’. - (`#5278 `__) Fix crash in ``Table.order_by`` when sorting by fields whose names are not valid Python identifiers. - (`#5294 `__) Fix crash in ``hl.trio_matrix`` when sample IDs are missing. - (`#5295 `__) Fix crash in ``Table.index`` related to key field incompatibilities. -------------- Version 0.2.9 ------------- Released 2019-01-30 .. _new-features-83: New features ~~~~~~~~~~~~ - (`#5149 `__) Added bitwise transformation functions: ``hl.bit_{and, or, xor, not, lshift, rshift}``. 
- (`#5154 `__) Added ``hl.rbind`` function, which is similar to ``hl.bind`` but expects a function as the last argument instead of the first. .. _performance-improvements-19: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#5107 `__) Hail’s Python interface generates tighter intermediate code, which should result in moderate performance improvements in many pipelines. - (`#5172 `__) Fix unintentional performance deoptimization related to ``Table.show`` introduced in 0.2.8. - (`#5078 `__) Improve performance of ``hl.ld_prune`` by up to 30x. .. _bug-fixes-103: Bug fixes ~~~~~~~~~ - (`#5144 `__) Fix crash caused by ``hl.index_bgen`` (since 0.2.7) - (`#5177 `__) Fix bug causing ``Table.repartition(n, shuffle=True)`` to fail to increase partitioning for unkeyed tables. - (`#5173 `__) Fix bug causing ``Table.show`` to throw an error when the table is empty (since 0.2.8). - (`#5210 `__) Fix bug causing ``Table.show`` to always print types, regardless of ``types`` argument (since 0.2.8). - (`#5211 `__) Fix bug causing ``MatrixTable.make_table`` to unintentionally discard non-key row fields (since 0.2.8). -------------- Version 0.2.8 ------------- Released 2019-01-15 .. _new-features-84: New features ~~~~~~~~~~~~ - (`#5072 `__) Added multi-phenotype option to ``hl.logistic_regression_rows`` - (`#5077 `__) Added support for importing VCF floating-point FORMAT fields as ``float32`` as well as ``float64``. .. _performance-improvements-20: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#5068 `__) Improved optimization of ``MatrixTable.count_cols``. - (`#5131 `__) Fixed performance bug related to ``hl.literal`` on large values with missingness .. _bug-fixes-104: Bug fixes ~~~~~~~~~ - (`#5088 `__) Fixed name separator in ``MatrixTable.make_table``. - (`#5104 `__) Fixed optimizer bug related to experimental functionality. - (`#5122 `__) Fixed error constructing ``Table`` or ``MatrixTable`` objects with fields with certain character patterns like ``$``. 
-------------- Version 0.2.7 ------------- Released 2019-01-03 .. _new-features-85: New features ~~~~~~~~~~~~ - (`#5046 `__)(experimental) Added option to BlockMatrix.export_rectangles to export as NumPy-compatible binary. .. _performance-improvements-21: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#5050 `__) Short-circuit iteration in ``logistic_regression_rows`` and ``poisson_regression_rows`` if NaNs appear. -------------- Version 0.2.6 ------------- Released 2018-12-17 .. _new-features-86: New features ~~~~~~~~~~~~ - (`#4962 `__) Expanded comparison operators (``==``, ``!=``, ``<``, ``<=``, ``>``, ``>=``) to support expressions of every type. - (`#4927 `__) Expanded functionality of ``Table.order_by`` to support ordering by arbitrary expressions, instead of just top-level fields. - (`#4926 `__) Expanded default GRCh38 contig recoding behavior in ``import_plink``. .. _performance-improvements-22: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#4952 `__) Resolved lingering issues related to (`#4909 `__). .. _bug-fixes-105: Bug fixes ~~~~~~~~~ - (`#4941 `__) Fixed variable scoping error in regression methods. - (`#4857 `__) Fixed bug in maximal_independent_set appearing when nodes were named something other than ``i`` and ``j``. - (`#4932 `__) Fixed possible error in ``export_plink`` related to tolerance of writer process failure. - (`#4920 `__) Fixed bad error message in ``Table.order_by``. -------------- Version 0.2.5 ------------- Released 2018-12-07 .. _new-features-87: New features ~~~~~~~~~~~~ - (`#4845 `__) The `or_error `__ method in ``hl.case`` and ``hl.switch`` statements now takes a string expression rather than a string literal, allowing more informative messages for errors and assertions. - (`#4865 `__) We use this new ``or_error`` functionality in methods that require biallelic variants to include an offending variant in the error message. - (`#4820 `__) Added `hl.reversed `__ for reversing arrays and strings. 
- (`#4895 `__) Added ``include_strand`` option to the `hl.liftover `__ function. .. _performance-improvements-23: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - (`#4907 `__)(`#4911 `__) Addressed one aspect of bad scaling in enormous literal values (triggered by a list of 300,000 sample IDs) related to logging. - (`#4909 `__)(`#4914 `__) Fixed a check in Table/MatrixTable initialization that scaled O(n^2) with the total number of fields. .. _bug-fixes-106: Bug fixes ~~~~~~~~~ - (`#4754 `__)(`#4799 `__) Fixed optimizer assertion errors related to certain types of pipelines using ``group_rows_by``. - (`#4888 `__) Fixed assertion error in BlockMatrix.sum. - (`#4871 `__) Fixed possible error in locally sorting nested collections. - (`#4889 `__) Fixed break in compatibility with extremely old MatrixTable/Table files. - (`#4527 `__)(`#4761 `__) Fixed optimizer assertion error sometimes encountered with ``hl.split_multi[_hts]``. -------------- Version 0.2.4: Beginning of history! ------------------------------------ We didn’t start manually curating information about user-facing changes until version 0.2.4. The full commit history is available `here `__.