Change Log

0.2.14

Released 2019-04-24

A backward-incompatible patch update to PySpark, 2.4.2, broke fresh pip installs of Hail 0.2.13. To fix this, either downgrade PySpark to 2.4.1 or upgrade to the latest version of Hail.

New features

  • (#5915) Added hl.cite_hail and hl.cite_hail_bibtex functions to generate appropriate citations.
  • (#5872) Fixed hl.init when the idempotent parameter is True.

0.2.13

Released 2019-04-18

Hail now uses Spark 2.4.x by default. If you build Hail from source, you will need to acquire this version of Spark and update your build invocations accordingly.

New features

  • (#5828) Remove dependency on htsjdk for VCF INFO parsing, enabling faster import of some VCFs.
  • (#5860) Improve performance of some column annotation pipelines.
  • (#5858) Add unify option to Table.union which allows unification of tables with different fields or field orderings.
  • (#5799) mt.entries() is four times faster.
  • (#5756) Hail now uses Spark 2.4.x by default.
  • (#5677) MatrixTable now also supports show.
  • (#5793)(#5701) Add array.index(x), which finds the first index of the array whose value is equal to x.
  • (#5790) Add array.head() which returns the first element of the array, or missing if the array is empty.
  • (#5690) Improve performance of ld_matrix.
  • (#5743) mt.compute_entry_filter_stats computes statistics about the number of filtered entries in a matrix table.
  • (#5758) Failure to parse an interval now produces a much more detailed error message.
  • (#5723) hl.import_matrix_table can now import a matrix table with no columns.
  • (#5724) hl.rand_norm2d samples from a two dimensional random normal.
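
As an illustration of the semantics of array.index(x) and array.head() listed above, here is a minimal pure-Python sketch. It does not use Hail: the helper names first_index and head_or_missing are hypothetical, None stands in for Hail's missing value, and returning missing when x is absent is an assumption of this sketch rather than something the entry states.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_index(values: Sequence[T], x: T) -> Optional[int]:
    """Return the first position whose value equals x, else None (assumed)."""
    for i, value in enumerate(values):
        if value == x:
            return i
    return None


def head_or_missing(values: Sequence[T]) -> Optional[T]:
    """Return the first element, or None (missing) for an empty sequence."""
    return values[0] if len(values) > 0 else None
```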

Bug fixes

  • (#5885) Fix Table.to_spark in the presence of fields of tuples.
  • (#5882)(#5886) Fix BlockMatrix conversion methods to correctly handle filtered entries.
  • (#5884)(#4874) Fix longstanding crash when reading Hail data files under certain conditions.
  • (#5855)(#5786) Fix hl.mendel_errors incorrectly reporting children counts in the presence of entry filtering.
  • (#5830)(#5835) Fix Nirvana support.
  • (#5773) Fix hl.sample_qc to use correct number of total rows when calculating call rate.
  • (#5763)(#5764) Fix hl.agg.array_agg to work inside mt.annotate_rows and similar functions.
  • (#5770) Hail now uses the correct Unicode string encoding, which resolves a number of issues when a Table or MatrixTable has a key field containing Unicode characters.
  • (#5692) When keyed is True, hl.maximal_independent_set now does not produce duplicates.
  • (#5725) Docs now consistently refer to hl.agg not agg.
  • (#5730)(#5782) Taught import_bgen to optimize its variants argument.

Experimental

  • (#5732) The hl.agg.approx_quantiles aggregate computes an approximation of the quantiles of an expression.
  • (#5693)(#5396) Table._multi_way_zip_join now correctly handles keys that have been truncated.

0.2.12

Released 2019-03-28

New features

  • (#5614) Add support for multiple missing values in hl.import_table.
  • (#5666) Produce HTML table output for Table.show() when running in Jupyter notebook.
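
To make "multiple missing values" concrete, the following pure-Python sketch treats any cell that exactly equals one of several strings as missing. It is not Hail's parser: parse_row and its parameters are hypothetical names, None stands in for a missing value, and exact string matching is an assumption.

```python
from typing import Collection, List, Optional


def parse_row(
    line: str,
    missing: Collection[str] = ("NA",),
    delimiter: str = "\t",
) -> List[Optional[str]]:
    """Split one text row, mapping any cell listed in `missing` to None."""
    return [None if cell in missing else cell for cell in line.split(delimiter)]
```

With a single missing string only "NA" cells become None; passing missing=("NA", ".") marks both spellings as missing in one pass.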

Bug fixes

  • (#5603)(#5697) Fixed issue where min_partitions on hl.import_table was non-functional.
  • (#5611) Fix hl.nirvana crash.

Experimental

  • (#5524) Add summarize functions to Table, MatrixTable, and Expression.
  • (#5570) Add hl.agg.approx_cdf aggregator for approximate density calculation.
  • (#5571) Add log parameter to hl.plot.histogram.
  • (#5601) Add hl.plot.joint_plot, extend functionality of hl.plot.scatter.
  • (#5608) Add LD score simulation framework.
  • (#5628) Add hl.experimental.full_outer_join_mt for full outer joins on MatrixTables.

0.2.11

Released 2019-03-06

New features

  • (#5374) Add default arguments to hl.add_sequence for running on GCP.
  • (#5481) Added sample_cols method to MatrixTable.
  • (#5501) Exposed MatrixTable.unfilter_entries. See filter_entries documentation for more information.
  • (#5480) Added n_cols argument to MatrixTable.head.
  • (#5529) Added Table.{semi_join, anti_join} and MatrixTable.{semi_join_rows, semi_join_cols, anti_join_rows, anti_join_cols}.
  • (#5528) Added {MatrixTable, Table}.checkpoint methods as wrappers around write / read_{matrix_table, table}.
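
The semi_join and anti_join additions above follow the usual relational meaning: keep rows whose key does, or does not, appear in another table, without adding any of that table's fields. A minimal sketch over lists of dictionaries, assuming a single key field; the function signatures here are illustrative and are not Hail's API.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def semi_join(left: Iterable[Row], right: Iterable[Row], key: str) -> List[Row]:
    """Keep left rows whose key also appears in right."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


def anti_join(left: Iterable[Row], right: Iterable[Row], key: str) -> List[Row]:
    """Keep left rows whose key does not appear in right."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]
```

Together the two results partition the left table: every left row lands in exactly one of them.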

Bug fixes

  • (#5416) Resolved issue wherein VEP and certain regressions were recomputed on each use, rather than once.
  • (#5419) Resolved issue with import_vcf force_bgz and file size checks.
  • (#5427) Resolved issue with Table.show and dictionary field types.
  • (#5468) Resolved ordering problem with Expression.show on key fields that are not the first key.
  • (#5492) Fixed hl.agg.collect crashing when collecting float32 values.
  • (#5525) Fixed hl.trio_matrix crashing when complete_trios is False.

0.2.10

Released 2019-02-15

New features

  • (#5272) Added a new ‘delimiter’ option to Table.export.
  • (#5251) Add utility aliases to hl.plot for output_notebook and show.
  • (#5249) Add histogram2d function to hl.plot module.
  • (#5247) Expose MatrixTable.localize_entries method for converting to a Table with an entries array.
  • (#5300) Add new filter and find_replace arguments to hl.import_table and hl.import_vcf to apply regex and substitutions to text input.
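
The filter and find_replace arguments described above act on raw text lines before parsing. The sketch below models that idea in plain Python under two assumptions not stated in the entry: lines matching the filter regex are dropped, and filtering happens before substitution. preprocess_lines is a hypothetical helper, and Python's re syntax may differ from the regex dialect Hail uses.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple


def preprocess_lines(
    lines: Iterable[str],
    filter_regex: Optional[str] = None,
    find_replace: Optional[Tuple[str, str]] = None,
) -> Iterator[str]:
    """Drop lines matching filter_regex, then apply a regex substitution."""
    drop = re.compile(filter_regex) if filter_regex is not None else None
    for line in lines:
        # Assumed semantics: a partial match anywhere removes the line.
        if drop is not None and drop.search(line):
            continue
        if find_replace is not None:
            pattern, replacement = find_replace
            line = re.sub(pattern, replacement, line)
        yield line
```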

Performance improvements

  • (#5298) Reduce size of exported VCF files by exporting missing genotypes without trailing fields.
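
The size reduction above relies on the VCF convention that trailing per-sample fields may be dropped while the first field, GT, is kept. A rough sketch of that trimming follows; format_genotype is a hypothetical helper rather than Hail's exporter, and applying the rule to any trailing missing value (not only fully missing genotypes) is an assumption.

```python
from typing import List


def format_genotype(fields: List[str]) -> str:
    """Join FORMAT values with ':', dropping trailing missing ('.') values.

    The first field (GT) is always kept, even when it is missing.
    """
    kept = list(fields)
    while len(kept) > 1 and kept[-1] == ".":
        kept.pop()
    return ":".join(kept)
```

A fully missing entry such as ["./.", ".", ".", "."] collapses to "./.", while a missing value in the middle of the list is preserved so later fields keep their position.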

Bug fixes

  • (#5306) Fix ReferenceGenome.add_sequence causing a crash.
  • (#5268) Fix Table.export writing a file called ‘None’ in the current directory.
  • (#5265) Fix hl.get_reference raising an exception when called before hl.init().
  • (#5250) Fix crash in pc_relate when called on a MatrixTable field other than ‘GT’.
  • (#5278) Fix crash in Table.order_by when sorting by fields whose names are not valid Python identifiers.
  • (#5294) Fix crash in hl.trio_matrix when sample IDs are missing.
  • (#5295) Fix crash in Table.index related to key field incompatibilities.

0.2.9

Released 2019-01-30

New features

  • (#5149) Added bitwise transformation functions: hl.bit_{and, or, xor, not, lshift, rshift}.
  • (#5154) Added hl.rbind function, which is similar to hl.bind but expects a function as the last argument instead of the first.
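
The difference between hl.bind and hl.rbind noted above is purely in argument order. The plain-Python stand-ins below (same names, but not Hail's implementations, and operating on ordinary values rather than Hail expressions) show the two calling conventions.

```python
from typing import Any, Callable


def bind(f: Callable[..., Any], *exprs: Any) -> Any:
    """Function first, then the values it is applied to."""
    return f(*exprs)


def rbind(*args: Any) -> Any:
    """Values first, function last."""
    *exprs, f = args
    return f(*exprs)
```

Putting the function last tends to read better when the function is a long lambda, since the bound values appear before the body that uses them.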

Performance improvements

  • (#5107) Hail’s Python interface generates tighter intermediate code, which should result in moderate performance improvements in many pipelines.
  • (#5172) Fix unintentional performance deoptimization related to Table.show introduced in 0.2.8.
  • (#5078) Improve performance of hl.ld_prune by up to 30x.

Bug fixes

  • (#5144) Fix crash caused by hl.index_bgen (since 0.2.7).
  • (#5177) Fix bug causing Table.repartition(n, shuffle=True) to fail to increase partitioning for unkeyed tables.
  • (#5173) Fix bug causing Table.show to throw an error when the table is empty (since 0.2.8).
  • (#5210) Fix bug causing Table.show to always print types, regardless of types argument (since 0.2.8).
  • (#5211) Fix bug causing MatrixTable.make_table to unintentionally discard non-key row fields (since 0.2.8).

0.2.8

Released 2019-01-15

New features

  • (#5072) Added multi-phenotype option to hl.logistic_regression_rows.
  • (#5077) Added support for importing VCF floating-point FORMAT fields as float32 as well as float64.

Performance improvements

  • (#5068) Improved optimization of MatrixTable.count_cols.
  • (#5131) Fixed performance bug related to hl.literal on large values with missingness.

Bug fixes

  • (#5088) Fixed name separator in MatrixTable.make_table.
  • (#5104) Fixed optimizer bug related to experimental functionality.
  • (#5122) Fixed error constructing Table or MatrixTable objects with fields with certain character patterns like $.

0.2.7

Released 2019-01-03

New features

  • (#5046)(experimental) Added option to BlockMatrix.export_rectangles to export as NumPy-compatible binary.

Performance improvements

  • (#5050) Short-circuit iteration in logistic_regression_rows and poisson_regression_rows if NaNs appear.
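
The short-circuiting above can be pictured with a generic iterative solver: once an iterate becomes NaN, further iterations cannot recover, so the loop exits immediately instead of running to the iteration cap. This is a schematic sketch only; iterate_until_converged, its tolerance, and its iteration limit are hypothetical and do not reflect Hail's regression code.

```python
import math
from typing import Callable, Tuple


def iterate_until_converged(
    step: Callable[[float], float],
    x0: float,
    tol: float = 1e-8,
    max_iter: int = 25,
) -> Tuple[float, int, bool]:
    """Apply `step` repeatedly; return (value, iterations used, converged)."""
    x = x0
    for n in range(1, max_iter + 1):
        x_new = step(x)
        if math.isnan(x_new):
            # Short-circuit: a NaN iterate can never converge.
            return x_new, n, False
        if abs(x_new - x) < tol:
            return x_new, n, True
        x = x_new
    return x, max_iter, False
```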

0.2.6

Released 2018-12-17

New features

  • (#4962) Expanded comparison operators (==, !=, <, <=, >, >=) to support expressions of every type.
  • (#4927) Expanded functionality of Table.order_by to support ordering by arbitrary expressions, instead of just top-level fields.
  • (#4926) Expanded default GRCh38 contig recoding behavior in import_plink.

Performance improvements

  • (#4952) Resolved lingering issues related to (#4909).

Bug fixes

  • (#4941) Fixed variable scoping error in regression methods.
  • (#4857) Fixed bug in maximal_independent_set appearing when nodes were named something other than i and j.
  • (#4932) Fixed possible error in export_plink related to tolerance of writer process failure.
  • (#4920) Fixed bad error message in Table.order_by.

0.2.5

Released 2018-12-07

New features

  • (#4845) The or_error method in hl.case and hl.switch statements now takes a string expression rather than a string literal, allowing more informative messages for errors and assertions.
  • (#4865) Methods that require biallelic variants now use this or_error functionality to include an offending variant in the error message.
  • (#4820) Added hl.reversed for reversing arrays and strings.
  • (#4895) Added include_strand option to the hl.liftover function.

Performance improvements

  • (#4907)(#4911) Addressed one aspect of bad scaling in enormous literal values (triggered by a list of 300,000 sample IDs) related to logging.
  • (#4909)(#4914) Fixed a check in Table/MatrixTable initialization that scaled O(n^2) with the total number of fields.

Bug fixes

  • (#4754)(#4799) Fixed optimizer assertion errors related to certain types of pipelines using group_rows_by.
  • (#4888) Fixed assertion error in BlockMatrix.sum.
  • (#4871) Fixed possible error in locally sorting nested collections.
  • (#4889) Fixed break in compatibility with extremely old MatrixTable/Table files.
  • (#4527)(#4761) Fixed optimizer assertion error sometimes encountered with hl.split_multi[_hts].

0.2.4: Beginning of history!

We didn’t start manually curating information about user-facing changes until version 0.2.4.

The full commit history is available here.