Hail Query Python API

This is the API documentation for Hail Query. It provides detailed information on the Python programming interface.

Use import hail as hl to access this functionality.

Classes

hail.Table

Hail's distributed implementation of a dataframe or SQL table.

hail.GroupedTable

Table grouped by row that can be aggregated into a new table.

hail.MatrixTable

Hail's distributed implementation of a structured matrix.

hail.GroupedMatrixTable

Matrix table grouped by row or column that can be aggregated into a new matrix table.

Modules

Top-Level Functions

hail.init(sc=None, app_name=None, master=None, local='local[*]', log=None, quiet=False, append=False, min_block_size=0, branching_factor=50, tmp_dir=None, default_reference=None, idempotent=False, global_seed=None, spark_conf=None, skip_logging_configuration=False, local_tmpdir=None, _optimizer_iterations=None, *, backend=None, driver_cores=None, driver_memory=None, worker_cores=None, worker_memory=None, gcs_requester_pays_configuration=None, regions=None, gcs_bucket_allow_list=None)[source]

Initialize and configure Hail.

This function will be called with default arguments if any Hail functionality is used. If you need custom configuration, you must explicitly call this function before using Hail. For example, to set the global random seed to 0, import Hail and immediately call init():

>>> import hail as hl
>>> hl.init(global_seed=0)  

Hail has two backends, spark and batch. Hail selects a backend by consulting, in order, these configuration locations:

  1. The backend parameter of this function.

  2. The HAIL_QUERY_BACKEND environment variable.

  3. The value of hailctl config get query/backend.

If no configuration is found, Hail will select the Spark backend.
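
For example, the HAIL_QUERY_BACKEND environment variable (option 2) can be set before initialization; a minimal sketch:

>>> import os
>>> os.environ['HAIL_QUERY_BACKEND'] = 'spark'  # consulted when hl.init() runs
>>> import hail as hl
>>> hl.init()  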

Examples

Configure Hail to use the Batch backend:

>>> import hail as hl
>>> hl.init(backend='batch')  

If a pyspark.SparkContext is already running, then Hail must be initialized with it as an argument:

>>> hl.init(sc=sc)  

Configure Hail to bill to my-project when accessing any Google Cloud Storage bucket that has requester pays enabled:

>>> hl.init(gcs_requester_pays_configuration='my-project')  

Configure Hail to bill to my-project when accessing the Google Cloud Storage buckets named bucket_of_fish and bucket_of_eels:

>>> hl.init(
...     gcs_requester_pays_configuration=('my-project', ['bucket_of_fish', 'bucket_of_eels'])
... )  

You may also use hailctl config set gcs_requester_pays/project and hailctl config set gcs_requester_pays/buckets to achieve the same effect.

See also

stop()

Parameters:
  • sc (pyspark.SparkContext, optional) – Spark Backend only. Spark context. If not specified, the Spark backend will create a new Spark context.

  • app_name (str) – A name for this pipeline. In the Spark backend, this becomes the Spark application name. In the Batch backend, this is a prefix for the name of every Batch.

  • master (str, optional) – Spark Backend only. URL identifying the Spark leader (master) node or local[N] for local clusters.

  • local (str) – Spark Backend only. Local-mode core limit indicator. Must either be local[N] where N is a positive integer or local[*]. The latter indicates Spark should use all cores available. local[*] does not respect most containerization CPU limits. This option is only used if master is unset and spark.master is not set in the Spark configuration.

  • log (str) – Local path for Hail log file. Does not currently support distributed file systems like Google Storage, S3, or HDFS.

  • quiet (bool) – Print fewer log messages.

  • append (bool) – Append to the end of the log file.

  • min_block_size (int) – Minimum file block size in MB.

  • branching_factor (int) – Branching factor for tree aggregation.

  • tmp_dir (str, optional) – Networked temporary directory. Must be a network-visible file path. Defaults to /tmp in the default scheme.

  • default_reference (str) – Deprecated. Please use default_reference() to set the default reference genome. Either 'GRCh37', 'GRCh38', 'GRCm38', or 'CanFam3'.

  • idempotent (bool) – If True, calling this function is a no-op if Hail has already been initialized.

  • global_seed (int, optional) – Global random seed.

  • spark_conf (dict of str to str, optional) – Spark Backend only. Spark configuration parameters.

  • skip_logging_configuration (bool) – Spark Backend only. Skip logging configuration in Java and Python.

  • local_tmpdir (str, optional) – Local temporary directory. Used on driver and executor nodes. Must use the file scheme. Defaults to TMPDIR, or /tmp.

  • driver_cores (str or int, optional) – Batch backend only. Number of cores to use for the driver process. May be 1, 2, 4, or 8. Default is 1.

  • driver_memory (str, optional) – Batch backend only. Memory tier to use for the driver process. May be standard or highmem. Default is standard.

  • worker_cores (str or int, optional) – Batch backend only. Number of cores to use for the worker processes. May be 1, 2, 4, or 8. Default is 1.

  • worker_memory (str, optional) – Batch backend only. Memory tier to use for the worker processes. May be standard or highmem. Default is standard.

  • gcs_requester_pays_configuration (either str or tuple of str and list of str, optional) – If a string is provided, configure the Google Cloud Storage file system to bill usage to the project identified by that string. If a tuple is provided, configure the Google Cloud Storage file system to bill usage to the specified project for buckets specified in the list. See examples above.

  • regions (list of str, optional) – List of regions to run jobs in when using the Batch backend. Use ANY_REGION to specify any region is allowed or use None to use the underlying default regions from the hailctl environment configuration. For example, use hailctl config set batch/regions region1,region2 to set the default regions to use.

  • gcs_bucket_allow_list (list of str, optional) – A list of buckets that Hail should be permitted to read from or write to, even if their default policy is to use “cold” storage. Should look like ["bucket1", "bucket2"].

hail.asc(col)[source]

Sort by col ascending.

hail.desc(col)[source]

Sort by col descending.
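
Both functions are meant to be passed to Table.order_by(). For example, sorting a range table by idx in descending order:

>>> ht = hl.utils.range_table(5)
>>> ht.order_by(hl.desc(ht.idx)).show()  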

hail.stop()[source]

Stop the currently running Hail session.

hail.spark_context()[source]

Returns the active Spark context.

Returns:

pyspark.SparkContext
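
For example (Spark backend only), the returned context can be used with the PySpark API directly:

>>> sc = hl.spark_context()  
>>> sc.defaultParallelism  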

hail.tmp_dir()[source]

Returns the Hail shared temporary directory.

Returns:

str

hail.default_reference(new_default_reference=None)[source]

With no argument, returns the default reference genome ('GRCh37' by default). With an argument, sets the default reference genome to the argument.

Returns:

ReferenceGenome
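
For example, reading the current default and then switching it:

>>> hl.default_reference().name  
>>> hl.default_reference('GRCh38')  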

hail.get_reference(name)[source]

Returns the reference genome corresponding to name.

Notes

Hail’s built-in references are 'GRCh37', 'GRCh38', 'GRCm38', and 'CanFam3'. The contig names and lengths come from the GATK resource bundle: human_g1k_v37.dict and Homo_sapiens_assembly38.dict.

If name='default', the value of default_reference() is returned.

Parameters:

name (str) – Name of a previously loaded reference genome, one of Hail’s built-in references ('GRCh37', 'GRCh38', 'GRCm38', 'CanFam3'), or 'default'.

Returns:

ReferenceGenome
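
For example, inspecting the contigs of a built-in reference:

>>> rg = hl.get_reference('GRCh38')  
>>> rg.contigs[:5]  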

hail.set_global_seed(seed)[source]

Deprecated.

Has no effect. To ensure reproducible randomness, use the global_seed argument to init() and reset_global_randomness().

See the random functions reference docs for more.

Parameters:

seed (int) – Integer used to seed Hail’s random number generator

hail.reset_global_randomness()[source]

Restore global randomness to initial state for test reproducibility.
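
For example, resetting between evaluations should reproduce the same random draw; a minimal sketch:

>>> hl.reset_global_randomness()  
>>> x = hl.eval(hl.rand_unif(0, 1))  
>>> hl.reset_global_randomness()  
>>> y = hl.eval(hl.rand_unif(0, 1))  # expected to equal x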

hail.citation(*, bibtex=False)[source]

Generate a Hail citation.

Parameters:

bibtex (bool) – Generate a citation in BibTeX form.

Returns:

str
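
For example, printing a BibTeX entry:

>>> print(hl.citation(bibtex=True))  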

hail.version()[source]

Get the installed Hail version.

Returns:

str
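
For example:

>>> hl.version()  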