Expressions

Expression

Base class for Hail expressions.

ArrayExpression

Expression of type tarray.

ArrayNumericExpression

Expression of type tarray with a numeric type.

BooleanExpression

Expression of type tbool.

CallExpression

Expression of type tcall.

CollectionExpression

Expression of type tarray or tset.

DictExpression

Expression of type tdict.

IntervalExpression

Expression of type tinterval.

LocusExpression

Expression of type tlocus.

NumericExpression

Expression of numeric type.

Int32Expression

Expression of type tint32.

Int64Expression

Expression of type tint64.

Float32Expression

Expression of type tfloat32.

Float64Expression

Expression of type tfloat64.

SetExpression

Expression of type tset.

StringExpression

Expression of type tstr.

StructExpression

Expression of type tstruct.

class hail.expr.expressions.Expression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Base class for Hail expressions.

__eq__(other)[source]

Returns True if the two expressions are equal.

Examples

>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x == y)
True
>>> hl.eval(x == z)
False

Notes

This method will fail with an error if the two expressions are not of comparable types.

Parameters

other (Expression) – Expression for equality comparison.

Returns

BooleanExpression – True if the two expressions are equal.

__ne__(other)[source]

Returns True if the two expressions are not equal.

Examples

>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x != y)
False
>>> hl.eval(x != z)
True

Notes

This method will fail with an error if the two expressions are not of comparable types.

Parameters

other (Expression) – Expression for inequality comparison.

Returns

BooleanExpression – True if the two expressions are not equal.

collect(_localize=True)[source]

Collect all records of an expression into a local list.

Examples

Collect all the values from C1:

>>> table1.C1.collect()
[2, 2, 10, 11]

Warning

Extremely experimental.

Warning

The list of records may be very large.

Returns

list

describe(handler=<built-in function print>)[source]

Print information about type, index, and dependencies.

dtype

The data type of the expression.

Returns

HailType

show(n=None, width=None, truncate=None, types=True, handler=None, n_rows=None, n_cols=None)[source]

Print the first few rows of the table to the console.

Examples

>>> table1.SEX.show()
+-------+-----+
|    ID | SEX |
+-------+-----+
| int32 | str |
+-------+-----+
|     1 | "M" |
|     2 | "M" |
|     3 | "F" |
|     4 | "F" |
+-------+-----+
>>> hl.literal(123).show()
+--------+
| <expr> |
+--------+
|  int32 |
+--------+
|    123 |
+--------+

Warning

Extremely experimental.

Parameters
  • n (int) – Maximum number of rows to show.

  • width (int) – Horizontal width at which to break columns.

  • truncate (int, optional) – Truncate each field to the given number of characters. If None, truncate fields to the given width.

  • types (bool) – Print an extra header line with the type of each field.

summarize()[source]

Compute and print summary information about the expression.

Danger

This functionality is experimental. It may not be tested as well as other parts of Hail and the interface is subject to change.

take(n, _localize=True)[source]

Collect the first n records of an expression.

Examples

Take the first three rows:

>>> table1.X.take(3)
[5, 6, 7]

Warning

Extremely experimental.

Parameters

n (int) – Number of records to take.

Returns

list

class hail.expr.expressions.ArrayExpression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Bases: hail.expr.expressions.typed_expressions.CollectionExpression

Expression of type tarray.

>>> names = hl.literal(['Alice', 'Bob', 'Charlie'])
__getitem__(item)[source]

Index into or slice the array.

Examples

Index with a single integer:

>>> hl.eval(names[1])
'Bob'
>>> hl.eval(names[-1])
'Charlie'

Slicing is also supported:

>>> hl.eval(names[1:])
['Bob', 'Charlie']
Parameters

item (slice or Expression of type tint32) – Index or slice.

Returns

Expression – Element or array slice.

append(item)[source]

Append an element to the array and return the result.

Examples

>>> hl.eval(names.append('Dan'))
['Alice', 'Bob', 'Charlie', 'Dan']

Note

This method does not mutate the caller, but instead returns a new array by copying the caller and adding item.

Parameters

item (Expression) – Element to append, same type as the array element type.

Returns

ArrayExpression

contains(item)[source]

Returns a boolean indicating whether item is found in the array.

Examples

>>> hl.eval(names.contains('Charlie'))
True
>>> hl.eval(names.contains('Helen'))
False
Parameters

item (Expression) – Item for inclusion test.

Warning

This method takes time proportional to the length of the array. If a pipeline calls this method on the same array several times, it may be more efficient to convert the array to a set early in the script (set()) and test membership on the set instead.

Returns

BooleanExpression – True if the element is found in the array, False otherwise.
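The trade-off in the warning can be illustrated with plain Python containers. This is only an analogy and not Hail code: it assumes, as the warning implies, that a set lookup is cheaper than scanning an array, and the names `names_list` and `names_set` are hypothetical.

```python
# Plain-Python analogy (not Hail code): build the set once, then reuse it.
names_list = ['Alice', 'Bob', 'Charlie']

# Each membership test on a list walks the elements one by one.
found_in_list = 'Charlie' in names_list

# Converting once up front avoids repeating the linear scan for every lookup.
names_set = set(names_list)
found_in_set = 'Charlie' in names_set
missing_in_set = 'Helen' in names_set
```

The conversion itself costs one pass over the array, so it pays off only when the same array is queried more than once.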

extend(a)[source]

Concatenate two arrays and return the result.

Examples

>>> hl.eval(names.extend(['Dan', 'Edith']))
['Alice', 'Bob', 'Charlie', 'Dan', 'Edith']
Parameters

a (ArrayExpression) – Array to concatenate, same type as the callee.

Returns

ArrayExpression

head()[source]

Returns the first element of the array, or missing if empty.

Returns

Expression – Element.

Examples

>>> hl.eval(names.head())
'Alice'

If the array has no elements, then the result is missing:

>>> hl.eval(names.filter(lambda x: x.startswith('D')).head())
None

index(x)[source]

Returns the first index of x, or missing.

Parameters

x (Expression or Callable) – Value to find, or function from element to Boolean expression.

Returns

Int32Expression

Examples

>>> hl.eval(names.index('Bob'))
1
>>> hl.eval(names.index('Beth'))
None
>>> hl.eval(names.index(lambda x: x.endswith('e')))
0
>>> hl.eval(names.index(lambda x: x.endswith('h')))
None
scan(f, zero)[source]

Map each element of the array to the cumulative value of function f, with initial value zero.

Examples

>>> a = [0, 1, 2]
>>> hl.eval(hl.array_scan(lambda i, j: i + j, 0, a))
[0, 0, 1, 3]
Parameters
  • f (function ( (Expression, Expression) -> Expression)) – Function which takes the cumulative value and the next element, and returns a new value.

  • zero (Expression) – Initial value to pass in as left argument of f.

Returns

ArrayExpression.
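The cumulative behaviour of scan can be modelled in plain Python with `itertools.accumulate`. This is a sketch of the semantics shown in the example, not Hail's implementation; the helper name `scan` is hypothetical.

```python
from itertools import accumulate

def scan(f, zero, values):
    """Return [zero, f(zero, v0), f(f(zero, v0), v1), ...]."""
    # initial=zero makes the starting value the first element of the result,
    # so the output is one element longer than the input.
    return list(accumulate(values, f, initial=zero))

running_sums = scan(lambda i, j: i + j, 0, [0, 1, 2])
```

Here `running_sums` has four elements for a three-element input, matching the shape of the Hail example.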

class hail.expr.expressions.ArrayNumericExpression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Bases: hail.expr.expressions.typed_expressions.ArrayExpression

Expression of type tarray with a numeric type.

Numeric arrays support arithmetic both with scalar values and other arrays. Arithmetic between two numeric arrays requires that the length of each array is identical, and will apply the operation positionally (a1 * a2 will multiply the first element of a1 by the first element of a2, the second element of a1 by the second element of a2, and so on). Arithmetic with a scalar will apply the operation to each element of the array.

>>> a1 = hl.literal([0, 1, 2, 3, 4, 5])
>>> a2 = hl.literal([1, -1, 1, -1, 1, -1])
__add__(other)[source]

Positionally add an array or a scalar.

Examples

>>> hl.eval(a1 + 5)
[5, 6, 7, 8, 9, 10]
>>> hl.eval(a1 + a2)
[1, 0, 3, 2, 5, 4]
Parameters

other (NumericExpression or ArrayNumericExpression) – Value or array to add.

Returns

ArrayNumericExpression – Array of positional sums.
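The two arithmetic rules for numeric arrays (positional for array operands, element-wise for scalars) can be modelled with Python lists. This is a minimal sketch under the assumption that mismatched lengths are an error; the helper `positional` is hypothetical and is not part of Hail.

```python
import operator

def positional(op, left, right):
    """Apply op position by position; a scalar right operand is applied to every element."""
    if isinstance(right, list):
        if len(left) != len(right):
            raise ValueError("arrays must have the same length")
        return [op(l, r) for l, r in zip(left, right)]
    return [op(l, right) for l in left]

a1 = [0, 1, 2, 3, 4, 5]
a2 = [1, -1, 1, -1, 1, -1]

sums = positional(operator.add, a1, a2)       # array + array
shifted = positional(operator.add, a1, 5)     # array + scalar
products = positional(operator.mul, a1, a2)   # array * array
```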

__floordiv__(other)[source]

Positionally divide by an array or a scalar using floor division.

Examples

>>> hl.eval(a1 // 2)
[0, 0, 1, 1, 2, 2]
Parameters

other (NumericExpression or ArrayNumericExpression)

Returns

ArrayNumericExpression

__mod__(other)[source]

Positionally compute the left modulo the right.

Examples

>>> hl.eval(a1 % 2)
[0, 1, 0, 1, 0, 1]
Parameters

other (NumericExpression or ArrayNumericExpression)

Returns

ArrayNumericExpression

__mul__(other)[source]

Positionally multiply by an array or a scalar.

Examples

>>> hl.eval(a2 * 5)
[5, -5, 5, -5, 5, -5]
>>> hl.eval(a1 * a2)
[0, -1, 2, -3, 4, -5]
Parameters

other (NumericExpression or ArrayNumericExpression) – Value or array to multiply by.

Returns

ArrayNumericExpression – Array of positional products.

__neg__()[source]

Negate elements of the array.

Examples

>>> hl.eval(-a1)
[0, -1, -2, -3, -4, -5]
Returns

ArrayNumericExpression – Array expression of the same type.

__pow__(other)[source]

Positionally raise to the power of an array or a scalar.

Examples

>>> hl.eval(a1 ** 2)
[0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
>>> hl.eval(a1 ** a2)
[0.0, 1.0, 2.0, 0.3333333333333333, 4.0, 0.2]
Parameters

other (NumericExpression or ArrayNumericExpression)

Returns

ArrayNumericExpression

__sub__(other)[source]

Positionally subtract an array or a scalar.

Examples

>>> hl.eval(a2 - 1)
[0, -2, 0, -2, 0, -2]
>>> hl.eval(a1 - a2)
[-1, 2, 1, 4, 3, 6]
Parameters

other (NumericExpression or ArrayNumericExpression) – Value or array to subtract.

Returns

ArrayNumericExpression – Array of positional differences.

class hail.expr.expressions.BooleanExpression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Bases: hail.expr.expressions.typed_expressions.NumericExpression

Expression of type tbool.

>>> t = hl.literal(True)
>>> f = hl.literal(False)
>>> na = hl.null(hl.tbool)
>>> hl.eval(t)
True
>>> hl.eval(f)
False
>>> hl.eval(na)
None
__and__(other)[source]

Return True if the left and right arguments are True.

Examples

>>> hl.eval(t & f)
False
>>> hl.eval(t & na)
None
>>> hl.eval(f & na)
False

The & and | operators have higher priority than comparison operators like ==, <, or >. Parentheses are often necessary:

>>> x = hl.literal(5)
>>> hl.eval((x < 10) & (x > 2))
True
Parameters

other (BooleanExpression) – Right-side operand.

Returns

BooleanExpression – True if both left and right are True.
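The precedence rule comes from Python's parser, so it can be checked with plain integers. The sketch below uses an ordinary int instead of a Hail expression to show how the unparenthesized form is grouped.

```python
x = 5

# & binds tighter than < and >, so this parses as the chain x < (10 & x) > 2.
# 10 & 5 is 0 (binary 1010 & 0101), and the chain 5 < 0 > 2 is False.
unparenthesized = x < 10 & x > 2

# With parentheses, each comparison is evaluated first and the results are combined.
parenthesized = (x < 10) & (x > 2)
```

The two forms disagree for the same value of `x`, which is why the comparisons should be parenthesized.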

__invert__()[source]

Return the boolean negation.

Examples

>>> hl.eval(~t)
False
>>> hl.eval(~f)
True
>>> hl.eval(~na)
None
Returns

BooleanExpression – Boolean negation.

__or__(other)[source]

Return True if at least one of the left and right arguments is True.

Examples

>>> hl.eval(t | f)
True
>>> hl.eval(t | na)
True
>>> hl.eval(f | na)
None

The & and | operators have higher priority than comparison operators like ==, <, or >. Parentheses are often necessary:

>>> x = hl.literal(5)
>>> hl.eval((x < 10) | (x > 20))
True
Parameters

other (BooleanExpression) – Right-side operand.

Returns

BooleanExpression – True if either left or right is True.
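The results with missing values in the examples for &, | and ~ follow three-valued logic: a missing operand yields missing unless the other operand already decides the answer. A plain-Python model, with None standing in for missing and hypothetical helper names `and3`, `or3` and `not3`:

```python
def and3(left, right):
    """AND with None as missing: False decides the result on its own."""
    if left is False or right is False:
        return False
    if left is None or right is None:
        return None
    return True

def or3(left, right):
    """OR with None as missing: True decides the result on its own."""
    if left is True or right is True:
        return True
    if left is None or right is None:
        return None
    return False

def not3(value):
    """Negation keeps missing as missing."""
    return None if value is None else not value
```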

class hail.expr.expressions.CallExpression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Bases: hail.expr.expressions.base_expression.Expression

Expression of type tcall.

>>> call = hl.call(0, 1, phased=False)
__getitem__(item)[source]

Get the i-th allele.

Examples

Index with a single integer:

>>> hl.eval(call[0])
0
>>> hl.eval(call[1])
1
Parameters

item (int or Expression of type tint32) – Allele index.

Returns

Expression of type tint32

is_diploid()[source]

True if the call has ploidy equal to 2.

Examples

>>> hl.eval(call.is_diploid())
True
Returns

BooleanExpression

is_haploid()[source]

True if the call has ploidy equal to 1.

Examples

>>> hl.eval(call.is_haploid())
False
Returns

BooleanExpression

is_het()[source]

Evaluate whether the call includes two different alleles.

Examples

>>> hl.eval(call.is_het())
True

Notes

In the diploid biallelic case, a 0/1 call will return True, and 0/0 and 1/1 will return False.

Returns

BooleanExpression – True if the two alleles are different, False if they are the same.

is_het_non_ref()[source]

Evaluate whether the call includes two different alleles, neither of which is reference.

Examples

>>> hl.eval(call.is_het_non_ref())
False

Notes

A biallelic variant may never have a het-non-ref call. Examples of these calls are 1/2 and 2/4.

Returns

BooleanExpression – True if the call includes two different alternate alleles, False otherwise.

is_het_ref()[source]

Evaluate whether the call includes two different alleles, one of which is reference.

Examples

>>> hl.eval(call.is_het_ref())
True
Returns

BooleanExpression – True if the call includes one reference and one alternate allele, False otherwise.

is_hom_ref()[source]

Evaluate whether the call includes two reference alleles.

Examples

>>> hl.eval(call.is_hom_ref())
False
Returns

BooleanExpression – True if the call includes two reference alleles, False otherwise.

is_hom_var()[source]

Evaluate whether the call includes two identical alternate alleles.

Examples

>>> hl.eval(call.is_hom_var())
False
Returns

BooleanExpression – True if the call includes two identical alternate alleles, False otherwise.

is_non_ref()[source]

Evaluate whether the call includes one or more non-reference alleles.

Examples

>>> hl.eval(call.is_non_ref())
True

Notes

In the diploid biallelic case, a 0/0 call will return False, and 0/1 and 1/1 will return True.

Returns

BooleanExpression – True if at least one allele is non-reference, False otherwise.

n_alt_alleles()[source]

Returns the number of non-reference alleles.

Examples

>>> hl.eval(call.n_alt_alleles())
1

Notes

For diploid biallelic calls, this method is equivalent to the alternate allele dosage. For instance, 0/0 will return 0, 0/1 will return 1, and 1/1 will return 2.

Returns

Expression of type tint32 – The number of non-reference alleles.
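The call predicates described in this class can be summarized with a plain-Python model that represents a diploid call as a tuple of allele indices, where 0 is the reference allele. This is a sketch restricted to diploid calls, not Hail's implementation; the function names mirror the methods only for readability.

```python
def n_alt_alleles(alleles):
    """Count alleles whose index is not 0 (the reference)."""
    return sum(1 for a in alleles if a != 0)

def is_het(alleles):
    return alleles[0] != alleles[1]

def is_het_ref(alleles):
    return is_het(alleles) and 0 in alleles

def is_het_non_ref(alleles):
    return is_het(alleles) and 0 not in alleles

def is_hom_ref(alleles):
    return alleles == (0, 0)

def is_hom_var(alleles):
    return alleles[0] == alleles[1] and alleles[0] != 0

def is_non_ref(alleles):
    return n_alt_alleles(alleles) > 0

call = (0, 1)  # same alleles as hl.call(0, 1) in the examples
```

For `call = (0, 1)` the model agrees with the documented results: heterozygous, het-ref, non-ref, and one alternate allele.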

one_hot_alleles(alleles)[source]

Returns an array containing the summed one-hot encoding of the alleles.

Examples

>>> hl.eval(call.one_hot_alleles(['A', 'T']))
[1, 1]

This one-hot representation is the positional sum of the one-hot encoding for each called allele. For a biallelic variant, the one-hot encoding for a reference allele is [1, 0] and the one-hot encoding for an alternate allele is [0, 1]. Diploid calls would produce the following arrays: [2, 0] for homozygous reference, [1, 1] for heterozygous, and [0, 2] for homozygous alternate.

Parameters

alleles (ArrayStringExpression) – Variant alleles.

Returns

ArrayInt32Expression – An array of summed one-hot encodings of allele indices.
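The summed one-hot encoding can be sketched in plain Python. The model below is an illustration, not Hail's implementation: it takes the called allele indices and the number of alleles directly, whereas the Hail method takes the array of allele strings.

```python
def one_hot_alleles(call_alleles, n_alleles):
    """Sum the one-hot vectors of each called allele index."""
    counts = [0] * n_alleles
    for allele_index in call_alleles:
        counts[allele_index] += 1
    return counts

het = one_hot_alleles((0, 1), 2)
hom_ref = one_hot_alleles((0, 0), 2)
hom_alt = one_hot_alleles((1, 1), 2)
```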

phased

True if the call is phased.

Examples

>>> hl.eval(call.phased)
False
Returns

BooleanExpression

ploidy

Return the number of alleles of this call.

Examples

>>> hl.eval(call.ploidy)
2

Notes

Currently only ploidy 1 and 2 are supported.

Returns

Expression of type tint32

unphased_diploid_gt_index()[source]

Return the genotype index for unphased, diploid calls.

Examples

>>> hl.eval(call.unphased_diploid_gt_index())
1
Returns

Expression of type tint32
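The text above does not state how the genotype index is computed. A minimal sketch, assuming the triangular ordering used for diploid genotypes in the VCF specification (0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...), which is consistent with the 0/1 example returning 1:

```python
def unphased_diploid_gt_index(j, k):
    """Index of the unordered allele pair j/k in the assumed triangular ordering."""
    j, k = sorted((j, k))          # unphased: allele order does not matter
    return k * (k + 1) // 2 + j

index_het = unphased_diploid_gt_index(0, 1)
```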

class hail.expr.expressions.CollectionExpression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Bases: hail.expr.expressions.base_expression.Expression

Expression of type tarray or tset.

>>> a = hl.literal([1, 2, 3, 4, 5])
>>> s3 = hl.literal({'Alice', 'Bob', 'Charlie'})
all(f)[source]

Returns True if f returns True for every element.

Examples

>>> hl.eval(a.all(lambda x: x < 10))
True

Notes

This method returns True if the collection is empty.

Parameters

f (function ( (arg) -> BooleanExpression)) – Function to evaluate for each element of the collection. Must return a BooleanExpression.

Returns

BooleanExpression – True if f returns True for every element, False otherwise.

any(f)[source]

Returns True if f returns True for any element.

Examples

>>> hl.eval(a.any(lambda x: x % 2 == 0))
True
>>> hl.eval(s3.any(lambda x: x[0] == 'D'))
False

Notes

This method always returns False for empty collections.

Parameters

f (function ( (arg) -> BooleanExpression)) – Function to evaluate for each element of the collection. Must return a BooleanExpression.

Returns

BooleanExpression – True if f returns True for any element, False otherwise.
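The empty-collection behaviour noted for all and any matches Python's built-ins of the same names, which makes it easy to check locally. A plain-Python illustration (not Hail code):

```python
a = [1, 2, 3, 4, 5]
empty = []

all_small = all(x < 10 for x in a)
any_even = any(x % 2 == 0 for x in a)

# On an empty collection, all() is vacuously True and any() is False.
all_on_empty = all(x < 10 for x in empty)
any_on_empty = any(x < 10 for x in empty)
```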

filter(f)[source]

Returns a new collection containing elements where f returns True.

Examples

>>> hl.eval(a.filter(lambda x: x % 2 == 0))
[2, 4]
>>> hl.eval(s3.filter(lambda x: ~(x[-1] == 'e')))  # doctest: +NOTEST
{'Bob'}

Notes

Returns a same-type expression; evaluated on a SetExpression, returns a SetExpression. Evaluated on an ArrayExpression, returns an ArrayExpression.

Parameters

f (function ( (arg) -> BooleanExpression)) – Function to evaluate for each element of the collection. Must return a BooleanExpression.

Returns

CollectionExpression – Expression of the same type as the callee.

find(f)[source]

Returns the first element where f returns True.

Examples

>>> hl.eval(a.find(lambda x: x ** 2 > 20))
5
>>> hl.eval(s3.find(lambda x: x[0] == 'D'))
None

Notes

If f returns False for every element, then the result is missing.

Parameters

f (function ( (arg) -> BooleanExpression)) – Function to evaluate for each element of the collection. Must return a BooleanExpression.

Returns

Expression – Expression whose type is the element type of the collection.

flatmap(f)[source]

Map each element of the collection to a new collection, and flatten the results.

Examples

>>> hl.eval(a.flatmap(lambda x: hl.range(0, x)))
[0, 0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4]
>>> hl.eval(s3.flatmap(lambda x: hl.set(hl.range(0, x.length()).map(lambda i: x[i]))))  # doctest: +NOTEST
{'A', 'B', 'C', 'a', 'b', 'c', 'e', 'h', 'i', 'l', 'o', 'r'}
Parameters

f (function ( (arg) -> CollectionExpression)) – Function from the element type of the collection to the type of the collection. For instance, flatmap on a set<str> should take a str and return a set.

Returns

CollectionExpression
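The map-then-flatten behaviour can be modelled with Python comprehensions. This is a sketch of the semantics only; the helper `flatmap` is hypothetical and works on lists, and the set case is written as a set comprehension.

```python
def flatmap(f, collection):
    """Apply f to each element and concatenate the resulting sequences."""
    return [y for x in collection for y in f(x)]

a = [1, 2, 3, 4, 5]
flattened = flatmap(lambda x: range(0, x), a)

# Set version: every character of every name, with duplicates removed.
s3 = {'Alice', 'Bob', 'Charlie'}
letters = {ch for name in s3 for ch in name}
```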

fold(f, zero)[source]

Reduces the collection with the given function f, provided the initial value zero.

Examples

>>> a = [0, 1, 2]
>>> hl.eval(hl.fold(lambda i, j: i + j, 0, a))
3
Parameters
  • f (function ( (Expression, Expression) -> Expression)) – Function which takes the cumulative value and the next element, and returns a new value.

  • zero (Expression) – Initial value to pass in as left argument of f.

Returns

Expression.
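The reduction can be modelled in plain Python with `functools.reduce`, which also threads a cumulative value through the elements from left to right. The helper name `fold` below is hypothetical and only mirrors the argument order used in the example.

```python
from functools import reduce

def fold(f, zero, values):
    """Combine values left to right, starting from zero."""
    return reduce(f, values, zero)

total = fold(lambda i, j: i + j, 0, [0, 1, 2])
on_empty = fold(lambda i, j: i + j, 0, [])   # nothing to combine: zero is returned
```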

group_by(f)[source]

Group elements into a dict according to a lambda function.

Examples

>>> hl.eval(a.group_by(lambda x: x % 2 == 0))  # doctest: +NOTEST
{False: [1, 3, 5], True: [2, 4]}
>>> hl.eval(s3.group_by(lambda x: x.length()))  # doctest: +NOTEST
{3: {'Bob'}, 5: {'Alice'}, 7: {'Charlie'}}
Parameters

f (function ( (arg) -> Expression)) – Function to evaluate for each element of the collection to produce a key for the resulting dictionary.

Returns

DictExpression – Dictionary keyed by results of f.
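The grouping can be sketched in plain Python for the array case. This is an illustration of the semantics, not Hail's implementation; the helper `group_by` is hypothetical and always collects groups into lists.

```python
from collections import defaultdict

def group_by(f, values):
    """Collect elements into lists keyed by f(element), preserving input order."""
    groups = defaultdict(list)
    for value in values:
        groups[f(value)].append(value)
    return dict(groups)

by_parity = group_by(lambda x: x % 2 == 0, [1, 2, 3, 4, 5])
```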

length()[source]

Returns the size of a collection.

Examples

>>> hl.eval(a.length())
5
>>> hl.eval(s3.length())
3
Returns

Expression of type tint32 – The number of elements in the collection.

map(f)[source]

Transform each element of a collection.

Examples

>>> hl.eval(a.map(lambda x: x ** 3))
[1.0, 8.0, 27.0, 64.0, 125.0]
>>> hl.eval(s3.map(lambda x: x.length()))
{3, 5, 7}
Parameters

f (function ( (arg) -> Expression)) – Function to transform each element of the collection.

Returns

CollectionExpression. – Collection where each element has been transformed according to f.

size()[source]

Returns the size of a collection.

Examples

>>> hl.eval(a.size())
5
>>> hl.eval(s3.size())
3
Returns

Expression of type tint32 – The number of elements in the collection.

class hail.expr.expressions.DictExpression(ir, type, indices=Indices(axes=set(), source=None), aggregations=List())[source]

Bases: hail.expr.expressions.base_expression.Expression

Expression of type tdict.

>>> d = hl.literal({'Alice': 43, 'Bob': 33, 'Charles': 44})
__getitem__(item)[source]

Get the value associated with key item.

Examples

>>> hl.eval(d['Alice'])
43

Notes

Raises an error if item is not a key of the dictionary. Use DictExpression.get() to return missing instead of an error.

Parameters

item (Expression) – Key expression.

Returns

Expression – Value associated with key item.

contains(item)[source]

Returns whether a given key is present in the dictionary.

Examples

>>> hl.eval(d.contains('Alice'))
True
>>> hl.eval(d.contains('Anne'))
False
Parameters

item (Expression) – Key to test for inclusion.

Returns

BooleanExpression – True if item is a key of the dictionary, False otherwise.

get(item, default=None)[source]

Returns the value associated with key item, or a default value if that key is not present.

Examples

>>> hl.eval(d.get('Alice'))
43
>>> hl.eval(d.get('Anne'))
None
>>> hl.eval(d.get('Anne', 0))
0
Parameters
  • item (Expression) – Key.

  • default (Expression) – Default value. Must be same type as dictionary values.

Returns

Expression – The value associated with item, or default.

key_set()[source]

Returns the set of keys in the dictionary.

Examples

>>> hl.eval(d.key_set())  # doctest: +NOTEST
{'Alice', 'Bob', 'Charles'}
Returns

SetExpression – Set of all keys.

keys()[source]

Returns an array with all keys in the dictionary.

Examples

>>> hl.eval(d.keys())  # doctest: +NOTEST
['Bob', 'Charles', 'Alice']
Returns

ArrayExpression – Array of all keys.

map_values(f)[source]

Transform values of the dictionary according to a function.

Examples

>>> hl.eval(d.map_values(lambda x: x * 10))  # doctest: +NOTEST
{'Alice': 430, 'Bob': 330, 'Charles': 440}
Parameters

f (function ( (arg) -> Expression)) – Function to apply to each value.

Returns

DictExpression – Dictionary with transformed values.

size()[source]

Returns the size of the dictionary.

Examples

>>> hl.eval(d.size())
3
Returns

Expression of type tint32 – Size of the dictionary.

values()[source]

Returns an array with all values in the dictionary.

Examples

>>> hl.eval(d.values())  # doctest: +NOTEST
[33, 44, 43]
Returns

ArrayExpression – All values in the dictionary.

class hail.expr.expressions.IntervalExpression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Bases: hail.expr.expressions.base_expression.Expression

Expression of type tinterval.

>>> interval = hl.interval(3, 11)
>>> locus_interval = hl.parse_locus_interval("1:53242-90543")
contains(value)[source]

Tests whether a value is contained in the interval.

Examples

>>> hl.eval(interval.contains(3))
True
>>> hl.eval(interval.contains(11))
False
Parameters

value – Object with type matching the interval point type.

Returns

BooleanExpression – True if value is contained in the interval, False otherwise.

end

Returns the end point.

Examples

>>> hl.eval(interval.end)
11
Returns

Expression

includes_end

True if the interval includes the end point.

Examples

>>> hl.eval(interval.includes_end)
False
Returns

BooleanExpression

includes_start

True if the interval includes the start point.

Examples

>>> hl.eval(interval.includes_start)
True
Returns

BooleanExpression

overlaps(interval)[source]

True if the supplied interval has any value in common with this one.

Examples

>>> hl.eval(interval.overlaps(hl.interval(5, 9)))
True
>>> hl.eval(interval.overlaps(hl.interval(11, 20)))
False
Parameters

interval (Expression with type tinterval) – Interval object with the same point type.

Returns

BooleanExpression
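The contains and overlaps results in the examples follow from the endpoint flags: `hl.interval(3, 11)` includes its start and excludes its end. A plain-Python model under those assumptions (non-empty intervals, hypothetical class name `Interval`), not Hail's implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    start: int
    end: int
    includes_start: bool = True
    includes_end: bool = False

    def contains(self, value):
        above = value > self.start or (self.includes_start and value == self.start)
        below = value < self.end or (self.includes_end and value == self.end)
        return above and below

    def _ends_before(self, other):
        """True if every point of self lies before every point of other."""
        if self.end < other.start:
            return True
        # Touching endpoints share a point only if both sides include it.
        return self.end == other.start and not (self.includes_end and other.includes_start)

    def overlaps(self, other):
        return not (self._ends_before(other) or other._ends_before(self))

interval = Interval(3, 11)
```

With an excluded end point, `Interval(3, 11)` and `Interval(11, 20)` only touch at 11 and therefore do not overlap.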

start

Returns the start point.

Examples

>>> hl.eval(interval.start)
3
Returns

Expression

class hail.expr.expressions.LocusExpression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Bases: hail.expr.expressions.base_expression.Expression

Expression of type tlocus.

>>> locus = hl.locus('1', 1034245)
contig

Returns the chromosome.

Examples

>>> hl.eval(locus.contig)
'1'
Returns

StringExpression – The chromosome for this locus.

global_position()[source]

Returns a zero-indexed absolute position along the reference genome.

The global position is computed as position - 1 plus the sum of the lengths of all the contigs that precede this locus’s contig in the reference genome’s ordering of contigs.

See also locus_from_global_position().

Examples

A locus with position 1 along chromosome 1 will have a global position of 0 along the reference genome GRCh37.

>>> hl.eval(hl.locus('1', 1).global_position())
0

A locus with position 1 along chromosome 2 will have a global position of (1-1) + 249250621, where 249250621 is the length of chromosome 1 on GRCh37.

>>> hl.eval(hl.locus('2', 1).global_position())
249250621

A different reference genome than the default results in a different global position.

>>> hl.eval(hl.locus('chr2', 1, 'GRCh38').global_position())
248956422
Returns

Expression of type tint64 – Global base position of locus along the reference genome.
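The computation described above (position - 1 plus the lengths of all preceding contigs) can be sketched in plain Python. The helper and the two-contig list are illustrative assumptions; the chromosome 1 length is the GRCh37 value quoted in the text, and the chromosome 2 length is included only to complete the list.

```python
def global_position(contig, position, contig_lengths):
    """Zero-indexed offset of a 1-based locus along an ordered list of contigs."""
    offset = 0
    for name, length in contig_lengths:
        if name == contig:
            return offset + position - 1
        offset += length          # skip past every contig that precedes the target
    raise KeyError(contig)

# First two GRCh37 contigs in reference order (name, length).
grch37_prefix = [('1', 249250621), ('2', 243199373)]

first_base_chr1 = global_position('1', 1, grch37_prefix)
first_base_chr2 = global_position('2', 1, grch37_prefix)
```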

in_autosome()[source]

Returns True if the locus is on an autosome.

Notes

All contigs are considered autosomal except those designated as X, Y, or MT by ReferenceGenome.

Examples

>>> hl.eval(locus.in_autosome())
True
Returns

BooleanExpression

in_autosome_or_par()[source]

Returns True if the locus is on an autosome or a pseudoautosomal region of chromosome X or Y.

Examples

>>> hl.eval(locus.in_autosome_or_par())
True
Returns

BooleanExpression

in_mito()[source]

Returns True if the locus is on mitochondrial DNA.

Examples

>>> hl.eval(locus.in_mito())
False
Returns

BooleanExpression

in_x_nonpar()[source]

Returns True if the locus is in a non-pseudoautosomal region of chromosome X.

Examples

>>> hl.eval(locus.in_x_nonpar())
False
Returns

BooleanExpression

in_x_par()[source]

Returns True if the locus is in a pseudoautosomal region of chromosome X.

Examples

>>> hl.eval(locus.in_x_par())
False
Returns

BooleanExpression

in_y_nonpar()[source]

Returns True if the locus is in a non-pseudoautosomal region of chromosome Y.

Examples

>>> hl.eval(locus.in_y_nonpar())
False

Note

Many variant callers only generate variants on chromosome X for the pseudoautosomal region. In this case, all loci mapped to chromosome Y are non-pseudoautosomal.

Returns

BooleanExpression

in_y_par()[source]

Returns True if the locus is in a pseudoautosomal region of chromosome Y.

Examples

>>> hl.eval(locus.in_y_par())
False

Note

Many variant callers only generate variants on chromosome X for the pseudoautosomal region. In this case, all loci mapped to chromosome Y are non-pseudoautosomal.

Returns

BooleanExpression

position

Returns the position along the chromosome.

Examples

>>> hl.eval(locus.position)
1034245
Returns

Expression of type tint32 – This locus’s position along its chromosome.

sequence_context(before=0, after=0)[source]

Return the reference genome sequence at the locus.

Examples

Get the reference allele at a locus:

>>> hl.eval(locus.sequence_context()) # doctest: +SKIP
"G"

Get the reference sequence at a locus including the previous 5 bases:

>>> hl.eval(locus.sequence_context(before=5)) # doctest: +SKIP
"ACTCGG"

Notes

This function requires that this locus’ reference genome has an attached reference sequence. Use ReferenceGenome.add_sequence() to load and attach a reference sequence to a reference genome.

Parameters
  • before (Expression of type tint32, optional) – Number of bases to include before the locus. Truncates at contig boundary.

  • after (Expression of type tint32, optional) – Number of bases to include after the locus. Truncates at contig boundary.

Returns

StringExpression
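The windowing behaviour of before and after, including truncation at the contig boundary, can be modelled on an ordinary string. The toy sequence and helper below are hypothetical and stand in for an attached reference sequence; this is not Hail's implementation.

```python
def sequence_context(contig_sequence, position, before=0, after=0):
    """Return the bases around a 1-based position, clipped to the contig."""
    start = max(position - 1 - before, 0)
    end = min(position + after, len(contig_sequence))
    return contig_sequence[start:end]

toy_contig = "TTACTCGGAC"   # made-up 10-base contig

ref_base = sequence_context(toy_contig, 8)
with_prefix = sequence_context(toy_contig, 8, before=5)
clipped = sequence_context(toy_contig, 2, before=5)   # only one base precedes position 2
```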

class hail.expr.expressions.NumericExpression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Bases: hail.expr.expressions.base_expression.Expression

Expression of numeric type.

>>> x = hl.literal(3)
>>> y = hl.literal(4.5)
__add__(other)[source]

Add two numbers.

Examples

>>> hl.eval(x + 2)
5
>>> hl.eval(x + y)
7.5
Parameters

other (NumericExpression) – Number to add.

Returns

NumericExpression – Sum of the two numbers.

__floordiv__(other)[source]

Divide two numbers with floor division.

Examples

>>> hl.eval(x // 2)
1
>>> hl.eval(y // 2)
2.0
Parameters

other (NumericExpression) – Divisor.

Returns

NumericExpression – The floor of the left number divided by the right.

__ge__(other)[source]

Greater-than-or-equals comparison.

Examples

>>> hl.eval(y >= 4)
True
Parameters

other (NumericExpression) – Right side for comparison.

Returns

BooleanExpression – True if the left side is greater than or equal to the right side.

__gt__(other)[source]

Greater-than comparison.

Examples

>>> hl.eval(y > 4)
True
Parameters

other (NumericExpression) – Right side for comparison.

Returns

BooleanExpression – True if the left side is greater than the right side.

__le__(other)[source]

Less-than-or-equals comparison.

Examples

>>> hl.eval(x <= 3)
True
Parameters

other (NumericExpression) – Right side for comparison.

Returns

BooleanExpression – True if the left side is smaller than or equal to the right side.

__lt__(other)[source]

Less-than comparison.

Examples

>>> hl.eval(x < 5)
True
Parameters

other (NumericExpression) – Right side for comparison.

Returns

BooleanExpression – True if the left side is smaller than the right side.

__mod__(other)[source]

Compute the left modulo the right number.

Examples

>>> hl.eval(32 % x)
2
>>> hl.eval(7 % y)
2.5
Parameters

other (NumericExpression) – Divisor.

Returns

NumericExpression – Remainder after dividing the left by the right.
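The floor-division and modulo examples in this class give the same values as Python's own operators on plain numbers, which offers a quick way to sanity-check expected results. The sketch below uses ordinary Python numbers, not Hail expressions, and only covers the positive operands shown in the examples.

```python
x = 3
y = 4.5

int_floor = x // 2       # two ints: the result stays an int
float_floor = y // 2     # a float operand gives a float result
int_mod = 32 % x
float_mod = 7 % y

def divmod_identity_holds(a, b):
    """Floor division and modulo are tied together by a == (a // b) * b + a % b."""
    return (a // b) * b + (a % b) == a
```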

__mul__(other)[source]

Multiply two numbers.

Examples

>>> hl.eval(x * 2)
6
>>> hl.eval(x * y)
13.5
Parameters

other (NumericExpression) – Number to multiply.

Returns

NumericExpression – Product of the two numbers.

__neg__()[source]

Negate the number (multiply by -1).

Examples

>>> hl.eval(-x)
-3
Returns

NumericExpression – Negated number.

__pow__(power, modulo=None)[source]

Raise the left to the right power.

Examples

>>> hl.eval(x ** 2)
9.0
>>> hl.eval(x ** -2)
0.1111111111111111
>>> hl.eval(y ** 1.5)
9.545941546018392
Parameters
  • power (NumericExpression) – Exponent.

  • modulo – Unsupported argument.

Returns

Expression of type tfloat64 – Result of raising left to the right power.

__sub__(other)[source]

Subtract the right number from the left.

Examples

>>> hl.eval(x - 2)
1
>>> hl.eval(x - y)
-1.5
Parameters

other (NumericExpression) – Number to subtract.

Returns

NumericExpression – Difference of the two numbers.

class hail.expr.expressions.Int32Expression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Bases: hail.expr.expressions.typed_expressions.NumericExpression

Expression of type tint32.

class hail.expr.expressions.Int64Expression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Bases: hail.expr.expressions.typed_expressions.NumericExpression

Expression of type tint64.

class hail.expr.expressions.Float32Expression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Bases: hail.expr.expressions.typed_expressions.NumericExpression

Expression of type tfloat32.

class hail.expr.expressions.Float64Expression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Bases: hail.expr.expressions.typed_expressions.NumericExpression

Expression of type tfloat64.

class hail.expr.expressions.SetExpression(ir, type, indices=Indices(axes=set(), source=None), aggregations=List())[source]

Bases: hail.expr.expressions.typed_expressions.CollectionExpression

Expression of type tset.

>>> s1 = hl.literal({1, 2, 3})
>>> s2 = hl.literal({1, 3, 5})
add(item)[source]

Returns a new set including item.

Examples

>>> hl.eval(s1.add(10))  # doctest: +NOTEST
{1, 2, 3, 10}
Parameters

item (Expression) – Value to add.

Returns

SetExpression – Set with item added.

contains(item)[source]

Returns True if item is in the set.

Examples

>>> hl.eval(s1.contains(1))
True
>>> hl.eval(s1.contains(10))
False
Parameters

item (Expression) – Value for inclusion test.

Returns

BooleanExpression – True if item is in the set.

difference(s)[source]

Return the set of elements in the set that are not present in set s.

Examples

>>> hl.eval(s1.difference(s2))
{2}
>>> hl.eval(s2.difference(s1))
{5}
Parameters

s (SetExpression) – Set expression of the same type.

Returns

SetExpression – Set of elements in this set that are not in s.

intersection(s)[source]

Return the intersection of the set and set s.

Examples

>>> hl.eval(s1.intersection(s2))
{1, 3}
Parameters

s (SetExpression) – Set expression of the same type.

Returns

SetExpression – Set of elements present in both sets.

is_subset(s)[source]

Returns True if every element is contained in set s.

Examples

>>> hl.eval(s1.is_subset(s2))
False
>>> hl.eval(s1.remove(2).is_subset(s2))
True
Parameters

s (SetExpression) – Set expression of the same type.

Returns

BooleanExpression – True if every element is contained in set s.

remove(item)[source]

Returns a new set excluding item.

Examples

>>> hl.eval(s1.remove(1))
{2, 3}
Parameters

item (Expression) – Value to remove.

Returns

SetExpression – Set with item removed.

union(s)[source]

Return the union of the set and set s.

Examples

>>> hl.eval(s1.union(s2))
{1, 2, 3, 5}
Parameters

s (SetExpression) – Set expression of the same type.

Returns

SetExpression – Set of elements present in either set.
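
The SetExpression methods above follow ordinary set algebra, and add and remove return new sets rather than modifying the original. As a plain-Python sanity check (built-in set, not Hail expressions; the names py_s1 and py_s2 are illustrative only), the documented results can be reproduced with non-mutating operators:

```python
# Plain-Python analogue of the SetExpression examples; no Hail calls are made.
py_s1 = {1, 2, 3}
py_s2 = {1, 3, 5}

added = py_s1 | {10}          # like s1.add(10): a new set, py_s1 is unchanged
removed = py_s1 - {2}         # like s1.remove(2)
difference = py_s1 - py_s2    # like s1.difference(s2)
intersection = py_s1 & py_s2  # like s1.intersection(s2)
union = py_s1 | py_s2         # like s1.union(s2)
subset = removed <= py_s2     # like s1.remove(2).is_subset(s2)

print(added, removed, difference, intersection, union, subset)
```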

class hail.expr.expressions.StringExpression(ir: hail.ir.base_ir.IR, type: hail.expr.types.HailType, indices: hail.expr.expressions.indices.Indices = Indices(axes=set(), source=None), aggregations: hail.utils.linkedlist.LinkedList = List())[source]

Bases: hail.expr.expressions.base_expression.Expression

Expression of type tstr.

>>> s = hl.literal('The quick brown fox')
__add__(other)[source]

Concatenate strings.

Examples

>>> hl.eval(s + ' jumped over the lazy dog')
'The quick brown fox jumped over the lazy dog'
Parameters

other (StringExpression) – String to concatenate.

Returns

StringExpression – Concatenated string.

__getitem__(item)[source]

Slice or index into the string.

Examples

>>> hl.eval(s[:15])
'The quick brown'
>>> hl.eval(s[0])
'T'
Parameters

item (slice or Expression of type tint32) – Slice or character index.

Returns

StringExpression – Substring or character at index item.

contains(substr)[source]

Returns whether substr is contained in the string.

Examples

>>> hl.eval(s.contains('fox'))
True
>>> hl.eval(s.contains('dog'))
False

Note

This method is case-sensitive.

Parameters

substr (StringExpression)

Returns

BooleanExpression

endswith(substr)[source]

Returns whether substr is a suffix of the string.

Examples

>>> hl.eval(s.endswith('fox'))
True

Note

This method is case-sensitive.

Parameters

substr (StringExpression)

Returns

BooleanExpression

first_match_in(regex)[source]

Returns an array containing the capture groups of the first match of regex in the given character sequence.

Examples

>>> hl.eval(s.first_match_in(r"The quick (\w+) fox"))
['brown']
>>> hl.eval(s.first_match_in(r"The (\w+) (\w+) (\w+)"))
['quick', 'brown', 'fox']
>>> hl.eval(s.first_match_in(r"(\w+) (\w+)"))
['The', 'quick']
Parameters

regex (StringExpression)

Returns

ArrayExpression with element type tstr
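
To experiment with capture groups before running them through Hail, the behaviour described above can be approximated with Python's re module. This is only a sketch: Hail uses Java regex syntax, which agrees with Python for simple patterns like these but not in general. The helper first_match_groups is hypothetical, and returning None when nothing matches is a choice made for this sketch, not a statement about Hail.

```python
import re

text = 'The quick brown fox'


def first_match_groups(pattern, string):
    """Return the capture groups of the first match as a list, or None."""
    match = re.search(pattern, string)
    return list(match.groups()) if match else None


print(first_match_groups(r'The quick (\w+) fox', text))    # ['brown']
print(first_match_groups(r'The (\w+) (\w+) (\w+)', text))  # ['quick', 'brown', 'fox']
print(first_match_groups(r'(\w+) (\w+)', text))            # ['The', 'quick']
```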

length()[source]

Returns the length of the string.

Examples

>>> hl.eval(s.length())
19
Returns

Expression of type tint32 – Length of the string.

lower()[source]

Returns a copy of the string, but with upper case letters converted to lower case.

Examples

>>> hl.eval(s.lower())
'the quick brown fox'
Returns

StringExpression

matches(regex)[source]

Returns True if the string contains any match for the given regex.

Examples

>>> string = hl.literal('NA12878')

The regex parameter does not need to match the entire string:

>>> hl.eval(string.matches('12'))
True

Regex motifs can be used to match sequences of characters:

>>> hl.eval(string.matches(r'NA\d+'))
True

Notes

The regex argument is a regular expression, and uses Java regex syntax.

Parameters

regex (str) – Pattern to match.

Returns

BooleanExpression – True if the string contains any match for the regex, otherwise False.
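
Because the pattern only has to match somewhere in the string, matches behaves like a "search" rather than a "full match". The distinction can be sketched in plain Python (re module, not Hail; Java and Python regex syntax agree for these simple patterns, and the variable names are illustrative):

```python
import re

sample = 'NA12878'

# "Search" semantics: the pattern may match any part of the string.
partial = re.search(r'12', sample) is not None
motif = re.search(r'NA\d+', sample) is not None

# "Full match" semantics, shown for contrast: the whole string must match.
whole = re.fullmatch(r'12', sample) is not None

print(partial, motif, whole)  # True True False
```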

replace(pattern1, pattern2)[source]

Replace substrings matching pattern1 with pattern2 using regex.

Examples

Replace spaces with underscores in a Hail string:

>>> hl.eval(hl.str("The quick  brown fox").replace(' ', '_'))
'The_quick__brown_fox'

Remove the leading zero in contigs in variant strings in a table:

>>> t = hl.import_table('data/leading-zero-variants.txt')
>>> t.show()
+----------------+
| variant        |
+----------------+
| str            |
+----------------+
| "01:1000:A:T"  |
| "01:10001:T:G" |
| "02:99:A:C"    |
| "02:893:G:C"   |
| "22:100:A:T"   |
| "X:10:C:A"     |
+----------------+
<BLANKLINE>
>>> t = t.annotate(variant = t.variant.replace("^0([0-9])", "$1"))
>>> t.show()
+---------------+
| variant       |
+---------------+
| str           |
+---------------+
| "1:1000:A:T"  |
| "1:10001:T:G" |
| "2:99:A:C"    |
| "2:893:G:C"   |
| "22:100:A:T"  |
| "X:10:C:A"    |
+---------------+
<BLANKLINE>

Notes

The patterns should follow Java regex syntax. In Java regex replacement strings, a capture group is referenced with a dollar sign ($1 for the first group), not with the backslash form (\1) used by many other regex dialects.
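
For readers used to Python's re module, the difference in group references can be sketched as follows. This is a plain-Python analogue of the leading-zero example, not Hail code; Python writes the first group as \1 where the Java-style replacement above writes $1.

```python
import re

variants = ['01:1000:A:T', '01:10001:T:G', '02:99:A:C',
            '02:893:G:C', '22:100:A:T', 'X:10:C:A']

# Python spelling of the replacement; the Java-style equivalent is "$1".
cleaned = [re.sub(r'^0([0-9])', r'\1', v) for v in variants]

print(cleaned)
```

Only strings that start with a zero followed by a digit are changed; '22:100:A:T' and 'X:10:C:A' pass through untouched.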

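Parameters
  • pattern1 (str or StringExpression) – Regex pattern to match.

  • pattern2 (str or StringExpression) – Replacement string.
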
split(delim, n=None)[source]

Returns an array of strings generated by splitting the string at delim.

Examples

>>> hl.eval(s.split(r'\s+'))
['The', 'quick', 'brown', 'fox']
>>> hl.eval(s.split(r'\s+', 2))
['The', 'quick brown fox']

Notes

The delimiter is a regular expression in Java regex syntax. To split on special characters, escape them with a double backslash (\\).

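Parameters
  • delim (str or StringExpression) – Delimiter regex.

  • n (Expression of type tint32, optional) – Maximum length of the resulting array.

Returns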

ArrayExpression – Array of split strings.
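
In the second example, n = 2 yields two elements, so n bounds the length of the result rather than the number of cuts. A rough plain-Python analogue is sketched below (re module, not Hail). The helper split_with_limit is hypothetical, assumes n is a positive integer when given, and ignores other differences between Java and Python splitting, such as the handling of trailing empty strings.

```python
import re

text = 'The quick brown fox'


def split_with_limit(pattern, string, n=None):
    """Split on a regex, returning at most n pieces when n is given."""
    if n is None:
        return re.split(pattern, string)
    if n == 1:
        # maxsplit=0 means "no limit" in Python, so handle this case directly.
        return [string]
    return re.split(pattern, string, maxsplit=n - 1)


print(split_with_limit(r'\s+', text))     # ['The', 'quick', 'brown', 'fox']
print(split_with_limit(r'\s+', text, 2))  # ['The', 'quick brown fox']

# A special character such as "|" must be escaped to be taken literally.
print(split_with_limit(r'\|', 'a|b|c'))   # ['a', 'b', 'c']
```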

startswith(substr)[source]

Returns whether substr is a prefix of the string.

Examples

>>> hl.eval(s.startswith('The'))
True
>>> hl.eval(s.startswith('the'))
False

Note

This method is case-sensitive.

Parameters

substr (StringExpression)

Returns

BooleanExpression

strip()[source]

Returns a copy of the string with whitespace removed from the start and end.

Examples

>>> s2 = hl.str('  once upon a time\n')
>>> hl.eval(s2.strip())
'once upon a time'
Returns

StringExpression

upper()[source]

Returns a copy of the string, but with lower case letters converted to upper case.

Examples

>>> hl.eval(s.upper())
'THE QUICK BROWN FOX'
Returns

StringExpression

class hail.expr.expressions.StructExpression(ir, type, indices=Indices(axes=set(), source=None), aggregations=List())[source]

Bases: typing.Mapping, hail.expr.expressions.base_expression.Expression

Expression of type tstruct.

>>> struct = hl.struct(a=5, b='Foo')

Struct fields are accessible as attributes and keys. It is therefore possible to access field a of the struct with dot syntax:

>>> hl.eval(struct.a)
5

However, it is recommended to use square brackets to select fields:

>>> hl.eval(struct['a'])
5

The latter syntax is safer, because fields that share their name with an existing attribute of StructExpression (keys, values, annotate, drop, etc.) will only be accessible using the StructExpression.__getitem__() syntax. This is also the only way to access fields that are not valid Python identifiers, like fields with spaces or symbols.
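
Why attribute access can be shadowed is easiest to see with a toy class. The sketch below is plain Python and not Hail's implementation; MiniStruct and its internals are hypothetical and exist only to illustrate the lookup order.

```python
class MiniStruct:
    """Toy stand-in for a struct; hypothetical, not part of Hail."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def keys(self):
        """An ordinary method that happens to share a name with a field."""
        return list(self._fields)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # so real attributes and methods always win over fields.
        fields = self.__dict__.get('_fields', {})
        try:
            return fields[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name):
        return self._fields[name]


row = MiniStruct(a=5, keys='shadowed', **{'my field': 1})

print(row.a)               # 5: no attribute named "a", so the field is used
print(callable(row.keys))  # True: the method hides the field named "keys"
print(row['keys'])         # shadowed: bracket access always reaches the field
print(row['my field'])     # 1: not a valid identifier, brackets are the only way
```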

__getitem__(item)[source]

Access a field of the struct by name or index.

Examples

>>> hl.eval(struct['a'])
5
>>> hl.eval(struct[1])
'Foo'
Parameters

item (str or int) – Field name or index.

Returns

Expression – Struct field.

annotate(**named_exprs)[source]

Add new fields or recompute existing fields.

Examples

>>> hl.eval(struct.annotate(a=10, c=2*2*2))
Struct(a=10, b='Foo', c=8)

Notes

If an expression in named_exprs shares a name with a field of the struct, then that field will be replaced but keep its position in the struct. New fields will be appended to the end of the struct.

Parameters

named_exprs (keyword args of Expression) – Fields to add.

Returns

StructExpression – Struct with new or updated fields.
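
The ordering rule in the notes above (replaced fields keep their position, new fields go to the end) matches how an insertion-ordered Python dict behaves when updated. A plain-Python analogue, not Hail code, with illustrative names:

```python
# Insertion-ordered dict standing in for Struct(a=5, b='Foo').
fields = {'a': 5, 'b': 'Foo'}

# Overwriting 'a' keeps its original slot; 'c' is new and is appended.
annotated = {**fields, 'a': 10, 'c': 2 * 2 * 2}

print(list(annotated.items()))  # [('a', 10), ('b', 'Foo'), ('c', 8)]
```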

drop(*fields)[source]

Drop fields from the struct.

Examples

>>> hl.eval(struct.drop('b'))
Struct(a=5)
Parameters

fields (varargs of str) – Fields to drop.

Returns

StructExpression – Struct without certain fields.

flatten()[source]

select(*fields, **named_exprs)[source]

Select existing fields and compute new ones.

Examples

>>> hl.eval(struct.select('a', c=['bar', 'baz']))
Struct(a=5, c=['bar', 'baz'])

Notes

The fields argument is a list of field names to keep. These fields will appear in the resulting struct in the order they appear in fields.

The named_exprs arguments are new field expressions.

Parameters
  • fields (varargs of str) – Field names to keep.

  • named_exprs (keyword args of Expression) – New field expressions.

Returns

StructExpression – Struct containing specified existing fields and computed fields.
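
The field ordering described in the notes for select can be sketched with a plain dict. This is a plain-Python analogue, not Hail code; select_fields is a hypothetical helper, and how Hail treats a name that appears in both fields and named_exprs is not covered here.

```python
def select_fields(struct_fields, *names, **named_exprs):
    """Keep the listed fields in the order given, then append computed ones."""
    selected = {name: struct_fields[name] for name in names}
    selected.update(named_exprs)
    return selected


fields = {'a': 5, 'b': 'Foo'}

print(select_fields(fields, 'a', c=['bar', 'baz']))  # {'a': 5, 'c': ['bar', 'baz']}
print(list(select_fields(fields, 'b', 'a')))         # ['b', 'a']
```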