ArrayExpression

class hail.expr.ArrayExpression

Bases: hail.expr.expressions.typed_expressions.CollectionExpression

Expression of type tarray.

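>>> names = hl.literal(['Alice', 'Bob', 'Charlie'])

Several examples below also use an array a and a set s3:

>>> a = hl.literal([1, 2, 3, 4, 5])
>>> s3 = hl.literal({'Alice', 'Bob', 'Charlie'})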

Attributes

dtype The data type of the expression.

Methods

__init__ Initialize self.
all Returns True if f returns True for every element.
any Returns True if f returns True for any element.
append Append an element to the array and return the result.
collect Collect all records of an expression into a local list.
contains Returns a boolean indicating whether item is found in the array.
describe Print information about type, index, and dependencies.
export Export a field to a text file.
extend Concatenate two arrays and return the result.
filter Returns a new collection containing elements where f returns True.
find Returns the first element where f returns True.
flatmap Map each element of the collection to a new collection, and flatten the results.
fold Reduces the collection with the given function f, provided the initial value zero.
group_by Group elements into a dict according to a lambda function.
head Returns the first element of the array, or missing if empty.
index Returns the first index of x, or missing.
length Returns the size of a collection.
map Transform each element of a collection.
scan Map each element of the array to the cumulative value of function f, with initial value zero.
show Print the first few rows of the table to the console.
size Returns the size of a collection.
summarize Compute and print summary information about the expression.
take Collect the first n records of an expression.
__eq__(other)

Returns True if the two expressions are equal.

Examples

>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x == y)
True
>>> hl.eval(x == z)
False

Notes

This method will fail with an error if the two expressions are not of comparable types.

Parameters: other (Expression) – Expression for equality comparison.
Returns: BooleanExpression – True if the two expressions are equal.
__ge__(other)

Return self>=value.

__getitem__(item)

Index into or slice the array.

Examples

Index with a single integer:

>>> hl.eval(names[1])
'Bob'
>>> hl.eval(names[-1])
'Charlie'

Slicing is also supported:

>>> hl.eval(names[1:])
['Bob', 'Charlie']
Parameters: item (slice or Expression of type tint32) – Index or slice.
Returns: Expression – Element or array slice.
__gt__(other)

Return self>value.

__le__(other)

Return self<=value.

__lt__(other)

Return self<value.

__ne__(other)

Returns True if the two expressions are not equal.

Examples

>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x != y)
False
>>> hl.eval(x != z)
True

Notes

This method will fail with an error if the two expressions are not of comparable types.

Parameters: other (Expression) – Expression for inequality comparison.
Returns: BooleanExpression – True if the two expressions are not equal.
all(f)

Returns True if f returns True for every element.

Examples

>>> hl.eval(a.all(lambda x: x < 10))
True

Notes

This method returns True if the collection is empty.

Parameters: f (function ((arg) -> BooleanExpression)) – Function to evaluate for each element of the collection. Must return a BooleanExpression.
Returns: BooleanExpression – True if f returns True for every element, False otherwise.
any(f)

Returns True if f returns True for any element.

Examples

>>> hl.eval(a.any(lambda x: x % 2 == 0))
True
>>> hl.eval(s3.any(lambda x: x[0] == 'D'))
False

Notes

This method always returns False for empty collections.

Parameters: f (function ((arg) -> BooleanExpression)) – Function to evaluate for each element of the collection. Must return a BooleanExpression.
Returns: BooleanExpression – True if f returns True for any element, False otherwise.
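The empty-collection rules for all and any match the conventions of Python's built-in all() and any(), so a plain-list analogy can make them concrete. This is a local illustration only; a_local is a hypothetical stand-in for the values of a, not a Hail expression.

```python
# Plain-Python analogy of ArrayExpression.all / .any; this is not Hail code.
a_local = [1, 2, 3, 4, 5]  # hypothetical local copy of the values in `a`

all_small = all(x < 10 for x in a_local)     # like a.all(lambda x: x < 10)
any_even = any(x % 2 == 0 for x in a_local)  # like a.any(lambda x: x % 2 == 0)

# Empty collections: all(...) is vacuously True, any(...) is False.
empty_all = all(x < 10 for x in [])
empty_any = any(x % 2 == 0 for x in [])
```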
append(item)

Append an element to the array and return the result.

Examples

>>> hl.eval(names.append('Dan'))
['Alice', 'Bob', 'Charlie', 'Dan']

Note

This method does not mutate the caller, but instead returns a new array by copying the caller and adding item.

Parameters: item (Expression) – Element to append, same type as the array element type.
Returns: ArrayExpression
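Because append returns a new array instead of modifying the caller, it behaves more like list concatenation in plain Python than like list.append. The sketch below is an analogy using a hypothetical local list, names_local, rather than the Hail expression names.

```python
# Plain-Python analogy of the copy-then-append behavior; not Hail's implementation.
names_local = ['Alice', 'Bob', 'Charlie']  # hypothetical local copy of `names`

# Concatenation builds a new list; names_local itself is left untouched.
appended = names_local + ['Dan']
```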
collect(_localize=True)

Collect all records of an expression into a local list.

Examples

Collect all the values from C1:

>>> table1.C1.collect()
[2, 2, 10, 11]

Warning

Extremely experimental.

Warning

The list of records may be very large.

Returns: list
contains(item)

Returns a boolean indicating whether item is found in the array.

Examples

>>> hl.eval(names.contains('Charlie'))
True
>>> hl.eval(names.contains('Helen'))
False
Parameters: item (Expression) – Item for inclusion test.

Warning

This method takes time proportional to the length of the array. If a pipeline tests membership in the same array several times, it may be more efficient to convert the array to a set once, early in the script, with hl.set().

Returns: BooleanExpression – True if the element is found in the array, False otherwise.
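The advice above, converting once and reusing the result, can be illustrated with Python's own list and set types. The names names_local and queries are hypothetical; the complexity notes in the comments describe plain Python, not Hail's internal set representation.

```python
# Plain-Python analogy of "convert to a set once"; all names here are hypothetical.
names_local = ['Alice', 'Bob', 'Charlie']
queries = ['Charlie', 'Helen', 'Bob', 'Dan']

# Testing against the list: each `in` walks the list, O(len(names_local)).
hits_list = [q in names_local for q in queries]

# Converting once, then testing: each `in` is a hash lookup, O(1) on average.
names_set = set(names_local)
hits_set = [q in names_set for q in queries]
```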
describe(handler=<built-in function print>)

Print information about type, index, and dependencies.

dtype

The data type of the expression.

Returns: HailType
export(path, delimiter='\t', missing='NA', header=True)

Export a field to a text file.

Examples

>>> small_mt.GT.export('output/gt.tsv')
>>> with open('output/gt.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles 0       1       2       3
1:1     ["A","C"]       0/1     0/1     0/0     0/0
1:2     ["A","C"]       1/1     0/1     1/1     1/1
1:3     ["A","C"]       1/1     0/1     0/1     0/0
1:4     ["A","C"]       1/1     0/1     1/1     1/1
>>> small_mt.GT.export('output/gt-no-header.tsv', header=False)
>>> with open('output/gt-no-header.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
1:1     ["A","C"]       0/1     0/1     0/0     0/0
1:2     ["A","C"]       1/1     0/1     1/1     1/1
1:3     ["A","C"]       1/1     0/1     0/1     0/0
1:4     ["A","C"]       1/1     0/1     1/1     1/1
>>> small_mt.pop.export('output/pops.tsv')
>>> with open('output/pops.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
sample_idx      pop
0       2
1       2
2       0
3       2
>>> small_mt.ancestral_af.export('output/ancestral_af.tsv')
>>> with open('output/ancestral_af.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles ancestral_af
1:1     ["A","C"]       5.3905e-01
1:2     ["A","C"]       8.6768e-01
1:3     ["A","C"]       4.3765e-01
1:4     ["A","C"]       7.6300e-01
>>> mt = small_mt
>>> small_mt.bn.export('output/bn.tsv')
>>> with open('output/bn.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
bn
{"n_populations":3,"n_samples":4,"n_variants":4,"n_partitions":8,"pop_dist":[1,1,1],"fst":[0.1,0.1,0.1],"mixture":false}

Notes

For entry-indexed expressions, if there is one column key field, the result of calling hl.str() on that field is used as the column header. Otherwise, each compound column key is converted to JSON and used as a column header. For example:

>>> small_mt = small_mt.key_cols_by(s=small_mt.sample_idx, family='fam1')
>>> small_mt.GT.export('output/gt-compound-key.tsv')
>>> with open('output/gt-compound-key.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles {"s":0,"family":"fam1"} {"s":1,"family":"fam1"} {"s":2,"family":"fam1"} {"s":3,"family":"fam1"}
1:1     ["A","C"]       0/1     0/1     0/0     0/0
1:2     ["A","C"]       1/1     0/1     1/1     1/1
1:3     ["A","C"]       1/1     0/1     0/1     0/0
1:4     ["A","C"]       1/1     0/1     1/1     1/1
Parameters:
  • path (str) – The path to which to export.
  • delimiter (str) – The string for delimiting columns.
  • missing (str) – The string to output for missing values.
  • header (bool) – When True include a header line.
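With a compound column key, each column header in the example above is the key rendered as compact JSON. The snippet below reproduces those header strings with Python's json module; it is an approximation for illustration, and Hail's own JSON encoding is not guaranteed to match json.dumps for every type.

```python
import json

# Compound column keys from the example above: s = sample_idx, family = 'fam1'.
col_keys = [{'s': i, 'family': 'fam1'} for i in range(4)]

# Compact separators produce '{"s":0,"family":"fam1"}' with no spaces.
col_headers = [json.dumps(k, separators=(',', ':')) for k in col_keys]

# Row key fields come first, joined by the default tab delimiter.
header_line = '\t'.join(['locus', 'alleles'] + col_headers)
```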
extend(a)

Concatenate two arrays and return the result.

Examples

>>> hl.eval(names.extend(['Dan', 'Edith']))
['Alice', 'Bob', 'Charlie', 'Dan', 'Edith']
Parameters: a (ArrayExpression) – Array to concatenate, same type as the callee.
Returns: ArrayExpression
filter(f)

Returns a new collection containing elements where f returns True.

Examples

>>> hl.eval(a.filter(lambda x: x % 2 == 0))
[2, 4]
>>> hl.eval(s3.filter(lambda x: ~(x[-1] == 'e')))  
{'Bob'}

Notes

Returns an expression of the same type as the caller: called on a SetExpression, it returns a SetExpression; called on an ArrayExpression, it returns an ArrayExpression.

Parameters: f (function ((arg) -> BooleanExpression)) – Function to evaluate for each element of the collection. Must return a BooleanExpression.
Returns: CollectionExpression – Expression of the same type as the callee.
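The type-preserving behavior of filter can be modeled in plain Python by rebuilding the result with the input's own container type. filter_like, a_local, and s3_local are hypothetical names for this sketch, not part of Hail.

```python
# Plain-Python analogy of a type-preserving filter; not Hail code.
def filter_like(coll, f):
    """Keep elements where f is True, returning the same container type."""
    return type(coll)(x for x in coll if f(x))

a_local = [1, 2, 3, 4, 5]               # hypothetical stand-in for `a`
s3_local = {'Alice', 'Bob', 'Charlie'}  # hypothetical stand-in for `s3`

evens = filter_like(a_local, lambda x: x % 2 == 0)           # a list
no_e = filter_like(s3_local, lambda x: not x.endswith('e'))  # a set
```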
find(f)

Returns the first element where f returns True.

Examples

>>> hl.eval(a.find(lambda x: x ** 2 > 20))
5
>>> hl.eval(s3.find(lambda x: x[0] == 'D'))
None

Notes

If f returns False for every element, then the result is missing.

Parameters: f (function ((arg) -> BooleanExpression)) – Function to evaluate for each element of the collection. Must return a BooleanExpression.
Returns: Expression – Expression whose type is the element type of the collection.
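On a list, the rule "first matching element, else missing" corresponds to next() over a generator with a default, with Python's None standing in for a missing value. find_like and a_local are hypothetical names used only for this analogy.

```python
# Plain-Python analogy of find; None plays the role of a missing value.
def find_like(coll, f):
    """Return the first element where f is True, or None if there is none."""
    return next((x for x in coll if f(x)), None)

a_local = [1, 2, 3, 4, 5]  # hypothetical stand-in for `a`

first_big = find_like(a_local, lambda x: x ** 2 > 20)  # 4 ** 2 = 16 fails, 5 ** 2 = 25 passes
no_match = find_like(a_local, lambda x: x > 100)       # nothing matches
```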
flatmap(f)

Map each element of the collection to a new collection, and flatten the results.

Examples

>>> hl.eval(a.flatmap(lambda x: hl.range(0, x)))
[0, 0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4]
>>> hl.eval(s3.flatmap(lambda x: hl.set(hl.range(0, x.length()).map(lambda i: x[i]))))  
{'A', 'B', 'C', 'a', 'b', 'c', 'e', 'h', 'i', 'l', 'o', 'r'}
Parameters: f (function ((arg) -> CollectionExpression)) – Function from the element type of the collection to the type of the collection. For instance, flatmap on a set<str> should take a str and return a set.
Returns: CollectionExpression
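For a list, flatmap is a map followed by a flatten, which a nested comprehension expresses directly. The sketch mirrors the first example above using a hypothetical local list, a_local, and Python's range in place of hl.range.

```python
# Plain-Python analogy of a.flatmap(lambda x: hl.range(0, x)); not Hail code.
a_local = [1, 2, 3, 4, 5]  # hypothetical stand-in for `a`

# Map each x to range(0, x), then concatenate the pieces in order.
flat = [y for x in a_local for y in range(0, x)]
```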
fold(f, zero)

Reduces the collection with the given function f, provided the initial value zero.

Examples

>>> b = hl.literal([0, 1, 2])
>>> hl.eval(b.fold(lambda i, j: i + j, 0))
3
Parameters:
  • f (function ((Expression, Expression) -> Expression)) – Function which takes the cumulative value and the next element, and returns a new value.
  • zero (Expression) – Initial value to pass in as left argument of f.
Returns: Expression
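Python's functools.reduce with an initializer has the same shape as fold(f, zero): f receives the running value on the left and the next element on the right. This is an analogy in plain Python, not Hail's implementation; values is a hypothetical local list.

```python
from functools import reduce

values = [0, 1, 2]

# ((0 + 0) + 1) + 2: the initializer 0 is the first left argument.
total = reduce(lambda acc, x: acc + x, values, 0)

# reduce returns the initializer unchanged for an empty sequence.
empty_total = reduce(lambda acc, x: acc + x, [], 0)
```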

group_by(f)

Group elements into a dict according to a lambda function.

Examples

>>> hl.eval(a.group_by(lambda x: x % 2 == 0))  
{False: [1, 3, 5], True: [2, 4]}
>>> hl.eval(s3.group_by(lambda x: x.length()))  
{3: {'Bob'}, 5: {'Alice'}, 7: {'Charlie'}}
Parameters: f (function ((arg) -> Expression)) – Function to evaluate for each element of the collection to produce a key for the resulting dictionary.
Returns: DictExpression – Dictionary keyed by results of f.
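On an array, group_by collects the elements sharing a key into a list under that key. A minimal plain-Python model uses collections.defaultdict; group_by_like and a_local are hypothetical names, and the sketch does not claim to reflect how Hail builds the dictionary internally.

```python
from collections import defaultdict

def group_by_like(coll, f):
    """Group list elements into a dict keyed by f(x), keeping input order."""
    groups = defaultdict(list)
    for x in coll:
        groups[f(x)].append(x)
    return dict(groups)

a_local = [1, 2, 3, 4, 5]  # hypothetical stand-in for `a`
by_parity = group_by_like(a_local, lambda x: x % 2 == 0)
```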
head()

Returns the first element of the array, or missing if empty.

Returns: Expression – Element.

Examples

>>> hl.eval(names.head())
'Alice'

If the array has no elements, then the result is missing:

>>> hl.eval(names.filter(lambda x: x.startswith('D')).head())
None

index(x)

Returns the first index of x, or missing.

Parameters: x (Expression or Callable) – Value to find, or function from element to Boolean expression.
Returns: Int32Expression

Examples

>>> hl.eval(names.index('Bob'))
1
>>> hl.eval(names.index('Beth'))
None
>>> hl.eval(names.index(lambda x: x.endswith('e')))
0
>>> hl.eval(names.index(lambda x: x.endswith('h')))
None
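Because index accepts either a value or a predicate, a plain-Python model can normalize both cases to a predicate and return the first matching position, or None for missing. index_like and names_local are hypothetical names for this analogy.

```python
def index_like(xs, x):
    """First index of x, or of the first element satisfying x if x is callable; else None."""
    pred = x if callable(x) else (lambda e: e == x)
    return next((i for i, e in enumerate(xs) if pred(e)), None)

names_local = ['Alice', 'Bob', 'Charlie']  # hypothetical stand-in for `names`
```

Usage mirrors the examples above, for instance index_like(names_local, lambda x: x.endswith('e')) gives 0.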
length()

Returns the size of a collection.

Examples

>>> hl.eval(a.length())
5
>>> hl.eval(s3.length())
3
Returns: Expression of type tint32 – The number of elements in the collection.
map(f)

Transform each element of a collection.

Examples

>>> hl.eval(a.map(lambda x: x ** 3))
[1.0, 8.0, 27.0, 64.0, 125.0]
>>> hl.eval(s3.map(lambda x: x.length()))
{3, 5, 7}
Parameters: f (function ((arg) -> Expression)) – Function to transform each element of the collection.
Returns: CollectionExpression – Collection where each element has been transformed according to f.
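Since mapping a set produces a set, elements that map to equal results merge, so the output can be smaller than the input. A set comprehension shows the same effect in plain Python; s3_local and same_length are hypothetical sets used only for illustration.

```python
# Plain-Python analogy of mapping a set; not Hail code.
s3_local = {'Alice', 'Bob', 'Charlie'}  # hypothetical stand-in for `s3`
lengths = {len(x) for x in s3_local}

# Three distinct names of the same length collapse to a single result.
same_length = {'Ann', 'Bob', 'Cal'}
collapsed = {len(x) for x in same_length}
```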
scan(f, zero)

Map each element of the array to the cumulative value of function f, with initial value zero. The result begins with zero, so it has one more element than the array.

Examples

>>> b = hl.literal([0, 1, 2])
>>> hl.eval(b.scan(lambda i, j: i + j, 0))
[0, 0, 1, 3]
Parameters:
  • f (function ((Expression, Expression) -> Expression)) – Function which takes the cumulative value and the next element, and returns a new value.
  • zero (Expression) – Initial value to pass in as left argument of f.
Returns: ArrayExpression
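In plain Python, itertools.accumulate with the initial= keyword (Python 3.8 and later) produces the same running values, including the leading zero. This is an analogy rather than Hail's implementation; values is a hypothetical local list.

```python
from itertools import accumulate

values = [0, 1, 2]

# initial=0 is emitted first, so the output has len(values) + 1 items.
running = list(accumulate(values, lambda acc, x: acc + x, initial=0))
```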

show(n=None, width=None, truncate=None, types=True, handler=None, n_rows=None, n_cols=None)

Print the first few rows of the table to the console.

Examples

>>> table1.SEX.show()
+-------+-----+
|    ID | SEX |
+-------+-----+
| int32 | str |
+-------+-----+
|     1 | "M" |
|     2 | "M" |
|     3 | "F" |
|     4 | "F" |
+-------+-----+
>>> hl.literal(123).show()
+--------+
| <expr> |
+--------+
|  int32 |
+--------+
|    123 |
+--------+

Warning

Extremely experimental.

Parameters:
  • n (int) – Maximum number of rows to show.
  • width (int) – Horizontal width at which to break columns.
  • truncate (int, optional) – Truncate each field to the given number of characters. If None, truncate fields to the given width.
  • types (bool) – Print an extra header line with the type of each field.
size()

Returns the size of a collection.

Examples

>>> hl.eval(a.size())
5
>>> hl.eval(s3.size())
3
Returns: Expression of type tint32 – The number of elements in the collection.
summarize(handler=None)

Compute and print summary information about the expression.

Danger

This functionality is experimental. It may not be tested as well as other parts of Hail and the interface is subject to change.

take(n, _localize=True)

Collect the first n records of an expression.

Examples

Take the first three rows:

>>> table1.X.take(3)
[5, 6, 7]

Warning

Extremely experimental.

Parameters: n (int) – Number of records to take.
Returns: list