StructExpression

class hail.expr.StructExpression[source]

Bases: collections.abc.Mapping, typing.Generic, hail.expr.expressions.base_expression.Expression

Expression of type tstruct.

>>> struct = hl.struct(a=5, b='Foo')

Struct fields are accessible as attributes and as keys. It is therefore possible to access field a of struct with dot syntax:

>>> hl.eval(struct.a)
5

However, it is recommended to use square brackets to select fields:

>>> hl.eval(struct['a'])
5

The latter syntax is safer, because fields that share their name with an existing attribute of StructExpression (keys, values, annotate, drop, etc.) will only be accessible using the StructExpression.__getitem__() syntax. This is also the only way to access fields that are not valid Python identifiers, like fields with spaces or symbols.
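The name collision described above can be illustrated without Hail. The following is a minimal plain-Python sketch, not Hail's implementation: ToyStruct is a hypothetical class that, like StructExpression, exposes fields both as attributes and as keys while also defining a method named keys.

```python
class ToyStruct:
    """Hypothetical stand-in for a struct; not part of Hail."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def keys(self):
        # An ordinary method whose name may collide with a field name.
        return list(self._fields)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so methods
        # such as `keys` always take priority over same-named fields.
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name):
        # Bracket access consults the fields directly.
        return self._fields[name]


s = ToyStruct(a=5, keys="field value", **{"my field": 1})

dot_a = s.a                     # field `a`, no collision
dot_keys = s.keys               # the bound method, not the field
bracket_keys = s["keys"]        # the field named "keys"
bracket_spaced = s["my field"]  # not a valid identifier; brackets only
```

In this model, dot access silently returns the method when a field shares its name, which is why bracket access is the more predictable choice.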

Attributes

dtype

The data type of the expression.

Methods

annotate

Add new fields or recompute existing fields.

drop

Drop fields from the struct.

flatten

rename

Rename fields of the struct.

select

Select existing fields and compute new ones.

__eq__(other)[source]

Return self==value.

__ge__(other)

Return self>=value.

__getitem__(item)[source]

Access a field of the struct by name or index.

Examples

>>> hl.eval(struct['a'])
5
>>> hl.eval(struct[1])
'Foo'

Parameters

item (str or int) – Field name or index.

Returns

Expression – Struct field.
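As a rough mental model of the two access modes in the examples above, a struct can be pictured as an ordered mapping from field names to values. The sketch below uses a plain dict; get_field is a hypothetical helper for illustration and is not Hail code.

```python
def get_field(fields, item):
    """Look up a field by name (str) or by position (int) in an ordered dict."""
    if isinstance(item, int):
        # Positional access follows the order in which fields were declared.
        return list(fields.values())[item]
    return fields[item]


struct_fields = {"a": 5, "b": "Foo"}
by_name = get_field(struct_fields, "a")
by_index = get_field(struct_fields, 1)
```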

__gt__(other)

Return self>value.

__le__(other)

Return self<=value.

__lt__(other)

Return self<value.

__ne__(other)[source]

Return self!=value.

annotate(**named_exprs)[source]

Add new fields or recompute existing fields.

Examples

>>> hl.eval(struct.annotate(a=10, c=2*2*2))
Struct(a=10, b='Foo', c=8)

Notes

If an expression in named_exprs shares a name with a field of the struct, then that field will be replaced but keep its position in the struct. New fields will be appended to the end of the struct.

Parameters

named_exprs (keyword args of Expression) – Fields to add.

Returns

StructExpression – Struct with new or updated fields.
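The ordering rule in the Notes happens to match how a Python dict behaves on update, so it can be modeled in a few lines. This is a plain-dict sketch under that assumption, with a hypothetical annotate function that is not Hail's implementation.

```python
def annotate(fields, **named_exprs):
    """Plain-dict model of the ordering rule described in the Notes."""
    result = dict(fields)       # copy; the existing field order is preserved
    result.update(named_exprs)  # existing keys keep their position, new keys append
    return result


struct_fields = {"a": 5, "b": "Foo"}
annotated = annotate(struct_fields, a=10, c=2 * 2 * 2)
field_order = list(annotated)
```

Here field a is replaced in place and c is appended, mirroring the Struct(a=10, b='Foo', c=8) result in the example.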

collect(_localize=True)

Collect all records of an expression into a local list.

Examples

Collect all the values from C1:

>>> table1.C1.collect()
[2, 2, 10, 11]

Warning

Extremely experimental.

Warning

The list of records may be very large.

Returns

list

describe(handler=<built-in function print>)

Print information about type, index, and dependencies.

drop(*fields)[source]

Drop fields from the struct.

Examples

>>> hl.eval(struct.drop('b'))
Struct(a=5)

Parameters

fields (varargs of str) – Fields to drop.

Returns

StructExpression – Struct without certain fields.

property dtype

The data type of the expression.

Returns

HailType

export(path, delimiter='\t', missing='NA', header=True)

Export a field to a text file.

Examples

>>> small_mt.GT.export('output/gt.tsv')
>>> with open('output/gt.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles 0       1       2       3
1:1     ["A","C"]       0/1     0/1     0/0     0/0
1:2     ["A","C"]       1/1     0/1     1/1     1/1
1:3     ["A","C"]       1/1     0/1     0/1     0/0
1:4     ["A","C"]       1/1     0/1     1/1     1/1
>>> small_mt.GT.export('output/gt-no-header.tsv', header=False)
>>> with open('output/gt-no-header.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
1:1     ["A","C"]       0/1     0/1     0/0     0/0
1:2     ["A","C"]       1/1     0/1     1/1     1/1
1:3     ["A","C"]       1/1     0/1     0/1     0/0
1:4     ["A","C"]       1/1     0/1     1/1     1/1
>>> small_mt.pop.export('output/pops.tsv')
>>> with open('output/pops.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
sample_idx      pop
0       2
1       2
2       0
3       2
>>> small_mt.ancestral_af.export('output/ancestral_af.tsv')
>>> with open('output/ancestral_af.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles ancestral_af
1:1     ["A","C"]       5.3905e-01
1:2     ["A","C"]       8.6768e-01
1:3     ["A","C"]       4.3765e-01
1:4     ["A","C"]       7.6300e-01
>>> mt = small_mt
>>> small_mt.bn.export('output/bn.tsv')
>>> with open('output/bn.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
bn
{"n_populations":3,"n_samples":4,"n_variants":4,"n_partitions":8,"pop_dist":[1,1,1],"fst":[0.1,0.1,0.1],"mixture":false}

Notes

For entry-indexed expressions, if there is one column key field, the result of calling hl.str() on that field is used as the column header. Otherwise, each compound column key is converted to JSON and used as a column header. For example:

>>> small_mt = small_mt.key_cols_by(s=small_mt.sample_idx, family='fam1')
>>> small_mt.GT.export('output/gt-no-header.tsv')
>>> with open('output/gt-no-header.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles {"s":0,"family":"fam1"} {"s":1,"family":"fam1"} {"s":2,"family":"fam1"} {"s":3,"family":"fam1"}
1:1     ["A","C"]       0/1     0/1     0/0     0/0
1:2     ["A","C"]       1/1     0/1     1/1     1/1
1:3     ["A","C"]       1/1     0/1     0/1     0/0
1:4     ["A","C"]       1/1     0/1     1/1     1/1

Parameters
  • path (str) – The path to which to export.

  • delimiter (str) – The string for delimiting columns.

  • missing (str) – The string to output for missing values.

  • header (bool) – When True, include a header line.
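The column-header rule from the Notes for export can be sketched in plain Python. The column_header function below is a hypothetical model, not Hail code: it assumes a column key is given as a dict of key field names to values, uses Python's str() in place of hl.str(), and uses compact JSON to match the header shown in the example.

```python
import json


def column_header(col_key):
    """Model of the header rule; `col_key` maps key field names to values."""
    if len(col_key) == 1:
        # Single key field: use the string form of its value.
        (value,) = col_key.values()
        return str(value)
    # Compound key: compact JSON, as in the example header above.
    return json.dumps(col_key, separators=(",", ":"))


single = column_header({"sample_idx": 0})
compound = column_header({"s": 0, "family": "fam1"})
```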

get(k[, d])

Return the value of field k if k is a field of the struct, else d. d defaults to None.

items()

Return a set-like object providing a view on the struct’s items.

keys()

Return a set-like object providing a view on the struct’s keys.

rename(mapping)[source]

Rename fields of the struct.

Examples

>>> s = hl.struct(x='hello', y='goodbye', a='dogs')
>>> s.rename({'x' : 'y', 'y' : 'z'}).show()
+----------+----------+-----------+
| <expr>.a | <expr>.y | <expr>.z  |
+----------+----------+-----------+
| str      | str      | str       |
+----------+----------+-----------+
| "dogs"   | "hello"  | "goodbye" |
+----------+----------+-----------+

Parameters

mapping (dict of str, str) – Mapping from old field names to new field names.

Notes

Any field that does not appear as a key in mapping will not be renamed.

Returns

StructExpression – Struct with renamed fields.
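The example renames x to y and y to z in one call, which only works if all renames are resolved against the original field names at once. The plain-dict sketch below models that; rename here is a hypothetical function, it does not check for name clashes, and its field order simply mirrors the example output rather than any documented guarantee.

```python
def rename(fields, mapping):
    """Plain-dict model: apply all renames against the original names."""
    # Fields that do not appear as keys in `mapping` keep their names.
    result = {name: value for name, value in fields.items() if name not in mapping}
    # Each rename reads from the original dict, so chains like
    # x -> y, y -> z do not overwrite one another.
    for old, new in mapping.items():
        result[new] = fields[old]
    return result


renamed = rename(
    {"x": "hello", "y": "goodbye", "a": "dogs"},
    {"x": "y", "y": "z"},
)
```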

select(*fields, **named_exprs)[source]

Select existing fields and compute new ones.

Examples

>>> hl.eval(struct.select('a', c=['bar', 'baz']))
Struct(a=5, c=['bar', 'baz'])

Notes

The fields argument is a list of field names to keep. These fields will appear in the resulting struct in the order they appear in fields.

The named_exprs arguments are new field expressions.

Parameters
  • fields (varargs of str) – Field names to keep.

  • named_exprs (keyword args of Expression) – New field expressions.

Returns

StructExpression – Struct containing specified existing fields and computed fields.
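The behavior described in the Notes can be modeled with a plain dict. This is a sketch only: select is a hypothetical function rather than Hail's implementation, and placing the named expressions after the kept fields is inferred from the example output.

```python
def select(fields, *names, **named_exprs):
    """Plain-dict model of select: keep some fields, then add new ones."""
    # Kept fields appear in the order the names were passed.
    result = {name: fields[name] for name in names}
    # New fields follow the kept ones.
    result.update(named_exprs)
    return result


struct_fields = {"a": 5, "b": "Foo"}
selected = select(struct_fields, "a", c=["bar", "baz"])
reordered = select(struct_fields, "b", "a")
reordered_names = list(reordered)
```

The second call shows that select can also be used to reorder fields, since the result follows the argument order rather than the original field order.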

show(n=None, width=None, truncate=None, types=True, handler=None, n_rows=None, n_cols=None)

Print the first few rows of the table to the console.

Examples

>>> table1.SEX.show()
+-------+-----+
|    ID | SEX |
+-------+-----+
| int32 | str |
+-------+-----+
|     1 | "M" |
|     2 | "M" |
|     3 | "F" |
|     4 | "F" |
+-------+-----+
>>> hl.literal(123).show()
+--------+
| <expr> |
+--------+
|  int32 |
+--------+
|    123 |
+--------+

Warning

Extremely experimental.

Parameters
  • n (int) – Maximum number of rows to show.

  • width (int) – Horizontal width at which to break columns.

  • truncate (int, optional) – Truncate each field to the given number of characters. If None, truncate fields to the given width.

  • types (bool) – Print an extra header line with the type of each field.

summarize(handler=None)

Compute and print summary information about the expression.

Danger

This functionality is experimental. It may not be tested as well as other parts of Hail and the interface is subject to change.

take(n, _localize=True)

Collect the first n records of an expression.

Examples

Take the first three rows:

>>> table1.X.take(3)
[5, 6, 7]

Warning

Extremely experimental.

Parameters

n (int) – Number of records to take.

Returns

list

values()

Return an object providing a view on the struct’s values.
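Because collections.abc.Mapping is listed among the bases, get, items, keys, and values follow Python's standard Mapping protocol. The sketch below shows how that protocol derives them from three methods. ToyMapping is a hypothetical class for illustration; in Hail the values would be expressions rather than plain Python objects.

```python
from collections.abc import Mapping


class ToyMapping(Mapping):
    """Hypothetical read-only mapping; not part of Hail."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    # Mapping requires only these three methods; get(), keys(),
    # items(), and values() are supplied by the base class.
    def __getitem__(self, key):
        return self._fields[key]

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)


m = ToyMapping(a=5, b="Foo")
names = list(m.keys())
values = list(m.values())
pairs = list(m.items())
fallback = m.get("missing", "default")
no_default = m.get("missing")
```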