Types

Fields and expressions in Hail have types. Throughout the documentation, you will find type descriptions like array<str> or tlocus. It is generally more important to know how to use expressions of various types than to know how to manipulate the types themselves, but some operations like missing() require type arguments.

In Python, 5 is of type int while "hello" is of type str. Python is a dynamically-typed language, meaning that a function like:

>>> def add_x_and_y(x, y):
...     return x + y

…can be called on any two objects that can be added, like numbers, strings, or numpy arrays.

Types matter more in Hail, because every field of a Table or MatrixTable object has a data type.
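To make the dynamic-typing point concrete, here is a minimal plain-Python sketch (no Hail involved) that calls the function above on several kinds of operands. Python only notices a mismatch when the addition actually runs.

```python
def add_x_and_y(x, y):
    return x + y

# The same function works for any operands that support "+".
print(add_x_and_y(1, 2))          # integers
print(add_x_and_y("a", "b"))      # strings
print(add_x_and_y([1], [2, 3]))   # lists

# A mismatch is only detected when the call runs, as a runtime error.
try:
    add_x_and_y(1, "b")
except TypeError as err:
    print("TypeError:", err)
```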

Primitive types

Hail’s primitive data types for boolean, numeric and string objects are:

tint

Alias for tint32.

tint32

Hail type for signed 32-bit integers.

tint64

Hail type for signed 64-bit integers.

tfloat

Alias for tfloat64.

tfloat32

Hail type for 32-bit floating point numbers.

tfloat64

Hail type for 64-bit floating point numbers.

tstr

Hail type for text strings.

tbool

Hail type for Boolean (True or False) values.

Container types

Hail’s container types are:

  • tarray - Ordered collection of homogeneous objects.

  • tndarray - Ordered n-dimensional arrays of homogeneous objects.

  • tset - Unordered collection of distinct homogeneous objects.

  • tdict - Key-value map. Keys and values are both homogeneous.

  • ttuple - Tuple of heterogeneous values.

  • tstruct - Structure containing named fields, each with its own type.

tarray

Hail type for variable-length arrays of elements.

tndarray

Hail type for n-dimensional arrays.

tset

Hail type for collections of distinct elements.

tdict

Hail type for key-value maps.

ttuple

Hail type for tuples.

tinterval

Hail type for intervals of ordered values.

tstruct

Hail type for structured groups of heterogeneous fields.

Genetics types

Hail has two genetics-specific types:

tlocus

Hail type for a genomic coordinate with a contig and a position.

tcall

Hail type for a diploid genotype.

When to work with types

In general, you won’t need to mention types explicitly.

There are a few situations where you may want to specify types explicitly:

  • To specify column types in import_table() if the impute flag does not infer the type you want.

  • When converting a Python value to a Hail expression with literal(), if you don’t wish to rely on the inferred type.

  • When calling functions that take a type argument, like missing() and empty_array().

Viewing an object’s type

Hail expressions have a dtype attribute that holds their type.

>>> hl.rand_norm().dtype
dtype('float64')

Printing the representation of a Hail expression will also show the type:

>>> hl.rand_norm()
<Float64Expression of type float64>

We can see that hl.rand_norm() is of type tfloat64, but what does Expression mean? Each data type in Hail is represented by its own Expression class. Data of type tfloat64 is represented by a Float64Expression. Data of type tstruct is represented by a StructExpression.

Collection types

Hail’s collection types (arrays, ndarrays, sets, and dicts) have homogeneous elements, meaning that all values in a collection must be of the same type. Python allows mixed collections: ['1', 2, 3.0] is a valid Python list. A Hail array, however, cannot contain both tstr and tint32 values. Likewise, {'a': 1, 2: 'b'} is a valid Python dictionary, but a Hail dictionary cannot contain keys of different types. An example of a valid Hail dictionary is {'a': 1, 'b': 2}, where the keys are all strings and the values are all integers; its type is dict<str, int32>.

Constructing types

Types can be constructed either with the type objects and classes (prefixed by “t”) or by parsing a string with dtype(). As an example, we will construct a tstruct with each option:

>>> t = hl.tstruct(a = hl.tint32, b = hl.tstr, c = hl.tarray(hl.tfloat64))
>>> t
dtype('struct{a: int32, b: str, c: array<float64>}')

>>> t = hl.dtype('struct{a: int32, b: str, c: array<float64>}')
>>> t
dtype('struct{a: int32, b: str, c: array<float64>}')

Reference documentation

class hail.expr.types.HailType

Hail type superclass.

hail.expr.types.dtype(type_str)

Parse a type from its string representation.

Examples

>>> hl.dtype('int')
dtype('int32')
>>> hl.dtype('float')
dtype('float64')
>>> hl.dtype('array<int32>')
dtype('array<int32>')
>>> hl.dtype('dict<str, bool>')
dtype('dict<str, bool>')
>>> hl.dtype('struct{a: int32, `field with spaces`: int64}')
dtype('struct{a: int32, `field with spaces`: int64}')

Notes

This function is able to reverse str(t) on a HailType.

The grammar is defined as follows:

type = _ ( array / bool / call / dict / interval / int64 / int32 / float32 / float64 / locus / ndarray / rng_state / set / stream / struct / str / tuple / union / void / variable ) _
variable = "?" simple_identifier (":" simple_identifier)?
void = "void" / "tvoid"
int64 = "int64" / "tint64"
int32 = "int32" / "tint32" / "int" / "tint"
float32 = "float32" / "tfloat32"
float64 = "float64" / "tfloat64" / "tfloat" / "float"
bool = "tbool" / "bool"
call = "tcall" / "call"
str = "tstr" / "str"
locus = ("tlocus" / "locus") _ "<" identifier ">"
array = ("tarray" / "array") _ "<" type ">"
ndarray = ("tndarray" / "ndarray") _ "<" type "," nat ">"
set = ("tset" / "set") _ "<" type ">"
stream = ("tstream" / "stream") _ "<" type ">"
dict = ("tdict" / "dict") _ "<" type "," type ">"
struct = ("tstruct" / "struct") _ "{" (fields / _) "}"
union = ("tunion" / "union") _ "{" (fields / _) "}"
tuple = ("ttuple" / "tuple") _ "(" ((type ("," type)*) / _) ")"
fields = field ("," field)*
field = identifier ":" type
interval = ("tinterval" / "interval") _ "<" type ">"
identifier = _ (simple_identifier / escaped_identifier) _
simple_identifier = ~r"\w+"
escaped_identifier = ~"`([^`\\\\]|\\\\.)*`"
nat = _ (nat_literal / nat_variable) _
nat_literal = ~"[0-9]+"
nat_variable = "?nat"
rng_state = "rng_state"
_ = ~r"\s*"

Parameters:

type_str (str) – String representation of type.

Returns:

HailType

hail.expr.types.tint = dtype('int32')

Alias for tint32.

hail.expr.types.tint32 = dtype('int32')

Hail type for signed 32-bit integers.

Their values can range from \(-2^{31}\) to \(2^{31} - 1\) (approximately 2.15 billion).

In Python, these are represented as int.

hail.expr.types.tint64 = dtype('int64')

Hail type for signed 64-bit integers.

Their values can range from \(-2^{63}\) to \(2^{63} - 1\).

In Python, these are represented as int.
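Python's int has no fixed width, so it helps to see where the two Hail integer ranges end. The following plain-Python sketch computes the bounds; the helper smallest_int_type is hypothetical and is not part of Hail.

```python
# Bounds of the two signed integer types, computed with plain Python ints.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

print(INT32_MAX)   # 2147483647
print(INT64_MAX)   # 9223372036854775807

def smallest_int_type(value):
    """Return the name of the narrowest signed type that can hold value."""
    if INT32_MIN <= value <= INT32_MAX:
        return "int32"
    if INT64_MIN <= value <= INT64_MAX:
        return "int64"
    raise ValueError(f"{value} does not fit in a signed 64-bit integer")

print(smallest_int_type(2_000_000_000))  # int32
print(smallest_int_type(3_000_000_000))  # int64
```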

hail.expr.types.tfloat = dtype('float64')

Alias for tfloat64.

hail.expr.types.tfloat32 = dtype('float32')

Hail type for 32-bit floating point numbers.

In Python, these are represented as float.

hail.expr.types.tfloat64 = dtype('float64')

Hail type for 64-bit floating point numbers.

In Python, these are represented as float.

hail.expr.types.tstr = dtype('str')

Hail type for text strings.

In Python, these are represented as strings.

hail.expr.types.tbool = dtype('bool')

Hail type for Boolean (True or False) values.

In Python, these are represented as bool.

class hail.expr.types.tarray(element_type)

Hail type for variable-length arrays of elements.

In Python, these are represented as list.

Notes

Arrays contain elements of only one type, which is parameterized by element_type.

Parameters:

element_type (HailType) – Element type of array.

class hail.expr.types.tndarray(element_type, ndim)

Hail type for n-dimensional arrays.

Danger

This functionality is experimental. It may not be tested as well as other parts of Hail and the interface is subject to change.

In Python, these are represented as numpy.ndarray.

Notes

NDArrays contain elements of only one type, which is parameterized by element_type.

Parameters:
  • element_type (HailType) – Element type of array.

  • ndim (int32) – Number of dimensions.

class hail.expr.types.tset(element_type)

Hail type for collections of distinct elements.

In Python, these are represented as set.

Notes

Sets contain elements of only one type, which is parameterized by element_type.

Parameters:

element_type (HailType) – Element type of set.

class hail.expr.types.tdict(key_type, value_type)

Hail type for key-value maps.

In Python, these are represented as dict.

Notes

Dicts parameterize the type of both their keys and values with key_type and value_type.

Parameters:
  • key_type (HailType) – Key type.

  • value_type (HailType) – Value type.

class hail.expr.types.tstruct(**field_types)

Hail type for structured groups of heterogeneous fields.

In Python, these are represented as Struct.

Hail’s tstruct type is commonly used to compose types together to form nested structures. Structs can contain any combination of types, and are ordered mappings from field name to field type. Each field name must be unique.

Structs are very common in Hail. Each component of a Table and MatrixTable is a struct.

Structs appear below the top-level component types as well. Consider the following join:

>>> new_table = table1.annotate(table2_fields = table2.index(table1.key))

This snippet adds a field to table1 called table2_fields. In the new table, table2_fields will be a struct containing all the non-key fields from table2.

Parameters:

field_types (keyword args of HailType) – Fields.

class hail.expr.types.ttuple(*types)

Hail type for tuples.

In Python, these are represented as tuple.

Parameters:

types (varargs of HailType) – Element types.

See also

TupleExpression

hail.expr.types.tcall = dtype('call')

Hail type for a diploid genotype.

In Python, these are represented by Call.

class hail.expr.types.tlocus(reference_genome='default')

Hail type for a genomic coordinate with a contig and a position.

In Python, these are represented by Locus.

Parameters:

reference_genome (ReferenceGenome or str) – Reference genome to use.

reference_genome

Reference genome.

Returns:

ReferenceGenome – Reference genome.

class hail.expr.types.tinterval(point_type)

Hail type for intervals of ordered values.

In Python, these are represented by Interval.

Parameters:

point_type (HailType) – Interval point type.