Types
Fields and expressions in Hail have types. Throughout the documentation, you
will find type descriptions like array<str> or tlocus. It is generally more
important to know how to use expressions of various types than to know how to
manipulate the types themselves, but some operations, like missing(), require
type arguments.
In Python, 5 is of type int while "hello" is of type str.
Python is a dynamically-typed language, meaning that a function like:
>>> def add_x_and_y(x, y):
...     return x + y
…can be called on any two objects which can be added, like numbers, strings, or
numpy arrays.
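As a small illustration of this dynamic behaviour, the sketch below calls add_x_and_y on integers, floats, strings, and lists. Lists stand in for numpy arrays here so that the example needs only the standard library; that substitution is an assumption of this sketch, and list addition concatenates whereas numpy arrays add elementwise.

```python
def add_x_and_y(x, y):
    # Works for any pair of objects that support the + operator.
    return x + y


int_sum = add_x_and_y(1, 2)                # integer addition
float_sum = add_x_and_y(1.5, 2.25)         # float addition
str_sum = add_x_and_y("hello, ", "hail")   # string concatenation
list_sum = add_x_and_y([1, 2], [3])        # list concatenation

# Incompatible operands are only detected when the call runs.
try:
    add_x_and_y(1, "a")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Nothing in the function signature says which types are acceptable; the mismatch in the last call surfaces only at run time.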
Types are very important in Hail, because the fields of Table and MatrixTable
objects have data types.
Primitive types
Hail’s primitive data types for boolean, numeric and string objects are:
tint | Alias for tint32.
tint32 | Hail type for signed 32-bit integers.
tint64 | Hail type for signed 64-bit integers.
tfloat | Alias for tfloat64.
tfloat32 | Hail type for 32-bit floating point numbers.
tfloat64 | Hail type for 64-bit floating point numbers.
tstr | Hail type for text strings.
tbool | Hail type for Boolean (True or False) values.
Container types
Hail’s container types are:
tarray
- Ordered collection of homogeneous objects.
tndarray
- Ordered n-dimensional arrays of homogeneous objects.
tset
- Unordered collection of distinct homogeneous objects.
tdict
- Key-value map. Keys and values are both homogeneous.
ttuple
- Tuple of heterogeneous values.
tstruct
- Structure containing named fields, each with its own type.
tarray | Hail type for variable-length arrays of elements.
tndarray | Hail type for n-dimensional arrays.
tset | Hail type for collections of distinct elements.
tdict | Hail type for key-value maps.
ttuple | Hail type for tuples.
tinterval | Hail type for intervals of ordered values.
tstruct | Hail type for structured groups of heterogeneous fields.
Genetics types
Hail has two genetics-specific types:
tlocus | Hail type for a genomic coordinate with a contig and a position.
tcall | Hail type for a diploid genotype.
When to work with types
In general, you won’t need to mention types explicitly.
There are a few situations where you may want to specify types explicitly:
- To specify column types in import_table() if the impute flag does not infer the type you want.
- When converting a Python value to a Hail expression with literal(), if you don’t wish to rely on the inferred type.
- With functions like missing() and empty_array().
Viewing an object’s type
Hail objects have a dtype field that will print their type.
>>> hl.rand_norm().dtype
dtype('float64')
Printing the representation of a Hail expression will also show the type:
>>> hl.rand_norm()
<Float64Expression of type float64>
We can see that hl.rand_norm() is of type tfloat64, but what does
Expression mean?
Each data type in Hail is represented by its own Expression class. Data of
type tfloat64 is represented by a Float64Expression. Data of type tstruct is
represented by a StructExpression.
Collection types
Hail’s collection types (arrays, ndarrays, sets, and dicts) have homogeneous elements,
meaning that all values in the collection must be of the same type. Python allows mixed
collections: ['1', 2, 3.0] is a valid Python list. However, Hail arrays cannot contain
both tstr and tint32 values. Likewise, the dict {'a': 1, 2: 'b'} is a valid Python
dictionary, but a Hail dictionary cannot contain keys of different types.
An example of a valid dictionary in Hail is {'a': 1, 'b': 2}, where the keys are all
strings and the values are all integers. The type of this dictionary would be
dict<str, int32>.
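The homogeneity rule can be sketched in plain Python. The helper below, common_element_type, is a hypothetical name introduced only for this illustration and is not part of Hail; it reports whether a Python collection has a single element type, which is roughly the precondition a value must meet before it can map onto a Hail collection type. Hail’s own type inference may differ in its details.

```python
def common_element_type(values):
    """Return the single Python type shared by all values, or None if mixed."""
    types = {type(v) for v in values}
    return types.pop() if len(types) == 1 else None


mixed_list = ['1', 2, 3.0]        # valid Python list, mixed element types
int_list = [1, 2, 3]              # every element is an int
mixed_dict = {'a': 1, 2: 'b'}     # keys (and values) of different types
str_int_dict = {'a': 1, 'b': 2}   # str keys, int values

mixed_list_type = common_element_type(mixed_list)
int_list_type = common_element_type(int_list)
mixed_key_type = common_element_type(mixed_dict.keys())
good_key_type = common_element_type(str_int_dict.keys())
good_value_type = common_element_type(str_int_dict.values())
```

Only int_list and str_int_dict yield a single type for their elements, keys, and values, matching the examples in the text above.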
Constructing types
Constructing types can be done either by using the type objects and classes
(prefixed by “t”) or by parsing from strings with dtype(). As an example,
we will construct a tstruct with each option:
>>> t = hl.tstruct(a = hl.tint32, b = hl.tstr, c = hl.tarray(hl.tfloat64))
>>> t
dtype('struct{a: int32, b: str, c: array<float64>}')
>>> t = hl.dtype('struct{a: int32, b: str, c: array<float64>}')
>>> t
dtype('struct{a: int32, b: str, c: array<float64>}')
Reference documentation
- hail.expr.types.dtype(type_str)
Parse a type from its string representation.
Examples
>>> hl.dtype('int')
dtype('int32')
>>> hl.dtype('float')
dtype('float64')
>>> hl.dtype('array<int32>')
dtype('array<int32>')
>>> hl.dtype('dict<str, bool>')
dtype('dict<str, bool>')
>>> hl.dtype('struct{a: int32, `field with spaces`: int64}')
dtype('struct{a: int32, `field with spaces`: int64}')
Notes
This function is able to reverse str(t) on a HailType.
The grammar is defined as follows:
type = _ ( array / bool / call / dict / interval / int64 / int32 / float32 / float64 / locus / ndarray / rng_state / set / stream / struct / str / tuple / union / void / variable ) _
variable = "?" simple_identifier (":" simple_identifier)?
void = "void" / "tvoid"
int64 = "int64" / "tint64"
int32 = "int32" / "tint32" / "int" / "tint"
float32 = "float32" / "tfloat32"
float64 = "float64" / "tfloat64" / "tfloat" / "float"
bool = "tbool" / "bool"
call = "tcall" / "call"
str = "tstr" / "str"
locus = ("tlocus" / "locus") _ "<" identifier ">"
array = ("tarray" / "array") _ "<" type ">"
ndarray = ("tndarray" / "ndarray") _ "<" type "," nat ">"
set = ("tset" / "set") _ "<" type ">"
stream = ("tstream" / "stream") _ "<" type ">"
dict = ("tdict" / "dict") _ "<" type "," type ">"
struct = ("tstruct" / "struct") _ "{" (fields / _) "}"
union = ("tunion" / "union") _ "{" (fields / _) "}"
tuple = ("ttuple" / "tuple") _ "(" ((type ("," type)*) / _) ")"
fields = field ("," field)*
field = identifier ":" type
interval = ("tinterval" / "interval") _ "<" type ">"
identifier = _ (simple_identifier / escaped_identifier) _
simple_identifier = ~r"\w+"
escaped_identifier = ~"`([^`\\\\]|\\\\.)*`"
nat = _ (nat_literal / nat_variable) _
nat_literal = ~"[0-9]+"
nat_variable = "?nat"
rng_state = "rng_state"
_ = ~r"\s*"
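To make the alias rules in the grammar concrete, here is a minimal standard-library sketch that maps each accepted spelling of a non-parameterized type to one canonical name. It is not Hail’s parser: the table SIMPLE_TYPE_SPELLINGS and the function canonical_simple_type are hypothetical names for this illustration, transcribed from the int64, int32, float32, float64, bool, call, and str rules above.

```python
# Spellings accepted by the grammar for each non-parameterized type,
# keyed by the canonical name that dtype() displays.
SIMPLE_TYPE_SPELLINGS = {
    "int64": ("int64", "tint64"),
    "int32": ("int32", "tint32", "int", "tint"),
    "float32": ("float32", "tfloat32"),
    "float64": ("float64", "tfloat64", "tfloat", "float"),
    "bool": ("tbool", "bool"),
    "call": ("tcall", "call"),
    "str": ("tstr", "str"),
}

# Invert the table: spelling -> canonical name.
CANONICAL_NAME = {
    spelling: name
    for name, spellings in SIMPLE_TYPE_SPELLINGS.items()
    for spelling in spellings
}


def canonical_simple_type(type_str):
    """Return the canonical name for a simple type spelling, else None.

    Surrounding whitespace is ignored, mirroring the `_` rule around `type`.
    Parameterized types such as array<int32> are out of scope and give None.
    """
    return CANONICAL_NAME.get(type_str.strip())
```

For example, canonical_simple_type('int') gives 'int32', which agrees with the hl.dtype('int') example above.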
- hail.expr.types.tint32 = dtype('int32')
Hail type for signed 32-bit integers.
Their values can range from \(-2^{31}\) to \(2^{31} - 1\) (approximately 2.15 billion).
In Python, these are represented as int.
- hail.expr.types.tint64 = dtype('int64')
Hail type for signed 64-bit integers.
Their values can range from \(-2^{63}\) to \(2^{63} - 1\).
In Python, these are represented as int.
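The two integer ranges can be checked with ordinary Python arithmetic, since a Python int has arbitrary precision. The sketch below is an illustration only: smallest_int_type is a hypothetical helper, not a Hail function, and it reports which of the two Hail integer types is wide enough for a given value.

```python
# Bounds taken from the descriptions of tint32 and tint64.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def smallest_int_type(value):
    """Name the narrower of int32/int64 that can hold value, else None."""
    if INT32_MIN <= value <= INT32_MAX:
        return "int32"
    if INT64_MIN <= value <= INT64_MAX:
        return "int64"
    return None  # too large for either fixed-width type
```

A value such as 2**31 needs int64, and 2**63 fits neither type.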
- hail.expr.types.tfloat32 = dtype('float32')
Hail type for 32-bit floating point numbers.
In Python, these are represented as float.
- hail.expr.types.tfloat64 = dtype('float64')
Hail type for 64-bit floating point numbers.
In Python, these are represented as float.
- hail.expr.types.tstr = dtype('str')
Hail type for text strings.
In Python, these are represented as strings.
- hail.expr.types.tbool = dtype('bool')
Hail type for Boolean (True or False) values.
In Python, these are represented as bool.
- class hail.expr.types.tarray(element_type)
Hail type for variable-length arrays of elements.
In Python, these are represented as list.
Notes
Arrays contain elements of only one type, which is parameterized by element_type.
- Parameters:
element_type (HailType) – Element type of array.
- class hail.expr.types.tndarray(element_type, ndim)
Hail type for n-dimensional arrays.
Danger
This functionality is experimental. It may not be tested as well as other parts of Hail and the interface is subject to change.
In Python, these are represented as numpy.ndarray.
Notes
NDArrays contain elements of only one type, which is parameterized by element_type.
- Parameters:
element_type (HailType) – Element type of array.
ndim (int32) – Number of dimensions.
- class hail.expr.types.tset(element_type)
Hail type for collections of distinct elements.
In Python, these are represented as set.
Notes
Sets contain elements of only one type, which is parameterized by element_type.
- Parameters:
element_type (HailType) – Element type of set.
See also
SetExpression, CollectionExpression, set(), Collection functions
- class hail.expr.types.tdict(key_type, value_type)
Hail type for key-value maps.
In Python, these are represented as dict.
Notes
Dicts parameterize the type of both their keys and values with key_type and value_type.
- class hail.expr.types.tstruct(**field_types)
Hail type for structured groups of heterogeneous fields.
In Python, these are represented as Struct.
Hail’s tstruct type is commonly used to compose types together to form nested structures. Structs can contain any combination of types, and are ordered mappings from field name to field type. Each field name must be unique.
Structs are very common in Hail. Each component of a Table and MatrixTable is a struct.
Structs appear below the top-level component types as well. Consider the following join:
>>> new_table = table1.annotate(table2_fields = table2.index(table1.key))
This snippet adds a field to table1 called table2_fields. In the new table, table2_fields will be a struct containing all the non-key fields from table2.
- Parameters:
field_types (keyword args of HailType) – Fields.
- class hail.expr.types.ttuple(*types)
Hail type for tuples.
In Python, these are represented as tuple.
- Parameters:
types (varargs of HailType) – Element types.
- hail.expr.types.tcall = dtype('call')
Hail type for a diploid genotype.
In Python, these are represented by Call.
See also
CallExpression, Call, call(), parse_call(), unphased_diploid_gt_index_call()
- class hail.expr.types.tlocus(reference_genome='default')
Hail type for a genomic coordinate with a contig and a position.
In Python, these are represented by Locus.
- Parameters:
reference_genome (ReferenceGenome or str) – Reference genome to use.
- reference_genome
Reference genome.
- Returns:
ReferenceGenome – Reference genome.
- class hail.expr.types.tinterval(point_type)
Hail type for intervals of ordered values.
In Python, these are represented by Interval.
- Parameters:
point_type (HailType) – Interval point type.
See also
IntervalExpression, Interval, interval(), parse_locus_interval()