LocusExpression

class hail.expr.LocusExpression[source]

Expression of type tlocus.

>>> locus = hl.locus('1', 1034245)

Attributes

contig

Returns the chromosome.

contig_idx

Returns the index of the chromosome.

dtype

The data type of the expression.

position

Returns the position along the chromosome.

Methods

global_position

Returns a zero-indexed absolute position along the reference genome.

in_autosome

Returns True if the locus is on an autosome.

in_autosome_or_par

Returns True if the locus is on an autosome or a pseudoautosomal region of chromosome X or Y.

in_mito

Returns True if the locus is on mitochondrial DNA.

in_x_nonpar

Returns True if the locus is in a non-pseudoautosomal region of chromosome X.

in_x_par

Returns True if the locus is in a pseudoautosomal region of chromosome X.

in_y_nonpar

Returns True if the locus is in a non-pseudoautosomal region of chromosome Y.

in_y_par

Returns True if the locus is in a pseudoautosomal region of chromosome Y.

sequence_context

Return the reference genome sequence at the locus.

window

Returns an interval of a specified number of bases around the locus.

__eq__(other)

Returns True if the two expressions are equal.

Examples

>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x == y)
True
>>> hl.eval(x == z)
False

Notes

This method will fail with an error if the two expressions are not of comparable types.

Parameters:

other (Expression) – Expression for equality comparison.

Returns:

BooleanExpression – True if the two expressions are equal.

__ge__(other)

Return self>=value.

__gt__(other)

Return self>value.

__le__(other)

Return self<=value.

__lt__(other)

Return self<value.

__ne__(other)

Returns True if the two expressions are not equal.

Examples

>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x != y)
False
>>> hl.eval(x != z)
True

Notes

This method will fail with an error if the two expressions are not of comparable types.

Parameters:

other (Expression) – Expression for inequality comparison.

Returns:

BooleanExpression – True if the two expressions are not equal.

collect(_localize=True)

Collect all records of an expression into a local list.

Examples

Collect all the values from C1:

>>> table1.C1.collect()
[2, 2, 10, 11]

Warning

Extremely experimental.

Warning

The list of records may be very large.

Returns:

list

property contig

Returns the chromosome.

Examples

>>> hl.eval(locus.contig)
'1'

Returns:

StringExpression – The chromosome for this locus.

property contig_idx

Returns the index of the chromosome.

Examples

>>> hl.eval(locus.contig_idx)
0

Returns:

Expression of type tint32 – The index of the chromosome for this locus.

describe(handler=<built-in function print>)

Print information about type, index, and dependencies.

property dtype

The data type of the expression.

Returns:

HailType

export(path, delimiter='\t', missing='NA', header=True)

Export a field to a text file.

Examples

>>> small_mt.GT.export('output/gt.tsv')
>>> with open('output/gt.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles 0       1       2       3
1:1     ["A","C"]       0/1     0/0     0/1     0/0
1:2     ["A","C"]       1/1     0/1     0/1     0/1
1:3     ["A","C"]       0/0     0/1     0/0     0/0
1:4     ["A","C"]       0/1     1/1     0/1     0/1
>>> small_mt.GT.export('output/gt-no-header.tsv', header=False)
>>> with open('output/gt-no-header.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
1:1     ["A","C"]       0/1     0/0     0/1     0/0
1:2     ["A","C"]       1/1     0/1     0/1     0/1
1:3     ["A","C"]       0/0     0/1     0/0     0/0
1:4     ["A","C"]       0/1     1/1     0/1     0/1
>>> small_mt.pop.export('output/pops.tsv')
>>> with open('output/pops.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
sample_idx      pop
0       1
1       2
2       2
3       2
>>> small_mt.ancestral_af.export('output/ancestral_af.tsv')
>>> with open('output/ancestral_af.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles ancestral_af
1:1     ["A","C"]       3.8152e-01
1:2     ["A","C"]       7.0588e-01
1:3     ["A","C"]       4.9991e-01
1:4     ["A","C"]       3.9616e-01
>>> small_mt.bn.export('output/bn.tsv')
>>> with open('output/bn.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
bn
{"n_populations":3,"n_samples":4,"n_variants":4,"n_partitions":4,"pop_dist":[1,1,1],"fst":[0.1,0.1,0.1],"mixture":false}

Notes

For entry-indexed expressions, if there is one column key field, the result of calling str() on that field is used as the column header. Otherwise, each compound column key is converted to JSON and used as a column header. For example:

>>> small_mt = small_mt.key_cols_by(s=small_mt.sample_idx, family='fam1')
>>> small_mt.GT.export('output/gt-no-header.tsv')
>>> with open('output/gt-no-header.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles {"s":0,"family":"fam1"} {"s":1,"family":"fam1"} {"s":2,"family":"fam1"} {"s":3,"family":"fam1"}
1:1     ["A","C"]       0/1     0/0     0/1     0/0
1:2     ["A","C"]       1/1     0/1     0/1     0/1
1:3     ["A","C"]       0/0     0/1     0/0     0/0
1:4     ["A","C"]       0/1     1/1     0/1     0/1

Parameters:
  • path (str) – The path to which to export.

  • delimiter (str) – The string for delimiting columns.

  • missing (str) – The string to output for missing values.

  • header (bool) – When True include a header line.
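The column-header rule described in the Notes above can be sketched in plain Python. This is an illustration only, not Hail's implementation: `column_header` is a hypothetical helper, and the compact JSON separators are an assumption chosen to match the header text shown in the example output.

```python
import json


def column_header(col_key):
    """Build an export header for one column (hypothetical helper).

    col_key maps column key field names to their values, in key order.
    """
    if len(col_key) == 1:
        # One column key field: use str() of that field's value.
        (value,) = col_key.values()
        return str(value)
    # Compound column key: serialize the whole key as compact JSON.
    return json.dumps(col_key, separators=(",", ":"))


print(column_header({"sample_idx": 0}))
print(column_header({"s": 0, "family": "fam1"}))
```

With a single key field the header is just the value (`0`); with the compound key it is the JSON object `{"s":0,"family":"fam1"}`, as in the header line of the example above.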

global_position()[source]

Returns a zero-indexed absolute position along the reference genome.

The global position is computed as position - 1 plus the sum of the lengths of all the contigs that precede this locus’s contig in the reference genome’s ordering of contigs.

See also locus_from_global_position().

Examples

A locus with position 1 along chromosome 1 will have a global position of 0 along the reference genome GRCh37.

>>> hl.eval(hl.locus('1', 1).global_position())
0

A locus with position 1 along chromosome 2 will have a global position of (1-1) + 249250621, where 249250621 is the length of chromosome 1 on GRCh37.

>>> hl.eval(hl.locus('2', 1).global_position())
249250621

A different reference genome than the default results in a different global position.

>>> hl.eval(hl.locus('chr2', 1, 'GRCh38').global_position())
248956422

Returns:

Expression of type tint64 – Global base position of locus along the reference genome.
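The arithmetic behind the global position can be reproduced in plain Python. The sketch below is not Hail's implementation. `CONTIG_LENGTHS`, `global_position`, and `locus_from_global` are hypothetical names; the chromosome 1 length is the GRCh37 value quoted above, while the chromosome 2 length is an assumed GRCh37 value that is only needed for loci on later contigs.

```python
# Contigs in reference order with their lengths. Only the first two GRCh37
# contigs are listed; the chromosome 2 length is an assumption, not from the text.
CONTIG_LENGTHS = [("1", 249250621), ("2", 243199373)]


def global_position(contig, position, contig_lengths=CONTIG_LENGTHS):
    """Zero-indexed position: (position - 1) plus lengths of preceding contigs."""
    offset = 0
    for name, length in contig_lengths:
        if name == contig:
            if not 1 <= position <= length:
                raise ValueError(f"position {position} is outside contig {contig}")
            return offset + position - 1
        offset += length
    raise KeyError(f"unknown contig {contig!r}")


def locus_from_global(global_pos, contig_lengths=CONTIG_LENGTHS):
    """Inverse mapping: recover (contig, 1-based position) from a global position."""
    if global_pos < 0:
        raise ValueError("global position must be non-negative")
    remaining = global_pos
    for name, length in contig_lengths:
        if remaining < length:
            return name, remaining + 1
        remaining -= length
    raise ValueError("global position is beyond the listed contigs")


print(global_position("1", 1))       # 0
print(global_position("2", 1))       # 249250621
print(locus_from_global(249250621))  # ('2', 1)
```

The first two results agree with the GRCh37 examples above; the last line shows the round trip that locus_from_global_position() is documented to provide.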

in_autosome()[source]

Returns True if the locus is on an autosome.

Notes

All contigs are considered autosomal except those designated as X, Y, or MT by ReferenceGenome.

Examples

>>> hl.eval(locus.in_autosome())
True

Returns:

BooleanExpression

in_autosome_or_par()[source]

Returns True if the locus is on an autosome or a pseudoautosomal region of chromosome X or Y.

Examples

>>> hl.eval(locus.in_autosome_or_par())
True

Returns:

BooleanExpression

in_mito()[source]

Returns True if the locus is on mitochondrial DNA.

Examples

>>> hl.eval(locus.in_mito())
False

Returns:

BooleanExpression

in_x_nonpar()[source]

Returns True if the locus is in a non-pseudoautosomal region of chromosome X.

Examples

>>> hl.eval(locus.in_x_nonpar())
False

Returns:

BooleanExpression

in_x_par()[source]

Returns True if the locus is in a pseudoautosomal region of chromosome X.

Examples

>>> hl.eval(locus.in_x_par())
False

Returns:

BooleanExpression

in_y_nonpar()[source]

Returns True if the locus is in a non-pseudoautosomal region of chromosome Y.

Examples

>>> hl.eval(locus.in_y_nonpar())
False

Note

Many variant callers only generate variants on chromosome X for the pseudoautosomal region. In this case, all loci mapped to chromosome Y are non-pseudoautosomal.

Returns:

BooleanExpression

in_y_par()[source]

Returns True if the locus is in a pseudoautosomal region of chromosome Y.

Examples

>>> hl.eval(locus.in_y_par())
False

Note

Many variant callers only generate variants on chromosome X for the pseudoautosomal region. In this case, all loci mapped to chromosome Y are non-pseudoautosomal.

Returns:

BooleanExpression
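How the region predicates (in_autosome through in_y_par) relate to one another can be illustrated with a toy reference in plain Python. This is a sketch under stated assumptions, not Hail's implementation: the contig names and pseudoautosomal coordinates are hypothetical, and the PAR intervals are assumed to be inclusive at both ends.

```python
# Toy reference description. All names and coordinates here are hypothetical.
X_CONTIGS = {"X"}
Y_CONTIGS = {"Y"}
MT_CONTIGS = {"MT"}
PAR = [("X", 100, 200), ("Y", 50, 150)]  # (contig, start, end), assumed inclusive


def in_par(contig, position):
    """True if the locus falls inside any listed pseudoautosomal interval."""
    return any(c == contig and start <= position <= end for c, start, end in PAR)


def in_autosome(contig):
    # Every contig not designated X, Y, or MT counts as autosomal.
    return contig not in X_CONTIGS | Y_CONTIGS | MT_CONTIGS


def in_mito(contig):
    return contig in MT_CONTIGS


def in_x_par(contig, position):
    return contig in X_CONTIGS and in_par(contig, position)


def in_x_nonpar(contig, position):
    return contig in X_CONTIGS and not in_par(contig, position)


def in_y_par(contig, position):
    return contig in Y_CONTIGS and in_par(contig, position)


def in_y_nonpar(contig, position):
    return contig in Y_CONTIGS and not in_par(contig, position)


def in_autosome_or_par(contig, position):
    return in_autosome(contig) or in_par(contig, position)


print(in_autosome("1"))              # True
print(in_x_par("X", 150))            # True
print(in_x_nonpar("X", 500))         # True
print(in_autosome_or_par("MT", 1))   # False
```

In this sketch each locus on X or Y satisfies exactly one of the PAR or non-PAR predicates for its chromosome, and in_autosome_or_par is the union of the autosomal and pseudoautosomal cases.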

property position

Returns the position along the chromosome.

Examples

>>> hl.eval(locus.position)
1034245

Returns:

Expression of type tint32 – This locus’s position along its chromosome.

sequence_context(before=0, after=0)[source]

Return the reference genome sequence at the locus.

Examples

Get the reference allele at a locus:

>>> hl.eval(locus.sequence_context()) 
"G"

Get the reference sequence at a locus including the previous 5 bases:

>>> hl.eval(locus.sequence_context(before=5)) 
"ACTCGG"

Notes

This function requires that this locus’ reference genome has an attached reference sequence. Use ReferenceGenome.add_sequence() to load and attach a reference sequence to a reference genome.

Parameters:
  • before (Expression of type tint32, optional) – Number of bases to include before the locus. Truncates at contig boundary.

  • after (Expression of type tint32, optional) – Number of bases to include after the locus. Truncates at contig boundary.

Returns:

StringExpression
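The effect of before and after, including truncation at the contig boundary, can be sketched on a toy sequence in plain Python. This is not Hail's implementation: `TOY_CONTIG` is a hypothetical ten-base contig, chosen so that the first two results look like the example output above.

```python
# Hypothetical ten-base contig; positions are 1-based, as for a locus.
TOY_CONTIG = "TTACTCGGAC"


def sequence_context(sequence, position, before=0, after=0):
    """Return the bases from position - before to position + after, inclusive."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the contig")
    start = max(position - before, 1)            # truncate at the contig start
    end = min(position + after, len(sequence))   # truncate at the contig end
    return sequence[start - 1:end]


print(sequence_context(TOY_CONTIG, 8))             # G
print(sequence_context(TOY_CONTIG, 8, before=5))   # ACTCGG
print(sequence_context(TOY_CONTIG, 8, before=20))  # TTACTCGG (truncated at the start)
print(sequence_context(TOY_CONTIG, 8, after=5))    # GAC (truncated at the end)
```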

show(n=None, width=None, truncate=None, types=True, handler=None, n_rows=None, n_cols=None)

Print the first few records of the expression to the console.

If the expression refers to a value on a keyed axis of a table or matrix table, then the accompanying keys will be shown along with the records.

Examples

>>> table1.SEX.show()
+-------+-----+
|    ID | SEX |
+-------+-----+
| int32 | str |
+-------+-----+
|     1 | "M" |
|     2 | "M" |
|     3 | "F" |
|     4 | "F" |
+-------+-----+
>>> hl.literal(123).show()
+--------+
| <expr> |
+--------+
|  int32 |
+--------+
|    123 |
+--------+

Notes

The output can be piped to another output source using the handler argument:

>>> ht.foo.show(handler=lambda x: logging.info(x))

Parameters:
  • n (int) – Maximum number of rows to show.

  • width (int) – Horizontal width at which to break columns.

  • truncate (int, optional) – Truncate each field to the given number of characters. If None, truncate fields to the given width.

  • types (bool) – Print an extra header line with the type of each field.

summarize(handler=None)

Compute and print summary information about the expression.

Danger

This functionality is experimental. It may not be tested as well as other parts of Hail and the interface is subject to change.

take(n, _localize=True)

Collect the first n records of an expression.

Examples

Take the first three rows:

>>> table1.X.take(3)
[5, 6, 7]

Warning

Extremely experimental.

Parameters:

n (int) – Number of records to take.

Returns:

list

window(before, after)[source]

Returns an interval of a specified number of bases around the locus.

Examples

Create a window of two megabases centered at a locus:

>>> locus = hl.locus('16', 29_500_000)
>>> window = locus.window(1_000_000, 1_000_000)
>>> hl.eval(window)
Interval(start=Locus(contig=16, position=28500000, reference_genome=GRCh37), end=Locus(contig=16, position=30500000, reference_genome=GRCh37), includes_start=True, includes_end=True)

Notes

The returned interval is inclusive of both the start and end endpoints.

Parameters:
  • before (Expression of type tint32) – Number of bases to include before the locus. Truncates at 1.

  • after (Expression of type tint32) – Number of bases to include after the locus. Truncates at contig length.

Returns:

IntervalExpression
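The endpoint arithmetic, including both truncation rules, can be sketched in plain Python. This is not Hail's implementation: `window_bounds` is a hypothetical helper, and the chromosome 16 length is an assumed GRCh37 value that does not appear in the text above.

```python
# Assumed GRCh37 length of chromosome 16; not taken from the text above.
CHR16_LENGTH = 90_354_753


def window_bounds(position, before, after, contig_length):
    """Return the inclusive (start, end) positions of the window."""
    start = max(position - before, 1)             # truncates at 1
    end = min(position + after, contig_length)    # truncates at contig length
    return start, end


print(window_bounds(29_500_000, 1_000_000, 1_000_000, CHR16_LENGTH))
# (28500000, 30500000)
print(window_bounds(500, 1_000, 1_000, CHR16_LENGTH))
# (1, 1500)
print(window_bounds(90_354_000, 0, 1_000, CHR16_LENGTH))
# (90354000, 90354753)
```

The first call reproduces the start and end positions of the two-megabase example; the other two show a window clipped at the beginning and at the end of the contig.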