LocusExpression
- class hail.expr.LocusExpression[source]
Expression of type tlocus.
>>> locus = hl.locus('1', 1034245)
Attributes
contig – Returns the chromosome.
contig_idx – Returns the index of the chromosome.
dtype – The data type of the expression.
position – Returns the position along the chromosome.
Methods
global_position – Returns a zero-indexed absolute position along the reference genome.
in_autosome – Returns True if the locus is on an autosome.
in_autosome_or_par – Returns True if the locus is on an autosome or a pseudoautosomal region of chromosome X or Y.
in_mito – Returns True if the locus is on mitochondrial DNA.
in_x_nonpar – Returns True if the locus is in a non-pseudoautosomal region of chromosome X.
in_x_par – Returns True if the locus is in a pseudoautosomal region of chromosome X.
in_y_nonpar – Returns True if the locus is in a non-pseudoautosomal region of chromosome Y.
in_y_par – Returns True if the locus is in a pseudoautosomal region of chromosome Y.
sequence_context – Returns the reference genome sequence at the locus.
window – Returns an interval of a specified number of bases around the locus.
- __eq__(other)
Returns True if the two expressions are equal.
Examples
>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x == y)
True
>>> hl.eval(x == z)
False
Notes
This method will fail with an error if the two expressions are not of comparable types.
- Parameters:
other (Expression) – Expression for equality comparison.
- Returns:
BooleanExpression – True if the two expressions are equal.
- __ge__(other)
Return self>=value.
- __gt__(other)
Return self>value.
- __le__(other)
Return self<=value.
- __lt__(other)
Return self<value.
- __ne__(other)
Returns True if the two expressions are not equal.
Examples
>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x != y)
False
>>> hl.eval(x != z)
True
Notes
This method will fail with an error if the two expressions are not of comparable types.
- Parameters:
other (Expression) – Expression for inequality comparison.
- Returns:
BooleanExpression – True if the two expressions are not equal.
- collect(_localize=True)
Collect all records of an expression into a local list.
Examples
Collect all the values from C1:
>>> table1.C1.collect()
[2, 2, 10, 11]
Warning
Extremely experimental.
Warning
The list of records may be very large.
- Returns:
list – The collected records.
- property contig
Returns the chromosome.
Examples
>>> hl.eval(locus.contig)
'1'
- Returns:
StringExpression – The chromosome for this locus.
- property contig_idx
Returns the index of the chromosome.
Examples
>>> hl.eval(locus.contig_idx)
0
- Returns:
Expression of type tint32 – The index of the chromosome for this locus.
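The example above returns 0 for chromosome '1', which is consistent with the index being the contig's place in the reference genome's ordering of contigs. The following is a minimal pure-Python sketch of that idea, not Hail's implementation. The names `CONTIG_ORDER` and `contig_index` are hypothetical, and the contig list is a shortened stand-in for a real reference genome.

```python
# Hypothetical, shortened contig ordering used only for illustration.
CONTIG_ORDER = ["1", "2", "3", "X", "Y", "MT"]


def contig_index(contig, contig_order=CONTIG_ORDER):
    """Return the zero-based place of `contig` in the ordering."""
    return contig_order.index(contig)


# Assumed use: sort plain (contig, position) pairs in reference order,
# so that 'X' sorts after '2' instead of following string order.
loci = [("X", 100), ("2", 50), ("1", 1034245), ("2", 7)]
sorted_loci = sorted(loci, key=lambda l: (contig_index(l[0]), l[1]))
print(contig_index("1"))
print(sorted_loci)
```

An integer index is convenient for this kind of ordering because contig names such as '10' and '2' do not sort naturally as strings.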
- describe(handler=<built-in function print>)
Print information about type, index, and dependencies.
- export(path, delimiter='\t', missing='NA', header=True)
Export a field to a text file.
Examples
>>> small_mt.GT.export('output/gt.tsv')
>>> with open('output/gt.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus  alleles    0    1    2    3
1:1    ["A","C"]  0/1  0/0  0/1  0/0
1:2    ["A","C"]  1/1  0/1  0/1  0/1
1:3    ["A","C"]  0/0  0/1  0/0  0/0
1:4    ["A","C"]  0/1  1/1  0/1  0/1
>>> small_mt.GT.export('output/gt-no-header.tsv', header=False)
>>> with open('output/gt-no-header.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
1:1    ["A","C"]  0/1  0/0  0/1  0/0
1:2    ["A","C"]  1/1  0/1  0/1  0/1
1:3    ["A","C"]  0/0  0/1  0/0  0/0
1:4    ["A","C"]  0/1  1/1  0/1  0/1
>>> small_mt.pop.export('output/pops.tsv')
>>> with open('output/pops.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
sample_idx  pop
0           1
1           2
2           2
3           2
>>> small_mt.ancestral_af.export('output/ancestral_af.tsv')
>>> with open('output/ancestral_af.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus  alleles    ancestral_af
1:1    ["A","C"]  3.8152e-01
1:2    ["A","C"]  7.0588e-01
1:3    ["A","C"]  4.9991e-01
1:4    ["A","C"]  3.9616e-01
>>> small_mt.bn.export('output/bn.tsv')
>>> with open('output/bn.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
bn
{"n_populations":3,"n_samples":4,"n_variants":4,"n_partitions":4,"pop_dist":[1,1,1],"fst":[0.1,0.1,0.1],"mixture":false}
Notes
For entry-indexed expressions, if there is one column key field, the result of calling str() on that field is used as the column header. Otherwise, each compound column key is converted to JSON and used as a column header. For example:
>>> small_mt = small_mt.key_cols_by(s=small_mt.sample_idx, family='fam1')
>>> small_mt.GT.export('output/gt-no-header.tsv')
>>> with open('output/gt-no-header.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus  alleles    {"s":0,"family":"fam1"}  {"s":1,"family":"fam1"}  {"s":2,"family":"fam1"}  {"s":3,"family":"fam1"}
1:1    ["A","C"]  0/1  0/0  0/1  0/0
1:2    ["A","C"]  1/1  0/1  0/1  0/1
1:3    ["A","C"]  0/0  0/1  0/0  0/0
1:4    ["A","C"]  0/1  1/1  0/1  0/1
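The header rule in the note above can be mimicked in plain Python. This is only a sketch of the described behavior, not the code Hail runs. `column_header` is a hypothetical helper, and representing a column key as a dict of field names to values is an assumption made for the example.

```python
import json


def column_header(col_key):
    """Build a column header from a column key.

    `col_key` is a dict mapping key field names to values (an assumed
    representation, chosen for this sketch).
    """
    if len(col_key) == 1:
        # Single key field: use str() of its value.
        (value,) = col_key.values()
        return str(value)
    # Compound key: compact JSON, matching the headers shown above.
    return json.dumps(col_key, separators=(",", ":"))


print(column_header({"sample_idx": 0}))
print(column_header({"s": 0, "family": "fam1"}))
```

The compact separators are used so that the compound header has no spaces, as in the exported file shown in the note.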
- global_position()[source]
Returns a zero-indexed absolute position along the reference genome.
The global position is computed as position - 1 plus the sum of the lengths of all the contigs that precede this locus’s contig in the reference genome’s ordering of contigs.
See also
locus_from_global_position().
Examples
A locus with position 1 along chromosome 1 will have a global position of 0 along the reference genome GRCh37.
>>> hl.eval(hl.locus('1', 1).global_position())
0
A locus with position 1 along chromosome 2 will have a global position of (1-1) + 249250621, where 249250621 is the length of chromosome 1 on GRCh37.
>>> hl.eval(hl.locus('2', 1).global_position())
249250621
A different reference genome than the default results in a different global position.
>>> hl.eval(hl.locus('chr2', 1, 'GRCh38').global_position())
248956422
- Returns:
Expression of type tint64 – Global base position of locus along the reference genome.
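The arithmetic described for global_position() can be reproduced outside Hail as a sanity check. The sketch below is a plain-Python illustration of the formula, not Hail's implementation. `CONTIG_LENGTHS` and this standalone `global_position` function are hypothetical: the chromosome 1 length is the GRCh37 value quoted in the examples, while the chromosome 2 length is a placeholder.

```python
# Chromosome 1 length is the GRCh37 value quoted in the examples above;
# the chromosome 2 length is a placeholder for this sketch.
CONTIG_LENGTHS = {"1": 249250621, "2": 1000}


def global_position(contig, position, contig_lengths=CONTIG_LENGTHS):
    """Zero-indexed offset of a 1-based (contig, position) pair."""
    if contig not in contig_lengths:
        raise KeyError(f"unknown contig: {contig!r}")
    offset = 0
    # Dict insertion order stands in for the reference genome's
    # ordering of contigs.
    for name, length in contig_lengths.items():
        if name == contig:
            break
        offset += length  # add the length of each preceding contig
    return offset + position - 1


print(global_position("1", 1))
print(global_position("2", 1))
```

With these inputs the two calls give 0 and 249250621, matching the first two Hail examples.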
- in_autosome()[source]
Returns True if the locus is on an autosome.
Notes
All contigs are considered autosomal except those designated as X, Y, or MT by ReferenceGenome.
Examples
>>> hl.eval(locus.in_autosome())
True
- Returns:
BooleanExpression
- in_autosome_or_par()[source]
Returns True if the locus is on an autosome or a pseudoautosomal region of chromosome X or Y.
Examples
>>> hl.eval(locus.in_autosome_or_par())
True
- Returns:
BooleanExpression
- in_mito()[source]
Returns True if the locus is on mitochondrial DNA.
Examples
>>> hl.eval(locus.in_mito())
False
- Returns:
BooleanExpression
- in_x_nonpar()[source]
Returns True if the locus is in a non-pseudoautosomal region of chromosome X.
Examples
>>> hl.eval(locus.in_x_nonpar())
False
- Returns:
BooleanExpression
- in_x_par()[source]
Returns True if the locus is in a pseudoautosomal region of chromosome X.
Examples
>>> hl.eval(locus.in_x_par())
False
- Returns:
BooleanExpression
- in_y_nonpar()[source]
Returns True if the locus is in a non-pseudoautosomal region of chromosome Y.
Examples
>>> hl.eval(locus.in_y_nonpar())
False
Note
Many variant callers only generate variants on chromosome X for the pseudoautosomal region. In this case, all loci mapped to chromosome Y are non-pseudoautosomal.
- Returns:
BooleanExpression
- in_y_par()[source]
Returns True if the locus is in a pseudoautosomal region of chromosome Y.
Examples
>>> hl.eval(locus.in_y_par())
False
Note
Many variant callers only generate variants on chromosome X for the pseudoautosomal region. In this case, all loci mapped to chromosome Y are non-pseudoautosomal.
- Returns:
BooleanExpression
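The region predicates above (in_autosome, in_mito, in_x_par, in_x_nonpar, in_y_par, in_y_nonpar, in_autosome_or_par) partition loci by contig role and pseudoautosomal (PAR) intervals. The sketch below shows one way such a partition could work in plain Python. It is not Hail's implementation, and every name and coordinate in it is hypothetical: the PAR intervals are toy values, not those of any real reference genome.

```python
# Hypothetical toy reference: contig roles and PAR intervals are made up.
X_CONTIGS = {"X"}
Y_CONTIGS = {"Y"}
MT_CONTIGS = {"MT"}
PAR = {"X": [(1, 100)], "Y": [(1, 100)]}  # inclusive 1-based intervals


def in_par(contig, position):
    """True if the position falls in any PAR interval of the contig."""
    return any(start <= position <= end for start, end in PAR.get(contig, []))


def classify(contig, position):
    """Assign a locus to exactly one region class."""
    if contig in MT_CONTIGS:
        return "mito"
    if contig in X_CONTIGS:
        return "x_par" if in_par(contig, position) else "x_nonpar"
    if contig in Y_CONTIGS:
        return "y_par" if in_par(contig, position) else "y_nonpar"
    # Everything not designated X, Y, or MT counts as autosomal.
    return "autosome"


def in_autosome_or_par(contig, position):
    return classify(contig, position) in {"autosome", "x_par", "y_par"}


print(classify("1", 1034245))
print(classify("X", 500))
print(in_autosome_or_par("X", 50))
```

Treating the classes as mutually exclusive makes in_autosome_or_par a simple union of three of them, which mirrors how the method descriptions above are phrased.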
- property position
Returns the position along the chromosome.
Examples
>>> hl.eval(locus.position)
1034245
- Returns:
Expression of type tint32 – This locus’s position along its chromosome.
- sequence_context(before=0, after=0)[source]
Returns the reference genome sequence at the locus.
Examples
Get the reference allele at a locus:
>>> hl.eval(locus.sequence_context())
"G"
Get the reference sequence at a locus including the previous 5 bases:
>>> hl.eval(locus.sequence_context(before=5))
"ACTCGG"
Notes
This function requires that this locus’s reference genome has an attached reference sequence. Use ReferenceGenome.add_sequence() to load and attach a reference sequence to a reference genome.
- Parameters:
before (Expression of type tint32, optional) – Number of bases to include before the locus. Truncates at contig boundary.
after (Expression of type tint32, optional) – Number of bases to include after the locus. Truncates at contig boundary.
- Returns:
StringExpression
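The before/after semantics of sequence_context(), including truncation at the contig boundary, can be illustrated with ordinary string slicing. This is a sketch under assumptions, not Hail's implementation: the standalone `sequence_context` function and the toy sequence are hypothetical, and the contig is modeled as a Python string with 1-based positions.

```python
def sequence_context(sequence, position, before=0, after=0):
    """Bases around a 1-based position in one contig's sequence.

    `sequence` is a plain string standing in for a contig (an
    assumption of this sketch).
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside contig")
    start = max(1, position - before)            # truncate at contig start
    end = min(len(sequence), position + after)   # truncate at contig end
    return sequence[start - 1:end]


toy_contig = "ACTCGGTA"  # made-up sequence
print(sequence_context(toy_contig, 6))
print(sequence_context(toy_contig, 6, before=5))
print(sequence_context(toy_contig, 2, before=5))
```

The third call asks for five preceding bases when only one exists, so the result is shortened instead of raising an error.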
- show(n=None, width=None, truncate=None, types=True, handler=None, n_rows=None, n_cols=None)
Print the first few records of the expression to the console.
If the expression refers to a value on a keyed axis of a table or matrix table, then the accompanying keys will be shown along with the records.
Examples
>>> table1.SEX.show()
+-------+-----+
|    ID | SEX |
+-------+-----+
| int32 | str |
+-------+-----+
|     1 | "M" |
|     2 | "M" |
|     3 | "F" |
|     4 | "F" |
+-------+-----+
>>> hl.literal(123).show()
+--------+
| <expr> |
+--------+
|  int32 |
+--------+
|    123 |
+--------+
Notes
The output can be piped to another output source using the handler argument:
>>> ht.foo.show(handler=lambda x: logging.info(x))
- Parameters:
- summarize(handler=None)
Compute and print summary information about the expression.
Danger
This functionality is experimental. It may not be tested as well as other parts of Hail and the interface is subject to change.
- take(n, _localize=True)
Collect the first n records of an expression.
Examples
Take the first three rows:
>>> table1.X.take(3)
[5, 6, 7]
Warning
Extremely experimental.
- Parameters:
n (int) – Number of records to take.
- Returns:
list – The first n records.
- window(before, after)[source]
Returns an interval of a specified number of bases around the locus.
Examples
Create a window of two megabases centered at a locus:
>>> locus = hl.locus('16', 29_500_000)
>>> window = locus.window(1_000_000, 1_000_000)
>>> hl.eval(window)
Interval(start=Locus(contig=16, position=28500000, reference_genome=GRCh37), end=Locus(contig=16, position=30500000, reference_genome=GRCh37), includes_start=True, includes_end=True)
Notes
The returned interval is inclusive of both the start and end endpoints.
- Parameters:
before (Expression of type tint32) – Number of bases to include before the locus. Truncates at 1.
after (Expression of type tint32) – Number of bases to include after the locus. Truncates at contig length.
- Returns:
IntervalExpression
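The truncation rules for window() (the start is clamped at 1, the end at the contig length, and both endpoints are inclusive) amount to two clamps. The sketch below illustrates them in plain Python. It is not Hail's implementation: the standalone `window` function is hypothetical, and the contig length of 90,000,000 is a placeholder, not the length of any real chromosome.

```python
def window(position, before, after, contig_length):
    """Inclusive (start, end) positions around a 1-based position."""
    start = max(1, position - before)            # truncates at 1
    end = min(contig_length, position + after)   # truncates at contig length
    return start, end


CONTIG_LENGTH = 90_000_000  # placeholder length for this sketch

print(window(29_500_000, 1_000_000, 1_000_000, CONTIG_LENGTH))
print(window(500, 1_000, 1_000, CONTIG_LENGTH))
print(window(89_999_900, 0, 1_000, CONTIG_LENGTH))
```

The first call reproduces the start and end positions from the Hail example; the other two show the clamping at each end of the contig.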