StringExpression

class hail.expr.StringExpression[source]

Bases: hail.expr.expressions.base_expression.Expression

Expression of type tstr.

>>> s = hl.literal('The quick brown fox')

Attributes

dtype The data type of the expression.

Methods

__init__ Initialize self.
collect Collect all records of an expression into a local list.
contains Returns whether substr is contained in the string.
describe Print information about type, index, and dependencies.
endswith Returns whether substr is a suffix of the string.
export Export a field to a text file.
first_match_in Returns an array containing the capture groups of the first match of regex in the given character sequence.
length Returns the length of the string.
lower Returns a copy of the string, but with upper case letters converted to lower case.
matches Returns True if the string contains any match for the given regex.
replace Replace substrings matching pattern1 with pattern2 using regex.
reverse Returns the reversed value.
show Print the first few rows of the table to the console.
split Returns an array of strings generated by splitting the string at delim.
startswith Returns whether substr is a prefix of the string.
strip Returns a copy of the string with whitespace removed from the start and end.
summarize Compute and print summary information about the expression.
take Collect the first n records of an expression.
translate Translates characters of the string using mapping.
upper Returns a copy of the string, but with lower case letters converted to upper case.
__add__(other)[source]

Concatenate strings.

Examples

>>> hl.eval(s + ' jumped over the lazy dog')
'The quick brown fox jumped over the lazy dog'
Parameters:other (StringExpression) – String to concatenate.
Returns:StringExpression – Concatenated string.
__eq__(other)

Returns True if the two expressions are equal.

Examples

>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x == y)
True
>>> hl.eval(x == z)
False

Notes

This method will fail with an error if the two expressions are not of comparable types.

Parameters:other (Expression) – Expression for equality comparison.
Returns:BooleanExpression – True if the two expressions are equal.
__ge__(other)

Return self>=value.

__getitem__(item)[source]

Slice or index into the string.

Examples

>>> hl.eval(s[:15])
'The quick brown'
>>> hl.eval(s[0])
'T'
Parameters:item (slice or Expression of type tint32) – Slice or character index.
Returns:StringExpression – Substring or character at index item.
__gt__(other)

Return self>value.

__le__(other)

Return self<=value.

__lt__(other)

Return self<value.

__ne__(other)

Returns True if the two expressions are not equal.

Examples

>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x != y)
False
>>> hl.eval(x != z)
True

Notes

This method will fail with an error if the two expressions are not of comparable types.

Parameters:other (Expression) – Expression for inequality comparison.
Returns:BooleanExpression – True if the two expressions are not equal.
collect(_localize=True)

Collect all records of an expression into a local list.

Examples

Collect all the values from C1:

>>> table1.C1.collect()
[2, 2, 10, 11]

Warning

Extremely experimental.

Warning

The list of records may be very large.

Returns:list
contains(substr)[source]

Returns whether substr is contained in the string.

Examples

>>> hl.eval(s.contains('fox'))
True
>>> hl.eval(s.contains('dog'))
False

Note

This method is case-sensitive.

Parameters:substr (StringExpression)
Returns:BooleanExpression
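
Because the comparison is case-sensitive, a common workaround is to normalize both sides with lower() (documented further down this page) before testing for containment. The snippet below shows the idea on ordinary Python strings; it is an illustration of the technique, not Hail code.

```python
text = 'The quick brown fox'

# Case-sensitive check: 'Fox' does not occur with a capital F.
strict = 'Fox' in text

# Lower-casing both sides first makes the check case-insensitive.
relaxed = 'Fox'.lower() in text.lower()

print(strict, relaxed)  # False True
```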
describe(handler=<built-in function print>)

Print information about type, index, and dependencies.

dtype

The data type of the expression.

Returns:HailType
endswith(substr)[source]

Returns whether substr is a suffix of the string.

Examples

>>> hl.eval(s.endswith('fox'))
True

Note

This method is case-sensitive.

Parameters:substr (StringExpression)
Returns:BooleanExpression
export(path, delimiter='\t', missing='NA', header=True)

Export a field to a text file.

Examples

>>> small_mt.GT.export('output/gt.tsv')
>>> with open('output/gt.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles 0       1       2       3
1:1     ["A","C"]       0/1     0/1     0/0     0/0
1:2     ["A","C"]       1/1     0/1     1/1     1/1
1:3     ["A","C"]       1/1     0/1     0/1     0/0
1:4     ["A","C"]       1/1     0/1     1/1     1/1
>>> small_mt.GT.export('output/gt-no-header.tsv', header=False)
>>> with open('output/gt-no-header.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
1:1     ["A","C"]       0/1     0/1     0/0     0/0
1:2     ["A","C"]       1/1     0/1     1/1     1/1
1:3     ["A","C"]       1/1     0/1     0/1     0/0
1:4     ["A","C"]       1/1     0/1     1/1     1/1
>>> small_mt.pop.export('output/pops.tsv')
>>> with open('output/pops.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
sample_idx      pop
0       2
1       2
2       0
3       2
>>> small_mt.ancestral_af.export('output/ancestral_af.tsv')
>>> with open('output/ancestral_af.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles ancestral_af
1:1     ["A","C"]       5.3905e-01
1:2     ["A","C"]       8.6768e-01
1:3     ["A","C"]       4.3765e-01
1:4     ["A","C"]       7.6300e-01
>>> mt = small_mt
>>> small_mt.bn.export('output/bn.tsv')
>>> with open('output/bn.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
bn
{"n_populations":3,"n_samples":4,"n_variants":4,"n_partitions":8,"pop_dist":[1,1,1],"fst":[0.1,0.1,0.1],"mixture":false}

Notes

For entry-indexed expressions, if there is one column key field, the result of calling hl.str() on that field is used as the column header. Otherwise, each compound column key is converted to JSON and used as a column header. For example:

>>> small_mt = small_mt.key_cols_by(s=small_mt.sample_idx, family='fam1')
>>> small_mt.GT.export('output/gt-no-header.tsv')
>>> with open('output/gt-no-header.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles {"s":0,"family":"fam1"} {"s":1,"family":"fam1"} {"s":2,"family":"fam1"} {"s":3,"family":"fam1"}
1:1     ["A","C"]       0/1     0/1     0/0     0/0
1:2     ["A","C"]       1/1     0/1     1/1     1/1
1:3     ["A","C"]       1/1     0/1     0/1     0/0
1:4     ["A","C"]       1/1     0/1     1/1     1/1
Parameters:
  • path (str) – The path to which to export.
  • delimiter (str) – The string for delimiting columns.
  • missing (str) – The string to output for missing values.
  • header (bool) – When True include a header line.
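
The header rule described in the Notes can be pictured with a small plain-Python sketch. It is not Hail's implementation: the helper `column_header` is hypothetical, column keys are modeled as ordinary dicts, and compact `json.dumps` output is assumed to resemble the JSON shown in the example header.

```python
import json

def column_header(col_key):
    """Build one header cell from a column key given as a dict (hypothetical helper)."""
    if len(col_key) == 1:
        # Single key field: use the string form of its value.
        (value,) = col_key.values()
        return str(value)
    # Compound key: serialize all key fields as compact JSON.
    return json.dumps(col_key, separators=(',', ':'))

print(column_header({'sample_idx': 0}))           # 0
print(column_header({'s': 0, 'family': 'fam1'}))  # {"s":0,"family":"fam1"}
```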
first_match_in(regex)[source]

Returns an array containing the capture groups of the first match of regex in the given character sequence.

Examples

>>> hl.eval(s.first_match_in(r"The quick (\w+) fox"))
['brown']
>>> hl.eval(s.first_match_in(r"The (\w+) (\w+) (\w+)"))
['quick', 'brown', 'fox']
>>> hl.eval(s.first_match_in(r"(\w+) (\w+)"))
['The', 'quick']
Parameters:regex (StringExpression)
Returns:ArrayExpression with element type tstr
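
For intuition, the examples above behave much like searching for the first match and returning its capture groups. A rough plain-Python analogue using the standard `re` module is sketched below. It is not Hail code, the helper name `first_match_groups` is hypothetical, and it assumes a pattern that uses only syntax shared by Java and Python regexes. Returning `None` when nothing matches is a choice made for this sketch, not a statement about Hail's behavior.

```python
import re

def first_match_groups(text, pattern):
    """Return the capture groups of the first match of pattern in text.

    Hypothetical plain-Python helper; returns None when there is no match.
    """
    match = re.search(pattern, text)  # first match anywhere in the string
    if match is None:
        return None
    return list(match.groups())

text = 'The quick brown fox'
print(first_match_groups(text, r'The quick (\w+) fox'))  # ['brown']
print(first_match_groups(text, r'(\w+) (\w+)'))          # ['The', 'quick']
```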
length()[source]

Returns the length of the string.

Examples

>>> hl.eval(s.length())
19
Returns:Expression of type tint32 – Length of the string.
lower()[source]

Returns a copy of the string, but with upper case letters converted to lower case.

Examples

>>> hl.eval(s.lower())
'the quick brown fox'
Returns:StringExpression
matches(regex)[source]

Returns True if the string contains any match for the given regex.

Examples

>>> string = hl.literal('NA12878')

The regex parameter does not need to match the entire string:

>>> hl.eval(string.matches('12'))
True

Regex motifs can be used to match sequences of characters:

>>> hl.eval(string.matches(r'NA\d+'))
True

Notes

The regex argument is a regular expression, and uses Java regex syntax.

Parameters:regex (str) – Pattern to match.
Returns:BooleanExpression – True if the string contains any match for the regex, otherwise False.
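
The difference between matching part of a string and matching all of it can be illustrated with Python's `re` module, where `re.search` finds a match anywhere and `re.fullmatch` requires the whole string to match. This is a plain-Python analogy rather than Hail code, and it assumes patterns whose syntax is the same in Java and Python.

```python
import re

sample_id = 'NA12878'

# A match anywhere in the string, analogous to the partial matching described above.
partial = re.search(r'12', sample_id) is not None

# Requiring the whole string to match is stricter.
whole = re.fullmatch(r'12', sample_id) is not None
whole_with_prefix = re.fullmatch(r'NA\d+', sample_id) is not None

print(partial, whole, whole_with_prefix)  # True False True
```

When the entire string must match, anchoring the pattern with `^` and `$` should have a similar effect.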
replace(pattern1, pattern2)[source]

Replace substrings matching pattern1 with pattern2 using regex.

Examples

Replace spaces with underscores in a Hail string:

>>> hl.eval(hl.str("The quick  brown fox").replace(' ', '_'))
'The_quick__brown_fox'

Remove the leading zero in contigs in variant strings in a table:

>>> t = hl.import_table('data/leading-zero-variants.txt')
>>> t.show()
+----------------+
| variant        |
+----------------+
| str            |
+----------------+
| "01:1000:A:T"  |
| "01:10001:T:G" |
| "02:99:A:C"    |
| "02:893:G:C"   |
| "22:100:A:T"   |
| "X:10:C:A"     |
+----------------+

>>> t = t.annotate(variant = t.variant.replace("^0([0-9])", "$1"))
>>> t.show()
+---------------+
| variant       |
+---------------+
| str           |
+---------------+
| "1:1000:A:T"  |
| "1:10001:T:G" |
| "2:99:A:C"    |
| "2:893:G:C"   |
| "22:100:A:T"  |
| "X:10:C:A"    |
+---------------+

Notes

Both patterns should follow Java regex syntax. In Java replacement strings, a capture group is referenced with a dollar sign, so $1 refers to the first group, rather than the \1 used by many other regex flavors.
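
The group-reference syntax is the main difference from Python's `re` module, which writes the first group as `\1` in the replacement string. The plain-Python sketch below reproduces part of the leading-zero example for comparison. It is an analogy rather than Hail code, and the variant strings are copied from the table above.

```python
import re

variants = ['01:1000:A:T', '02:99:A:C', '22:100:A:T', 'X:10:C:A']

# Python spells the first capture group as \1 in the replacement;
# the Hail call above uses the Java spelling "$1" for the same idea.
cleaned = [re.sub(r'^0([0-9])', r'\1', v) for v in variants]

print(cleaned)  # ['1:1000:A:T', '2:99:A:C', '22:100:A:T', 'X:10:C:A']
```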

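Parameters:
  • pattern1 (str or StringExpression) – Regex pattern to match.
  • pattern2 (str or StringExpression) – Replacement string.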
reverse()[source]

Returns the reversed value.

Examples

>>> string = hl.literal('ATGCC')
>>> hl.eval(string.reverse())
'CCGTA'
Returns:StringExpression
show(n=None, width=None, truncate=None, types=True, handler=None, n_rows=None, n_cols=None)

Print the first few rows of the table to the console.

Examples

>>> table1.SEX.show()
+-------+-----+
|    ID | SEX |
+-------+-----+
| int32 | str |
+-------+-----+
|     1 | "M" |
|     2 | "M" |
|     3 | "F" |
|     4 | "F" |
+-------+-----+
>>> hl.literal(123).show()
+--------+
| <expr> |
+--------+
|  int32 |
+--------+
|    123 |
+--------+

Warning

Extremely experimental.

Parameters:
  • n (int) – Maximum number of rows to show.
  • width (int) – Horizontal width at which to break columns.
  • truncate (int, optional) – Truncate each field to the given number of characters. If None, truncate fields to the given width.
  • types (bool) – Print an extra header line with the type of each field.
split(delim, n=None)[source]

Returns an array of strings generated by splitting the string at delim.

Examples

>>> hl.eval(s.split(r'\s+'))
['The', 'quick', 'brown', 'fox']
>>> hl.eval(s.split(r'\s+', 2))
['The', 'quick brown fox']

Notes

The delimiter is a regex that uses Java regex syntax. To split on special characters, escape them with a double backslash (\\).

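Parameters:
  • delim (str or StringExpression) – Delimiter regex.
  • n (Expression of type tint32, optional) – If given, the result contains at most n strings.
Returns:ArrayExpression – Array of split strings.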
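
In the second Hail example above, n=2 yields two strings, which suggests that n caps the number of pieces rather than the number of cuts. Python's `re.split` instead takes `maxsplit`, which counts cuts, so two pieces correspond to `maxsplit=1`. The sketch below is plain Python, not Hail, and assumes delimiter patterns that Java and Python interpret identically.

```python
import re

text = 'The quick brown fox'

# Split on every run of whitespace.
all_parts = re.split(r'\s+', text)
print(all_parts)  # ['The', 'quick', 'brown', 'fox']

# One cut gives two pieces, like the second Hail example above.
two_parts = re.split(r'\s+', text, maxsplit=1)
print(two_parts)  # ['The', 'quick brown fox']

# Special characters must be escaped to be treated literally.
dotted = re.split(r'\.', 'a.b.c')
print(dotted)     # ['a', 'b', 'c']
```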

startswith(substr)[source]

Returns whether substr is a prefix of the string.

Examples

>>> hl.eval(s.startswith('The'))
True
>>> hl.eval(s.startswith('the'))
False

Note

This method is case-sensitive.

Parameters:substr (StringExpression)
Returns:BooleanExpression
strip()[source]

Returns a copy of the string with whitespace removed from the start and end.

Examples

>>> s2 = hl.str('  once upon a time\n')
>>> hl.eval(s2.strip())
'once upon a time'
Returns:StringExpression
summarize(handler=None)

Compute and print summary information about the expression.

Danger

This functionality is experimental. It may not be tested as well as other parts of Hail and the interface is subject to change.

take(n, _localize=True)

Collect the first n records of an expression.

Examples

Take the first three rows:

>>> table1.X.take(3)
[5, 6, 7]

Warning

Extremely experimental.

Parameters:n (int) – Number of records to take.
Returns:list
translate(mapping)[source]

Translates characters of the string using mapping.

Examples

>>> string = hl.literal('ATTTGCA')
>>> hl.eval(string.translate({'T': 'U'}))
'AUUUGCA'
Parameters:mapping (DictExpression) – Dictionary of character-character translations.
Returns:StringExpression

See also

replace()
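
Character-by-character translation has a close counterpart in Python's built-in `str.translate`. The sketch below is plain Python rather than Hail code and is meant only to illustrate the idea of a character mapping; `str.maketrans` is the standard-library way to build the lookup table.

```python
# Single-character substitution, mirroring the example above.
rna_map = str.maketrans({'T': 'U'})
rna = 'ATTTGCA'.translate(rna_map)
print(rna)  # AUUUGCA

# Several characters can be mapped in one pass, e.g. a DNA complement.
complement_map = str.maketrans('ATGC', 'TACG')
complement = 'ATGCC'.translate(complement_map)
print(complement)  # TACGG
```

Unlike a regex replacement, a mapping of this kind is applied to each character independently, so swapping A and T in one pass does not undo itself.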

upper()[source]

Returns a copy of the string, but with lower case letters converted to upper case.

Examples

>>> hl.eval(s.upper())
'THE QUICK BROWN FOX'
Returns:StringExpression