StringExpression

class hail.expr.StringExpression

Expression of type tstr.

>>> s = hl.literal('The quick brown fox')

Attributes

dtype

The data type of the expression.

Methods

contains

Returns whether substr is contained in the string.

endswith

Returns whether substr is a suffix of the string.

first_match_in

Returns an array containing the capture groups of the first match of regex in the given character sequence.

join

Returns a string which is the concatenation of the strings in collection separated by the string providing this method.

length

Returns the length of the string.

lower

Returns a copy of the string, but with upper case letters converted to lower case.

matches

Returns True if the string contains any match for the given regex.

replace

Replace substrings matching pattern1 with pattern2 using regex.

reverse

Returns the reversed value.

split

Returns an array of strings generated by splitting the string at delim.

startswith

Returns whether substr is a prefix of the string.

strip

Returns a copy of the string with whitespace removed from the start and end.

translate

Translates characters of the string using mapping.

upper

Returns a copy of the string, but with lower case letters converted to upper case.

__add__(other)

Concatenate strings.

Examples

>>> hl.eval(s + ' jumped over the lazy dog')
'The quick brown fox jumped over the lazy dog'

Parameters

other (StringExpression) – String to concatenate.

Returns

StringExpression – Concatenated string.

__eq__(other)

Returns True if the two expressions are equal.

Examples

>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x == y)
True
>>> hl.eval(x == z)
False

Notes

This method will fail with an error if the two expressions are not of comparable types.

Parameters

other (Expression) – Expression for equality comparison.

Returns

BooleanExpression – True if the two expressions are equal.

__ge__(other)

Return self>=value.

__getitem__(item)

Slice or index into the string.

Examples

>>> hl.eval(s[:15])
'The quick brown'
>>> hl.eval(s[0])
'T'

Parameters

item (slice or Expression of type tint32) – Slice or character index.

Returns

StringExpression – Substring or character at index item.

__gt__(other)

Return self>value.

__le__(other)

Return self<=value.

__lt__(other)

Return self<value.

__ne__(other)

Returns True if the two expressions are not equal.

Examples

>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x != y)
False
>>> hl.eval(x != z)
True

Notes

This method will fail with an error if the two expressions are not of comparable types.

Parameters

other (Expression) – Expression for inequality comparison.

Returns

BooleanExpression – True if the two expressions are not equal.

collect(_localize=True)

Collect all records of an expression into a local list.

Examples

Collect all the values from C1:

>>> table1.C1.collect()
[2, 2, 10, 11]

Warning

Extremely experimental.

Warning

The list of records may be very large.

Returns

list

contains(substr)

Returns whether substr is contained in the string.

Examples

>>> hl.eval(s.contains('fox'))
True
>>> hl.eval(s.contains('dog'))
False

Note

This method is case-sensitive.

Parameters

substr (StringExpression)

Returns

BooleanExpression
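
For comparison, the case-sensitivity noted above behaves like Python's own substring test. The following is a plain-Python sketch, not Hail code; contains_ignore_case is a hypothetical helper introduced only for illustration. It lowercases both sides before comparing, which is one common way to get a case-insensitive test.

```python
s = 'The quick brown fox'


def contains_ignore_case(text: str, substr: str) -> bool:
    """Hypothetical helper: case-insensitive substring test."""
    # Lowercase both sides so that letter case no longer affects the result.
    return substr.lower() in text.lower()


case_sensitive = 'Fox' in s                        # exact-case test
case_insensitive = contains_ignore_case(s, 'Fox')  # case ignored

print(case_sensitive, case_insensitive)  # False True
```

In Hail, the same idea could presumably be expressed by combining the lower() and contains() methods documented on this page.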

describe(handler=<built-in function print>)

Print information about type, index, and dependencies.

property dtype

The data type of the expression.

Returns

HailType

endswith(substr)

Returns whether substr is a suffix of the string.

Examples

>>> hl.eval(s.endswith('fox'))
True

Note

This method is case-sensitive.

Parameters

substr (StringExpression)

Returns

BooleanExpression

export(path, delimiter='\t', missing='NA', header=True)

Export a field to a text file.

Examples

>>> small_mt.GT.export('output/gt.tsv')
>>> with open('output/gt.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles 0       1       2       3
1:1     ["A","C"]       0/1     0/1     0/0     0/0
1:2     ["A","C"]       1/1     0/1     1/1     1/1
1:3     ["A","C"]       1/1     0/1     0/1     0/0
1:4     ["A","C"]       1/1     0/1     1/1     1/1
>>> small_mt.GT.export('output/gt-no-header.tsv', header=False)
>>> with open('output/gt-no-header.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
1:1     ["A","C"]       0/1     0/1     0/0     0/0
1:2     ["A","C"]       1/1     0/1     1/1     1/1
1:3     ["A","C"]       1/1     0/1     0/1     0/0
1:4     ["A","C"]       1/1     0/1     1/1     1/1
>>> small_mt.pop.export('output/pops.tsv')
>>> with open('output/pops.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
sample_idx      pop
0       2
1       2
2       0
3       2
>>> small_mt.ancestral_af.export('output/ancestral_af.tsv')
>>> with open('output/ancestral_af.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles ancestral_af
1:1     ["A","C"]       5.3905e-01
1:2     ["A","C"]       8.6768e-01
1:3     ["A","C"]       4.3765e-01
1:4     ["A","C"]       7.6300e-01
>>> mt = small_mt
>>> small_mt.bn.export('output/bn.tsv')
>>> with open('output/bn.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
bn
{"n_populations":3,"n_samples":4,"n_variants":4,"n_partitions":8,"pop_dist":[1,1,1],"fst":[0.1,0.1,0.1],"mixture":false}

Notes

For entry-indexed expressions, if there is one column key field, the result of calling str() on that field is used as the column header. Otherwise, each compound column key is converted to JSON and used as a column header. For example:

>>> small_mt = small_mt.key_cols_by(s=small_mt.sample_idx, family='fam1')
>>> small_mt.GT.export('output/gt-no-header.tsv')
>>> with open('output/gt-no-header.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles {"s":0,"family":"fam1"} {"s":1,"family":"fam1"} {"s":2,"family":"fam1"} {"s":3,"family":"fam1"}
1:1     ["A","C"]       0/1     0/1     0/0     0/0
1:2     ["A","C"]       1/1     0/1     1/1     1/1
1:3     ["A","C"]       1/1     0/1     0/1     0/0
1:4     ["A","C"]       1/1     0/1     1/1     1/1

Parameters
  • path (str) – The path to which to export.

  • delimiter (str) – The string for delimiting columns.

  • missing (str) – The string to output for missing values.

  • header (bool) – When True include a header line.
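
The column-header rule described in the Notes above (str() of a single column key field, JSON for a compound key) can be sketched in plain Python. This is only an illustration of the rule as stated, not Hail's implementation; column_header is a hypothetical helper, and the compact JSON separators are chosen to match the header shown in the example output.

```python
import json


def column_header(col_key: dict) -> str:
    """Hypothetical helper: build an export header from a column key."""
    if len(col_key) == 1:
        # One key field: use str() of its value.
        (value,) = col_key.values()
        return str(value)
    # Compound key: serialize the whole key as compact JSON.
    return json.dumps(col_key, separators=(',', ':'))


print(column_header({'sample_idx': 0}))            # 0
print(column_header({'s': 0, 'family': 'fam1'}))   # {"s":0,"family":"fam1"}
```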

first_match_in(regex)

Returns an array containing the capture groups of the first match of regex in the given character sequence.

Examples

>>> hl.eval(s.first_match_in(r"The quick (\w+) fox"))
['brown']
>>> hl.eval(s.first_match_in(r"The (\w+) (\w+) (\w+)"))
['quick', 'brown', 'fox']
>>> hl.eval(s.first_match_in(r"(\w+) (\w+)"))
['The', 'quick']

Parameters

regex (StringExpression)

Returns

ArrayExpression with element type tstr
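
Readers who know Python's re module may find it helpful to compare first_match_in with re.search followed by groups(). The sketch below is plain Python, not Hail, and Python and Java regex syntax differ in places. The helper name first_match_groups is hypothetical, and returning None when nothing matches is an assumption of this sketch.

```python
import re


def first_match_groups(text: str, pattern: str):
    """Hypothetical helper: capture groups of the first match, or None."""
    # re.search finds the first location where the pattern matches.
    match = re.search(pattern, text)
    if match is None:
        return None  # assumption: stands in for "no match"
    return list(match.groups())


text = 'The quick brown fox'
print(first_match_groups(text, r'The quick (\w+) fox'))    # ['brown']
print(first_match_groups(text, r'The (\w+) (\w+) (\w+)'))  # ['quick', 'brown', 'fox']
print(first_match_groups(text, r'(\w+) (\w+)'))            # ['The', 'quick']
```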

join(collection)

Returns a string which is the concatenation of the strings in collection separated by the string providing this method. Raises TypeError if the element type of collection is not tstr.

Examples

>>> a = ['Bob', 'Charlie', 'Alice', 'Bob', 'Bob']
>>> hl.eval(hl.str(',').join(a))
'Bob,Charlie,Alice,Bob,Bob'

Parameters

collection (ArrayExpression or SetExpression) – Collection.

Returns

StringExpression – Joined string expression.
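
The calling convention mirrors Python's built-in str.join, where the separator is the object the method is called on. The plain-Python sketch below (not Hail code) shows the same shape, including a TypeError for non-string elements; the variable names are illustrative only.

```python
names = ['Bob', 'Charlie', 'Alice', 'Bob', 'Bob']

# The separator string provides the method; the collection is the argument.
joined = ','.join(names)
print(joined)  # Bob,Charlie,Alice,Bob,Bob

# Python's str.join also rejects non-string elements with a TypeError.
try:
    ','.join([1, 2, 3])
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```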

length()

Returns the length of the string.

Examples

>>> hl.eval(s.length())
19

Returns

Expression of type tint32 – Length of the string.

lower()

Returns a copy of the string, but with upper case letters converted to lower case.

Examples

>>> hl.eval(s.lower())
'the quick brown fox'

Returns

StringExpression

matches(regex)

Returns True if the string contains any match for the given regex.

Examples

>>> string = hl.literal('NA12878')

The regex parameter does not need to match the entire string:

>>> hl.eval(string.matches('12'))
True

Regex motifs can be used to match sequences of characters:

>>> hl.eval(string.matches(r'NA\d+'))
True

Notes

The regex argument is a regular expression, and uses Java regex syntax.
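
The partial-match behavior shown in the examples corresponds to the difference between re.search and re.fullmatch in Python. The following plain-Python sketch (not Hail, and using Python rather than Java regex syntax) contrasts the two; both helper names are hypothetical.

```python
import re


def contains_match(text: str, pattern: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    # re.search scans the whole string for a match at any position.
    return re.search(pattern, text) is not None


def matches_entirely(text: str, pattern: str) -> bool:
    """Hypothetical helper: True only if the pattern spans the whole text."""
    return re.fullmatch(pattern, text) is not None


sample = 'NA12878'
print(contains_match(sample, '12'))        # True
print(matches_entirely(sample, '12'))      # False
print(matches_entirely(sample, r'NA\d+'))  # True
```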

Parameters

regex (StringExpression) – Pattern to match.

Returns

BooleanExpression – True if the string contains any match for the regex, otherwise False.

replace(pattern1, pattern2)

Replace substrings matching pattern1 with pattern2 using regex.

Examples

Replace spaces with underscores in a Hail string:

>>> hl.eval(hl.str("The quick  brown fox").replace(' ', '_'))
'The_quick__brown_fox'

Remove the leading zero in contigs in variant strings in a table:

>>> t = hl.import_table('data/leading-zero-variants.txt')
>>> t.show()
+----------------+
| variant        |
+----------------+
| str            |
+----------------+
| "01:1000:A:T"  |
| "01:10001:T:G" |
| "02:99:A:C"    |
| "02:893:G:C"   |
| "22:100:A:T"   |
| "X:10:C:A"     |
+----------------+

>>> t = t.annotate(variant = t.variant.replace("^0([0-9])", "$1"))
>>> t.show()
+---------------+
| variant       |
+---------------+
| str           |
+---------------+
| "1:1000:A:T"  |
| "1:10001:T:G" |
| "2:99:A:C"    |
| "2:893:G:C"   |
| "22:100:A:T"  |
| "X:10:C:A"    |
+---------------+

Notes

The patterns should follow Java regex syntax. In Java replacement strings, a dollar sign followed by a group number refers to a capture group, so $1 is the first group; the \1 form common in other regex dialects is not used.
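
To make the group-reference difference concrete, here is the leading-zero example redone with Python's re module, where the first capture group is written \1 instead of $1. This is a plain-Python analogue for comparison, not Hail code; the variants list simply reuses a few rows from the table above.

```python
import re

variants = ['01:1000:A:T', '02:99:A:C', '22:100:A:T', 'X:10:C:A']

# Hail (Java regex):  variant.replace("^0([0-9])", "$1")
# Python re:          the same capture group is referenced as \1
cleaned = [re.sub(r'^0([0-9])', r'\1', v) for v in variants]

print(cleaned)  # ['1:1000:A:T', '2:99:A:C', '22:100:A:T', 'X:10:C:A']
```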

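Parameters

pattern1 (str or StringExpression) – Regex pattern to match.

pattern2 (str or StringExpression) – Replacement string.

Returns

StringExpression

reverse()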

Returns the reversed value.

Examples

>>> string = hl.literal('ATGCC')
>>> hl.eval(string.reverse())
'CCGTA'

Returns

StringExpression

show(n=None, width=None, truncate=None, types=True, handler=None, n_rows=None, n_cols=None)

Print the first few records of the expression to the console.

If the expression refers to a value on a keyed axis of a table or matrix table, then the accompanying keys will be shown along with the records.

Examples

>>> table1.SEX.show()
+-------+-----+
|    ID | SEX |
+-------+-----+
| int32 | str |
+-------+-----+
|     1 | "M" |
|     2 | "M" |
|     3 | "F" |
|     4 | "F" |
+-------+-----+
>>> hl.literal(123).show()
+--------+
| <expr> |
+--------+
|  int32 |
+--------+
|    123 |
+--------+

Notes

The output can be piped to another output source using the handler argument:

>>> ht.foo.show(handler=lambda x: logging.info(x))

Parameters
  • n (int) – Maximum number of rows to show.

  • width (int) – Horizontal width at which to break columns.

  • truncate (int, optional) – Truncate each field to the given number of characters. If None, truncate fields to the given width.

  • types (bool) – Print an extra header line with the type of each field.

split(delim, n=None)

Returns an array of strings generated by splitting the string at delim.

Examples

>>> hl.eval(s.split(r'\s+'))
['The', 'quick', 'brown', 'fox']
>>> hl.eval(s.split(r'\s+', 2))
['The', 'quick brown fox']

Notes

The delimiter is a regex that uses Java regex syntax. To split on a regex special character, escape it with a double backslash (\\).

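Parameters

delim (str or StringExpression) – Delimiter regex.

n (Expression of type tint32, optional) – Maximum number of elements in the resulting array, as in the second example above.

Returns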

ArrayExpression – Array of split strings.
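
In the second example above, n=2 yields two elements, so n acts as a cap on the number of pieces. Python's re.split instead counts the number of cuts via maxsplit. The plain-Python sketch below (not Hail code) reproduces the documented examples under that reading; split_with_limit is a hypothetical helper, and its handling of n=1 is an assumption based on how Java's split limit works.

```python
import re


def split_with_limit(text: str, delim: str, n=None):
    """Hypothetical helper: split on a regex into at most n pieces."""
    if n is None:
        return re.split(delim, text)
    if n < 1:
        raise ValueError('n must be at least 1 in this sketch')
    if n == 1:
        # maxsplit=0 means "no limit" in Python, so handle one piece directly.
        return [text]
    # n pieces require at most n - 1 cuts.
    return re.split(delim, text, maxsplit=n - 1)


s = 'The quick brown fox'
print(split_with_limit(s, r'\s+'))         # ['The', 'quick', 'brown', 'fox']
print(split_with_limit(s, r'\s+', 2))      # ['The', 'quick brown fox']
# A special character such as | is escaped with a double backslash.
print(split_with_limit('a|b|c', '\\|'))    # ['a', 'b', 'c']
```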

startswith(substr)

Returns whether substr is a prefix of the string.

Examples

>>> hl.eval(s.startswith('The'))
True
>>> hl.eval(s.startswith('the'))
False

Note

This method is case-sensitive.

Parameters

substr (StringExpression)

Returns

BooleanExpression

strip()

Returns a copy of the string with whitespace removed from the start and end.

Examples

>>> s2 = hl.str('  once upon a time\n')
>>> hl.eval(s2.strip())
'once upon a time'

Returns

StringExpression

summarize(handler=None)

Compute and print summary information about the expression.

Danger

This functionality is experimental. It may not be tested as well as other parts of Hail and the interface is subject to change.

take(n, _localize=True)

Collect the first n records of an expression.

Examples

Take the first three rows:

>>> table1.X.take(3)
[5, 6, 7]

Warning

Extremely experimental.

Parameters

n (int) – Number of records to take.

Returns

list

translate(mapping)

Translates characters of the string using mapping.

Examples

>>> string = hl.literal('ATTTGCA')
>>> hl.eval(string.translate({'T': 'U'}))
'AUUUGCA'

Parameters

mapping (DictExpression) – Dictionary of character-character translations.

Returns

StringExpression

See also

replace()
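
Character-by-character translation has a close counterpart in Python's str.translate combined with str.maketrans. The sketch below is plain Python rather than Hail; translate_chars is a hypothetical helper, and the complement mapping is an illustrative assumption, not something defined by this page.

```python
def translate_chars(text: str, mapping: dict) -> str:
    """Hypothetical helper: replace each character found in mapping."""
    # str.maketrans accepts a dict whose keys are single characters.
    return text.translate(str.maketrans(mapping))


# Same mapping as the example above: T becomes U.
print(translate_chars('ATTTGCA', {'T': 'U'}))  # AUUUGCA

# Illustrative mapping: DNA base complement, applied to a reversed string.
complement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
print(translate_chars('ATGCC'[::-1], complement))  # GGCAT
```

Unlike replace(), which works with regex patterns, a character mapping substitutes all characters in one pass, so swapping A with T and T with A does not undo itself.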

upper()

Returns a copy of the string, but with lower case letters converted to upper case.

Examples

>>> hl.eval(s.upper())
'THE QUICK BROWN FOX'

Returns

StringExpression