StringExpression
- class hail.expr.StringExpression
Expression of type tstr.
>>> s = hl.literal('The quick brown fox')
Attributes
dtype – The data type of the expression.
Methods
contains – Returns whether substr is contained in the string.
endswith – Returns whether substr is a suffix of the string.
find – Return the lowest index in the string where substring sub is found within the slice s[start:end].
first_match_in – Returns an array containing the capture groups of the first match of regex in the given character sequence.
join – Returns a string which is the concatenation of the strings in collection separated by the string providing this method.
length – Returns the length of the string.
lower – Returns a copy of the string, but with upper case letters converted to lower case.
matches – Returns True if the string contains any match for the given regex if full_match is false.
replace – Replace substrings matching pattern1 with pattern2 using regex.
reverse – Returns the reversed value.
split – Returns an array of strings generated by splitting the string at delim.
startswith – Returns whether substr is a prefix of the string.
strip – Returns a copy of the string with whitespace removed from the start and end.
translate – Translates characters of the string using mapping.
upper – Returns a copy of the string, but with lower case letters converted to upper case.
- __add__(other)
Concatenate strings.
Examples
>>> hl.eval(s + ' jumped over the lazy dog')
'The quick brown fox jumped over the lazy dog'
- Parameters:
other (StringExpression) – String to concatenate.
- Returns:
StringExpression – Concatenated string.
- __eq__(other)
Returns True if the two expressions are equal.
Examples
>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x == y)
True
>>> hl.eval(x == z)
False
Notes
This method will fail with an error if the two expressions are not of comparable types.
- Parameters:
other (Expression) – Expression for equality comparison.
- Returns:
BooleanExpression – True if the two expressions are equal.
- __ge__(other)
Return self>=value.
- __getitem__(item)
Slice or index into the string.
Examples
>>> hl.eval(s[:15])
'The quick brown'
>>> hl.eval(s[0])
'T'
- Parameters:
item (slice or Expression of type tint32) – Slice or character index.
- Returns:
StringExpression – Substring or character at index item.
- __gt__(other)
Return self>value.
- __le__(other)
Return self<=value.
- __lt__(other)
Return self<value.
- __ne__(other)
Returns True if the two expressions are not equal.
Examples
>>> x = hl.literal(5)
>>> y = hl.literal(5)
>>> z = hl.literal(1)
>>> hl.eval(x != y)
False
>>> hl.eval(x != z)
True
Notes
This method will fail with an error if the two expressions are not of comparable types.
- Parameters:
other (Expression) – Expression for inequality comparison.
- Returns:
BooleanExpression – True if the two expressions are not equal.
- collect(_localize=True)
Collect all records of an expression into a local list.
Examples
Collect all the values from C1:
>>> table1.C1.collect()
[2, 2, 10, 11]
Warning
Extremely experimental.
Warning
The list of records may be very large.
- Returns:
- contains(substr)
Returns whether substr is contained in the string.
Examples
>>> hl.eval(s.contains('fox'))
True
>>> hl.eval(s.contains('dog'))
False
Note
This method is case-sensitive.
- Parameters:
substr (StringExpression)
- Returns:
- describe(handler=<built-in function print>)
Print information about type, index, and dependencies.
- endswith(substr)
Returns whether substr is a suffix of the string.
Examples
>>> hl.eval(s.endswith('fox'))
True
Note
This method is case-sensitive.
- Parameters:
substr (StringExpression)
- Returns:
- export(path, delimiter='\t', missing='NA', header=True)
Export a field to a text file.
Examples
>>> small_mt.GT.export('output/gt.tsv')
>>> with open('output/gt.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles     0     1     2     3
1:1     ["A","C"]   0/1   0/0   0/1   0/0
1:2     ["A","C"]   1/1   0/1   0/1   0/1
1:3     ["A","C"]   0/0   0/1   0/0   0/0
1:4     ["A","C"]   0/1   1/1   0/1   0/1
>>> small_mt.GT.export('output/gt-no-header.tsv', header=False)
>>> with open('output/gt-no-header.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
1:1     ["A","C"]   0/1   0/0   0/1   0/0
1:2     ["A","C"]   1/1   0/1   0/1   0/1
1:3     ["A","C"]   0/0   0/1   0/0   0/0
1:4     ["A","C"]   0/1   1/1   0/1   0/1
>>> small_mt.pop.export('output/pops.tsv')
>>> with open('output/pops.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
sample_idx   pop
0            1
1            2
2            2
3            2
>>> small_mt.ancestral_af.export('output/ancestral_af.tsv')
>>> with open('output/ancestral_af.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles     ancestral_af
1:1     ["A","C"]   3.8152e-01
1:2     ["A","C"]   7.0588e-01
1:3     ["A","C"]   4.9991e-01
1:4     ["A","C"]   3.9616e-01
>>> small_mt.bn.export('output/bn.tsv')
>>> with open('output/bn.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
bn
{"n_populations":3,"n_samples":4,"n_variants":4,"n_partitions":4,"pop_dist":[1,1,1],"fst":[0.1,0.1,0.1],"mixture":false}
Notes
For entry-indexed expressions, if there is one column key field, the result of calling str() on that field is used as the column header. Otherwise, each compound column key is converted to JSON and used as a column header. For example:
>>> small_mt = small_mt.key_cols_by(s=small_mt.sample_idx, family='fam1')
>>> small_mt.GT.export('output/gt-no-header.tsv')
>>> with open('output/gt-no-header.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles     {"s":0,"family":"fam1"}   {"s":1,"family":"fam1"}   {"s":2,"family":"fam1"}   {"s":3,"family":"fam1"}
1:1     ["A","C"]   0/1   0/0   0/1   0/0
1:2     ["A","C"]   1/1   0/1   0/1   0/1
1:3     ["A","C"]   0/0   0/1   0/0   0/0
1:4     ["A","C"]   0/1   1/1   0/1   0/1
- find(sub, start=None, end=None)
Return the lowest index in the string where substring sub is found within the slice s[start:end]. Optional arguments start and end are interpreted as in slice notation. Evaluates to -1 if sub is not found.
Examples
>>> a = hl.str('hello, world')
>>> hl.eval(a.find('world'))
7
>>> hl.eval(a.find('hail'))
-1
- Parameters:
sub (StringExpression) – Substring to find.
start (Int32Expression) – Optional slice start index.
end (Int32Expression) – Optional slice end index.
- Returns:
Int32Expression – Lowest index in the string where substring sub is found, or -1.
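The description of find reads like that of Python's built-in str.find, so the start and end arguments can be explored without a Hail session. The sketch below uses plain Python strings only; that Hail's find gives the same results for these inputs is an assumption based on the wording above, not something checked against Hail here.

```python
# Plain-Python analogy for the start/end semantics described above.
# Assumption: Hail's find follows the same slice-style rules as str.find.
a = 'hello, world'

found = a.find('world')          # lowest index where 'world' starts
missing = a.find('hail')         # -1 when the substring is absent
windowed = a.find('o', 5)        # search only within a[5:]
clipped = a.find('world', 0, 5)  # 'world' does not occur inside a[0:5]

print(found, missing, windowed, clipped)
```

Note that the returned index is always relative to the start of the whole string, not to the start of the slice.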
- first_match_in(regex)
Returns an array containing the capture groups of the first match of regex in the given character sequence.
Examples
>>> hl.eval(s.first_match_in(r"The quick (\w+) fox"))
['brown']
>>> hl.eval(s.first_match_in(r"The (\w+) (\w+) (\w+)"))
['quick', 'brown', 'fox']
>>> hl.eval(s.first_match_in(r"(\w+) (\w+)"))
['The', 'quick']
- Parameters:
regex (StringExpression)
- Returns:
ArrayExpression with element type tstr
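For readers more familiar with Python's re module, the idea of "capture groups of the first match" can be sketched as follows. The helper first_match_groups is hypothetical and exists only for this illustration. Hail evaluates patterns with Java regex syntax, so only simple patterns like these are expected to carry over unchanged. Returning None when nothing matches is a choice made for this sketch, not documented Hail behavior.

```python
import re

def first_match_groups(pattern, text):
    """Hypothetical helper: capture groups of the first match, or None."""
    m = re.search(pattern, text)  # first match anywhere in the text
    return list(m.groups()) if m else None

text = 'The quick brown fox'

one_group = first_match_groups(r'The quick (\w+) fox', text)
two_groups = first_match_groups(r'(\w+) (\w+)', text)
no_match = first_match_groups(r'(\d+)', text)

print(one_group, two_groups, no_match)
```

Only the groups of the first match are reported, which is why the two-group pattern yields 'The' and 'quick' rather than every pair of adjacent words.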
- join(collection)
Returns a string which is the concatenation of the strings in collection separated by the string providing this method. Raises TypeError if the element type of collection is not tstr.
Examples
>>> a = ['Bob', 'Charlie', 'Alice', 'Bob', 'Bob']
>>> hl.eval(hl.str(',').join(a))
'Bob,Charlie,Alice,Bob,Bob'
- Parameters:
collection (ArrayExpression or SetExpression) – Collection.
- Returns:
StringExpression – Joined string expression.
- length()
Returns the length of the string.
Examples
>>> hl.eval(s.length())
19
- Returns:
Expression of type tint32 – Length of the string.
- lower()
Returns a copy of the string, but with upper case letters converted to lower case.
Examples
>>> hl.eval(s.lower())
'the quick brown fox'
- Returns:
- matches(regex, full_match=False)
Returns True if the string contains any match for the given regex when full_match is False. Returns True if the whole string matches the given regex when full_match is True.
Examples
The regex parameter does not need to match the entire string if full_match is False:
>>> string = hl.literal('NA12878')
>>> hl.eval(string.matches('12'))
True
The regex parameter needs to match the entire string if full_match is True:
>>> string = hl.literal('NA12878')
>>> hl.eval(string.matches('12', True))
False
Regex motifs can be used to match sequences of characters:
>>> string = hl.literal('NA12878')
>>> hl.eval(string.matches(r'NA\d+'))
True
>>> string = hl.literal('3412878')
>>> hl.eval(string.matches('^[0-9]*$'))
True
Notes
The regex argument is a regular expression, and uses Java regex syntax.
- Parameters:
regex (StringExpression) – Pattern to match.
full_match (bool) – If True, the function considers whether the whole string matches the regex. If False, the function considers whether the string has a partial match for that regex.
- Returns:
BooleanExpression – If full_match is False, True if the string contains any match for the regex, otherwise False. If full_match is True, True if the whole string matches the regex, otherwise False.
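The difference between a partial and a full match corresponds roughly to the difference between re.search and re.fullmatch in Python. The function below is a hypothetical stand-in written for this illustration. It is not Hail's implementation, and it uses Python regex syntax rather than the Java syntax Hail expects, so only simple patterns should be assumed to behave alike.

```python
import re

def matches(text, regex, full_match=False):
    """Hypothetical stand-in for the partial/full match semantics above."""
    if full_match:
        # The whole string must be consumed by the pattern.
        return re.fullmatch(regex, text) is not None
    # Any substring may match.
    return re.search(regex, text) is not None

partial = matches('NA12878', '12')                # '12' occurs inside the string
full_fail = matches('NA12878', '12', True)        # '12' is not the whole string
full_ok = matches('NA12878', r'NA\d+', True)      # pattern covers the whole string
anchored = matches('3412878', '^[0-9]*$')         # anchors force a whole-string match

print(partial, full_fail, full_ok, anchored)
```

The last call shows that explicit ^ and $ anchors give a whole-string check even when full_match is left at its default.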
- replace(pattern1, pattern2)
Replace substrings matching pattern1 with pattern2 using regex.
Examples
Replace spaces with underscores in a Hail string:
>>> hl.eval(hl.str("The quick  brown fox").replace(' ', '_'))
'The_quick__brown_fox'
Remove the leading zero in contigs in variant strings in a table:
>>> t = hl.import_table('data/leading-zero-variants.txt')
>>> t.show()
+----------------+
| variant        |
+----------------+
| str            |
+----------------+
| "01:1000:A:T"  |
| "01:10001:T:G" |
| "02:99:A:C"    |
| "02:893:G:C"   |
| "22:100:A:T"   |
| "X:10:C:A"     |
+----------------+
>>> t = t.annotate(variant = t.variant.replace("^0([0-9])", "$1"))
>>> t.show()
+---------------+
| variant       |
+---------------+
| str           |
+---------------+
| "1:1000:A:T"  |
| "1:10001:T:G" |
| "2:99:A:C"    |
| "2:893:G:C"   |
| "22:100:A:T"  |
| "X:10:C:A"    |
+---------------+
Notes
The regular expressions used should follow Java regex syntax. In Java regex syntax, a dollar-sign reference such as $1 refers to the first capture group, not the canonical \1.
- Parameters:
pattern1 (str or StringExpression)
pattern2 (str or StringExpression)
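To make the group-reference note concrete, here is the same leading-zero cleanup written with Python's re.sub, where the first group is referenced as \1 instead of the $1 that the Hail call uses. This is only a comparison of notation in plain Python; the variants list is a subset of the table shown in the example above.

```python
import re

variants = ['01:1000:A:T', '02:99:A:C', '22:100:A:T', 'X:10:C:A']

# Python's re.sub refers to the first capture group as \1.
# The Hail call in the text passes "$1" because Hail uses Java regex syntax.
cleaned = [re.sub(r'^0([0-9])', r'\1', v) for v in variants]

# Every match is replaced, so two adjacent spaces become two underscores.
underscored = re.sub(' ', '_', 'The quick  brown fox')

print(cleaned)
print(underscored)
```

Strings that do not start with a zero followed by a digit, such as '22:100:A:T', pass through unchanged.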
- reverse()[source]
Returns the reversed value.
Examples
>>> string = hl.literal('ATGCC')
>>> hl.eval(string.reverse())
'CCGTA'
- Returns:
- show(n=None, width=None, truncate=None, types=True, handler=None, n_rows=None, n_cols=None)
Print the first few records of the expression to the console.
If the expression refers to a value on a keyed axis of a table or matrix table, then the accompanying keys will be shown along with the records.
Examples
>>> table1.SEX.show()
+-------+-----+
| ID    | SEX |
+-------+-----+
| int32 | str |
+-------+-----+
|     1 | "M" |
|     2 | "M" |
|     3 | "F" |
|     4 | "F" |
+-------+-----+
>>> hl.literal(123).show()
+--------+
| <expr> |
+--------+
| int32  |
+--------+
|    123 |
+--------+
Notes
The output can be piped to another output destination using the handler argument:
>>> ht.foo.show(handler=lambda x: logging.info(x))
- Parameters:
- split(delim, n=None)
Returns an array of strings generated by splitting the string at delim.
Examples
>>> hl.eval(s.split(r'\s+'))
['The', 'quick', 'brown', 'fox']
>>> hl.eval(s.split(r'\s+', 2))
['The', 'quick brown fox']
Notes
The delimiter is a regex that uses Java regex syntax. To split on special characters, escape them with a double backslash (\\).
- Parameters:
delim (str or StringExpression) – Delimiter regex.
n (Expression of type tint32, optional) – Maximum number of splits.
- Returns:
ArrayExpression – Array of split strings.
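Judging only by the second example above, where n=2 yields two pieces, n appears to bound the number of resulting strings rather than the number of cuts. That reading is an inference from the example output, not something stated in the source. Python's re.split counts cuts instead, so the closest plain-Python analogy to n=2 is maxsplit=1, as sketched here.

```python
import re

s = 'The quick brown fox'

# No limit: split at every run of whitespace.
all_parts = re.split(r'\s+', s)

# Assumption: Hail's n=2 (two pieces) corresponds to one cut in Python.
two_parts = re.split(r'\s+', s, maxsplit=1)

# Regex metacharacters in the delimiter must be escaped. In a non-raw
# Python string literal that means writing a double backslash.
piped = re.split('\\|', 'a|b|c')

print(all_parts, two_parts, piped)
```

Writing the delimiter as a raw string, r'\|', is equivalent to '\\|' and avoids the doubled backslash.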
- startswith(substr)
Returns whether substr is a prefix of the string.
Examples
>>> hl.eval(s.startswith('The'))
True
>>> hl.eval(s.startswith('the'))
False
Note
This method is case-sensitive.
- Parameters:
substr (StringExpression)
- Returns:
- strip()
Returns a copy of the string with whitespace removed from the start and end.
Examples
>>> s2 = hl.str(' once upon a time\n')
>>> hl.eval(s2.strip())
'once upon a time'
- Returns:
- summarize(handler=None)
Compute and print summary information about the expression.
Danger
This functionality is experimental. It may not be tested as well as other parts of Hail and the interface is subject to change.
- take(n, _localize=True)
Collect the first n records of an expression.
Examples
Take the first three rows:
>>> table1.X.take(3)
[5, 6, 7]
Warning
Extremely experimental.
- Parameters:
n (int) – Number of records to take.
- Returns:
- translate(mapping)
Translates characters of the string using mapping.
Examples
>>> string = hl.literal('ATTTGCA')
>>> hl.eval(string.translate({'T': 'U'}))
'AUUUGCA'
- Parameters:
mapping (DictExpression) – Dictionary of character-character translations.
- Returns:
See also