Basic Methods for Working with Hail Data

Get Data Into and Out of Hail

Import

Import data from a non-Hail format into a Hail format, using one of the import_* methods.

description

Import a .tsv file as a table.

code
>>> table = hl.import_table('data/kt_example1.tsv', impute=True, key='ID')
>>> table.show()
+-------+-------+-----+-------+-------+-------+-------+-------+
|    ID |    HT | SEX |     X |     Z |    C1 |    C2 |    C3 |
+-------+-------+-----+-------+-------+-------+-------+-------+
| int32 | int32 | str | int32 | int32 | int32 | int32 | int32 |
+-------+-------+-----+-------+-------+-------+-------+-------+
|     1 |    65 | "M" |     5 |     4 |     2 |    50 |     5 |
|     2 |    72 | "M" |     6 |     3 |     2 |    61 |     1 |
|     3 |    70 | "F" |     7 |     3 |    10 |    81 |    -5 |
|     4 |    60 | "F" |     8 |     2 |    11 |    90 |   -10 |
+-------+-------+-----+-------+-------+-------+-------+-------+
dependencies

import_table()
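
To try the import yourself, you first need a file shaped like data/kt_example1.tsv. The sketch below recreates one from the rows shown in the output above, using only the Python standard library. The helper names write_example_tsv and import_example are hypothetical and not part of Hail, and the file is written to a temporary directory rather than data/. The Hail call sits inside a function with a lazy import so the file-writing part runs even where Hail is not installed.

```python
import csv
import os
import tempfile

# Header and rows copied from the table output shown above.
HEADER = ["ID", "HT", "SEX", "X", "Z", "C1", "C2", "C3"]
ROWS = [
    [1, 65, "M", 5, 4, 2, 50, 5],
    [2, 72, "M", 6, 3, 2, 61, 1],
    [3, 70, "F", 7, 3, 10, 81, -5],
    [4, 60, "F", 8, 2, 11, 90, -10],
]


def write_example_tsv(directory):
    """Write the example rows to <directory>/kt_example1.tsv and return the path."""
    path = os.path.join(directory, "kt_example1.tsv")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return path


def import_example(path):
    """Import the file as a Hail table keyed by ID (requires a working Hail install)."""
    import hail as hl  # lazy import: only needed when this function is called
    return hl.import_table(path, impute=True, key="ID")


# Hypothetical location: a fresh temporary directory instead of data/.
tsv_path = write_example_tsv(tempfile.mkdtemp())
```

With Hail installed, import_example(tsv_path).show() should display a table like the one above. Because impute=True asks Hail to infer column types from the file contents, the numeric columns come back as int32 and SEX as str.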

Export

Export Hail data to a non-Hail format, using one of the export_* methods.

description

Export a matrix table as a VCF.

code
>>> hl.export_vcf(mt, 'output/example.vcf.bgz')
dependencies

export_vcf()

Write

Write data in a Hail format to disk using one of the write() methods, e.g. Table.write() or MatrixTable.write().

description

Write a matrix table to disk.

code
>>> mt.write('output/example.mt')
dependencies

MatrixTable.write()

Read

If you wrote a table or matrix table to disk using one of Hail’s write() methods, you can read it back with the matching read function: read_table() for a table or read_matrix_table() for a matrix table.

description

Read a table from disk.

code
>>> ht = hl.read_table('data/example.ht')
dependencies

read_table()

Examine your data

Explore the schema

Matrix Table

description

Get information about the fields and keys of a matrix table.

code
>>> mt.describe()
----------------------------------------
Global fields:
    'populations': array<str>
----------------------------------------
Column fields:
    's': str
    'is_case': bool
    'pheno': struct {
        is_case: bool,
        is_female: bool,
        age: float64,
        height: float64,
        blood_pressure: float64,
        cohort_name: str
    }
----------------------------------------
Row fields:
    'locus': locus<GRCh37>
    'alleles': array<str>
    'rsid': str
    'qual': float64
----------------------------------------
Entry fields:
    'GT': call
    'AD': array<int32>
    'DP': int32
    'GQ': int32
    'PL': array<int32>
----------------------------------------
Column key: ['s']
Row key: ['locus', 'alleles']
Partition key: ['locus']
----------------------------------------
dependencies

MatrixTable.describe()

Table

description

Get information about the fields and keys of a table.

code
>>> ht.describe()
----------------------------------------
Global fields:
    None
----------------------------------------
Row fields:
    'locus': locus<GRCh37>
    'alleles': array<str>
----------------------------------------
Key: ['locus', 'alleles']
----------------------------------------
dependencies

Table.describe()

Expression

description

Get information about a specific field in a table or matrix table.

code
>>> mt.s.describe()
--------------------------------------------------------
Type:
    str
--------------------------------------------------------
Source:
    <hail.matrixtable.MatrixTable object at 0x60e42f518>
Index:
    ['column']
--------------------------------------------------------
dependencies

Expression.describe()

understanding

We can select fields from a table or matrix table with an expression like mt.s. Then we can call the Expression.describe() method on the expression to get information about the expression’s type, indices, and source.

View your data locally

Table

description

View the first n rows of a table. The example asks for 5 rows, but the table has only four, so all of them are shown.

code
>>> ht.show(5)
+-------+-------+-----+-------+-------+-------+-------+-------+
|    ID |    HT | SEX |     X |     Z |    C1 |    C2 |    C3 |
+-------+-------+-----+-------+-------+-------+-------+-------+
| int32 | int32 | str | int32 | int32 | int32 | int32 | int32 |
+-------+-------+-----+-------+-------+-------+-------+-------+
|     1 |    65 | "M" |     5 |     4 |     2 |    50 |     5 |
|     2 |    72 | "M" |     6 |     3 |     2 |    61 |     1 |
|     3 |    70 | "F" |     7 |     3 |    10 |    81 |    -5 |
|     4 |    60 | "F" |     8 |     2 |    11 |    90 |   -10 |
+-------+-------+-----+-------+-------+-------+-------+-------+
dependencies

Table.show()

Matrix Table

description

View the columns, rows, or entries of a matrix table.

code
>>> mt.rows().show()
>>> mt.cols().show()
>>> mt.entries().show()
understanding

Unlike tables, matrix tables do not have a show method, but you can call Table.show() on the MatrixTable.rows() table, MatrixTable.cols() table, or MatrixTable.entries() table of your matrix table.

dependencies

Table.show(), MatrixTable.rows(), MatrixTable.cols(), MatrixTable.entries()
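
It can help to picture how the three views relate to each other. The following is a toy model in plain Python, not Hail code and not how Hail stores data: rows and columns are lists of records, and the entries view has one record per (row, column) pair carrying the row, column, and entry fields together. The two loci and rsids are taken from the example output in this guide; the sample IDs and DP values are invented, and the row key is simplified to locus alone (a real matrix table here would key rows by locus and alleles).

```python
# Toy stand-ins for the rows and columns of a matrix table (plain dicts, not Hail objects).
rows = [
    {"locus": "20:12990057", "rsid": "rs3761894"},
    {"locus": "20:13029862", "rsid": "rs919604"},
]
cols = [{"s": "sample_A"}, {"s": "sample_B"}, {"s": "sample_C"}]

# Entry fields keyed by (row key, column key). The DP values are invented.
entry_fields = {
    ("20:12990057", "sample_A"): {"DP": 12},
    ("20:12990057", "sample_B"): {"DP": 7},
    ("20:12990057", "sample_C"): {"DP": 30},
    ("20:13029862", "sample_A"): {"DP": 18},
    ("20:13029862", "sample_B"): {"DP": 22},
    ("20:13029862", "sample_C"): {"DP": 5},
}


def entries_table(rows, cols, entry_fields):
    """Flatten to one record per (row, column) pair with row, column, and entry fields."""
    flat = []
    for row in rows:
        for col in cols:
            entry = entry_fields[(row["locus"], col["s"])]
            flat.append({**row, **col, **entry})
    return flat


flat = entries_table(rows, cols, entry_fields)
for record in flat:
    print(record)
```

In this model the entries view has len(rows) * len(cols) records, which is why showing the entries of a large matrix table produces far more output than showing its rows or columns.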

Expression

description

View an expression.

code
>>> mt.rsid.show()
+---------------+------------+---------------+
| locus         | alleles    | rsid          |
+---------------+------------+---------------+
| locus<GRCh37> | array<str> | str           |
+---------------+------------+---------------+
| 20:12990057   | ["T","A"]  | "rs3761894"   |
| 20:13029862   | ["C","T"]  | "rs919604"    |
| 20:13074235   | ["G","A"]  | "rs708937"    |
| 20:13140720   | ["G","A"]  | "rs61738161"  |
| 20:13695498   | ["G","A"]  | "rs6079146"   |
| 20:13714384   | ["A","C"]  | "rs41275402"  |
| 20:13765944   | ["C","G"]  | NA            |
| 20:13765954   | ["C","T"]  | "rs113805278" |
| 20:13845987   | ["C","T"]  | "rs761811"    |
| 20:16223957   | ["T","C"]  | "rs1000121"   |
+---------------+------------+---------------+
showing top 10 rows
dependencies

Expression.show()

understanding

mt.rsid is an expression that references a field of mt. We can call Expression.show() to display the first n values referenced by the expression; with no argument, as here, the top 10 rows are shown. Since mt.rsid is indexed by row, the row key fields locus and alleles are displayed alongside it.