LDMatrix

class hail.LDMatrix(jldm)[source]

Represents a symmetric matrix encoding the Pearson correlation between each pair of variants in the accompanying variant list.
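To make the definition concrete, here is a minimal pure-Python sketch of a Pearson correlation matrix for a tiny genotype table. It does not show how Hail computes the matrix. The `pearson` and `correlation_matrix` helpers and the genotype values are hypothetical and only illustrate the structure: symmetric, with ones on the diagonal.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(rows):
    """Correlation between every pair of rows (one row per variant)."""
    return [[pearson(ri, rj) for rj in rows] for ri in rows]

# Hypothetical alternate-allele counts: 3 variants (rows) x 5 samples (columns).
genotypes = [
    [0, 1, 2, 1, 0],
    [0, 1, 2, 2, 0],
    [2, 1, 0, 1, 2],
]
ldm = correlation_matrix(genotypes)
```

In this toy data the third variant is a mirror image of the first, so their correlation is -1.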

Methods

__init__
export Exports this matrix as a delimited text file.
matrix Gets the distributed matrix backing this LD matrix.
read Reads the LD matrix from a file.
to_local_matrix Converts the LD matrix to a local Spark matrix.
variant_list Gets the list of variants.
write Writes the LD matrix to a file.
export(path, column_delimiter, header=None, parallel_write=False, entries='full')[source]

Exports this matrix as a delimited text file.

Examples

Write a full LD matrix as a tab-separated file:

>>> vds.ld_matrix().export('output/ld_matrix.tsv', column_delimiter='\t')

Write a full LD matrix as a comma-separated file with the variant list as a header:

>>> ldm = vds.ld_matrix()
>>> ldm.export('output/ld_matrix.tsv',
...            column_delimiter=',',
...            header=','.join([str(v) for v in ldm.variant_list()]))

Write a full LD matrix as a folder of comma-separated file shards:

>>> ldm = vds.ld_matrix()
>>> ldm.export('output/ld_matrix.tsv',
...            column_delimiter=',',
...            header=None,
...            parallel_write=True)

Write the upper-triangle with the diagonal as a comma-separated file:

>>> ldm = vds.ld_matrix()
>>> ldm.export('output/ld_matrix.tsv',
...            column_delimiter=',',
...            entries='upper')

Notes

A matrix cannot be exported if it has more than 2^31 - 1 columns.

A full, 3x3 LD matrix written as a comma-separated file looks like this:

1.0,0.8,0.7
0.8,1.0,0.3
0.7,0.3,1.0

The strict lower triangle:

0.8
0.7,0.3

The lower triangle:

1.0
0.8,1.0
0.7,0.3,1.0

The strict upper triangle:

0.8,0.7
0.3

The upper triangle:

1.0,0.8,0.7
1.0,0.3
1.0
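The layouts above can be reproduced with a short pure-Python sketch. This is an illustration of the row layouts only, not Hail's implementation. The `format_entries` helper is hypothetical, and the mode names `'lower'`, `'strict_lower'`, and `'strict_upper'` are assumed by analogy with the `'full'` and `'upper'` values used in the examples.

```python
def format_entries(matrix, entries="full", delimiter=","):
    """Render a square matrix as delimited text for the given portion."""
    n = len(matrix)
    # Columns kept for row i. Only 'full' and 'upper' appear in the
    # examples above; the other mode names are assumptions.
    column_ranges = {
        "full": lambda i: range(n),
        "lower": lambda i: range(i + 1),
        "strict_lower": lambda i: range(i),
        "upper": lambda i: range(i, n),
        "strict_upper": lambda i: range(i + 1, n),
    }
    if entries not in column_ranges:
        raise ValueError("unknown entries value: %r" % (entries,))
    columns = column_ranges[entries]
    lines = []
    for i in range(n):
        row = [str(matrix[i][j]) for j in columns(i)]
        if row:  # a strict triangle has one empty row, which is skipped
            lines.append(delimiter.join(row))
    return "\n".join(lines)

ldm = [[1.0, 0.8, 0.7],
       [0.8, 1.0, 0.3],
       [0.7, 0.3, 1.0]]
upper_text = format_entries(ldm, entries="upper")
```

Calling `format_entries(ldm, entries="strict_lower")` yields the two-line strict lower triangle shown above.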
Parameters:
  • path (str) – the path at which to write the LD matrix
  • column_delimiter (str) – the column delimiter
  • header (str or None) – a string to write before the first row of the matrix
  • parallel_write (bool) – if false, a single file is produced; otherwise, a folder of file shards is produced. Setting this to false makes the export slower
  • entries (str) – describes what portion of the entries should be printed, see the notes for a detailed description
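Because the matrix is symmetric, an upper-triangle export carries the same information as a full export. The following sketch rebuilds the full matrix from upper-triangle text such as the example in the notes. The `read_upper_triangle` helper is hypothetical (it is not part of Hail) and assumes the text has no header line.

```python
def read_upper_triangle(text, delimiter=","):
    """Rebuild a full symmetric matrix from upper-triangle text rows."""
    rows = [[float(x) for x in line.split(delimiter)]
            for line in text.splitlines() if line.strip()]
    n = len(rows)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        if len(row) != n - i:
            raise ValueError("row %d has %d entries, expected %d"
                             % (i, len(row), n - i))
        for offset, value in enumerate(row):
            j = i + offset
            full[i][j] = value
            full[j][i] = value  # mirror across the diagonal
    return full

# Upper triangle (with diagonal) of the 3x3 example matrix.
exported = "1.0,0.8,0.7\n1.0,0.3\n1.0\n"
full = read_upper_triangle(exported)
```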
matrix()[source]

Gets the distributed matrix backing this LD matrix.

Returns: Matrix of Pearson correlation values.
Return type: IndexedRowMatrix
static read(path)[source]

Reads the LD matrix from a file.

Examples

Read an LD matrix from a file.

>>> ld_matrix = LDMatrix.read('data/ld_matrix')
Parameters: path (str) – the path from which to read the LD matrix
to_local_matrix()[source]

Converts the LD matrix to a local Spark matrix.

Caution

Only call this method when the LD matrix is small enough to fit in local memory on the driver.

Returns: Matrix of Pearson correlation values.
Return type: Matrix
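As a rough guide to the caution above, the sketch below estimates the size of a local matrix before converting. It assumes dense storage at 8 bytes per entry (64-bit floats) and ignores any runtime overhead. The helper names are hypothetical and are not part of Hail.

```python
def dense_matrix_bytes(n_variants, bytes_per_entry=8):
    """Approximate bytes for a dense n x n matrix of 64-bit floats."""
    return n_variants * n_variants * bytes_per_entry

def fits_in_memory(n_variants, available_bytes):
    """True if the dense matrix fits in the given memory budget."""
    return dense_matrix_bytes(n_variants) <= available_bytes

# 10,000 variants need about 0.8 GB; 100,000 variants need about 80 GB.
small = dense_matrix_bytes(10000)
large = dense_matrix_bytes(100000)
```

The quadratic growth is the point: ten times as many variants need a hundred times as much memory.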
variant_list()[source]

Gets the list of variants. The (i, j) entry of the matrix encodes the Pearson correlation between the ith and jth variants.

Returns: List of variants.
Return type: list of Variant
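The correspondence between the variant list and matrix indices can be sketched as follows. The variant strings and matrix values are made up, and plain Python lists stand in for the objects returned by `variant_list()` and the matrix methods. `build_index` and `correlation` are hypothetical helpers.

```python
def build_index(variants):
    """Map each variant to its row/column position in the matrix."""
    return {variant: i for i, variant in enumerate(variants)}

def correlation(matrix, index, a, b):
    """Look up the (i, j) entry for variants a and b."""
    return matrix[index[a]][index[b]]

# Hypothetical variant identifiers, in the same order as the matrix rows.
variants = ["1:1000:A:T", "1:2000:G:C", "1:3000:C:T"]
matrix = [[1.0, 0.8, 0.7],
          [0.8, 1.0, 0.3],
          [0.7, 0.3, 1.0]]

index = build_index(variants)
r = correlation(matrix, index, "1:1000:A:T", "1:3000:C:T")
```

Building the index once avoids a linear search through the variant list on every lookup.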
write(path)[source]

Writes the LD matrix to a file.

Examples

Write an LD matrix to a file.

>>> vds.ld_matrix().write('output/ld_matrix')
Parameters: path (str) – the path to which to write the LD matrix