LDMatrix
class hail.LDMatrix(jldm)
Represents a symmetric matrix encoding the Pearson correlation between each pair of variants in the accompanying variant list.
Methods
__init__
export – Exports this matrix as a delimited text file.
matrix – Gets the distributed matrix backing this LD matrix.
read – Reads the LD matrix from a file.
to_local_matrix – Converts the LD matrix to a local Spark matrix.
variant_list – Gets the list of variants.
write – Writes the LD matrix to a file.
export(path, column_delimiter, header=None, parallel_write=False, entries='full')
Exports this matrix as a delimited text file.
Examples
Write a full LD matrix as a tab-separated file:
>>> vds.ld_matrix().export('output/ld_matrix.tsv', column_delimiter='\t')
Write a full LD matrix as a comma-separated file with the variant list as a header:
>>> ldm = vds.ld_matrix()
>>> ldm.export('output/ld_matrix.tsv',
...            column_delimiter=',',
...            header=','.join([str(v) for v in ldm.variant_list()]))
Write a full LD matrix as a folder of comma-separated file shards:
>>> ldm = vds.ld_matrix()
>>> ldm.export('output/ld_matrix.tsv',
...            column_delimiter=',',
...            header=None,
...            parallel_write=True)
Write the upper-triangle with the diagonal as a comma-separated file:
>>> ldm = vds.ld_matrix()
>>> ldm.export('output/ld_matrix.tsv',
...            column_delimiter=',',
...            entries='upper')
Notes
A matrix cannot be exported if it has more than 2^31 - 1 columns.
A full, 3x3 LD matrix written as a comma-separated file looks like this:
1.0,0.8,0.7
0.8,1.0,0.3
0.7,0.3,1.0
The strict lower triangle:
0.8
0.7,0.3
The lower triangle:
1.0
0.8,1.0
0.7,0.3,1.0
The strict upper triangle:
0.8,0.7
0.3
The upper triangle:
1.0,0.8,0.7
1.0,0.3
1.0
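The five layouts above can be reproduced in a few lines of plain Python. This is a sketch of the text format only, not Hail's implementation: the helper name format_entries is hypothetical, and the labels 'lower', 'strict_lower' and 'strict_upper' are assumptions chosen for illustration (only 'full' and 'upper' appear in the examples above).

```python
def format_entries(matrix, entries="full", delimiter=","):
    """Render a square matrix as delimited text, keeping only the requested entries."""
    # Each predicate decides whether the entry at row i, column j is kept.
    # The label names are illustrative assumptions, not confirmed Hail values.
    keep = {
        "full": lambda i, j: True,
        "lower": lambda i, j: j <= i,
        "strict_lower": lambda i, j: j < i,
        "upper": lambda i, j: j >= i,
        "strict_upper": lambda i, j: j > i,
    }[entries]
    lines = []
    for i, row in enumerate(matrix):
        kept = [str(x) for j, x in enumerate(row) if keep(i, j)]
        if kept:  # a strict triangle leaves one row empty; skip it
            lines.append(delimiter.join(kept))
    return "\n".join(lines)


# The 3x3 matrix from the notes above.
ld = [[1.0, 0.8, 0.7],
      [0.8, 1.0, 0.3],
      [0.7, 0.3, 1.0]]
print(format_entries(ld, "strict_upper"))
```

Because the matrix is symmetric, any one triangle that includes the diagonal carries the same information as the full layout.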
Parameters:
- path (str or None) – the path at which to write the LD matrix
- column_delimiter (str) – the column delimiter
- header (str or None) – a string to write before the first row of the matrix
- parallel_write (bool) – if false, a single file is produced; otherwise a folder of file shards is produced. Setting it to false makes the export slower.
- entries (str) – which portion of the matrix entries to write; see the notes for a detailed description
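A file exported with a header, as in the second example above, can be read back with Python's standard csv module. This is a minimal sketch, assuming the header is the delimiter-joined variant list and the matrix was written with entries='full'. The function name load_exported_matrix and the variant labels are made up for illustration.

```python
import csv
import io


def load_exported_matrix(text, delimiter=","):
    """Parse a full exported matrix whose first line is a header of labels."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    matrix = [[float(x) for x in row] for row in body]
    # A full export has exactly one entry per label in every row.
    if any(len(row) != len(header) for row in matrix):
        raise ValueError("every row must have one entry per header label")
    return header, matrix


# Illustrative file contents; the variant labels are hypothetical.
exported = (
    "1:100:A:T,1:200:C:G,1:300:G:A\n"
    "1.0,0.8,0.7\n"
    "0.8,1.0,0.3\n"
    "0.7,0.3,1.0\n"
)
labels, matrix = load_exported_matrix(exported)
print(labels[1], labels[2], matrix[1][2])
```

Note that a label containing the column delimiter would make the header ambiguous, so a delimiter that cannot occur in the labels is the safer choice.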
matrix()
Gets the distributed matrix backing this LD matrix.
Returns: Matrix of Pearson correlation values.
Return type: IndexedRowMatrix
static read(path)
Reads the LD matrix from a file.
Examples
Read an LD matrix from a file:
>>> ld_matrix = LDMatrix.read('data/ld_matrix')
Parameters: path (str) – the path from which to read the LD matrix
to_local_matrix()
Converts the LD matrix to a local Spark matrix.
Caution
Only call this method when the LD matrix is small enough to fit in local memory on the driver.
Returns: Matrix of Pearson correlation values.
Return type: Matrix
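One rough way to judge whether a matrix is "small enough" is to estimate the size of a dense n x n matrix of 8-byte doubles. This is a back-of-the-envelope sketch, assuming dense double-precision storage and ignoring JVM and Spark overhead. The helper names and the 4 GiB budget are hypothetical.

```python
def dense_matrix_bytes(n_variants, bytes_per_entry=8):
    """Approximate size of a dense n x n matrix of doubles (assumed layout)."""
    return n_variants * n_variants * bytes_per_entry


def fits_on_driver(n_variants, driver_memory_bytes):
    """True if the dense estimate is within the given memory budget."""
    return dense_matrix_bytes(n_variants) <= driver_memory_bytes


budget = 4 * 1024 ** 3  # hypothetical 4 GiB driver budget
print(dense_matrix_bytes(10000))      # 800000000 bytes, about 0.8 GB
print(fits_on_driver(10000, budget))  # True
print(fits_on_driver(50000, budget))  # False
```

The variant count could be taken from len(ldm.variant_list()). Because the estimate grows quadratically, a fivefold increase in variants needs 25 times the memory.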