VariantDataset

class hail.vds.VariantDataset[source]

Class for representing cohort-level genomic data.

This class facilitates a sparse, split representation of genomic data in which reference block data and variant data are contained in separate MatrixTable objects.

Parameters:
  • reference_data (MatrixTable) – MatrixTable containing only reference block data.

  • variant_data (MatrixTable) – MatrixTable containing only variant data.

Attributes

ref_block_max_length_field

Name of global field that indicates max reference block length.

reference_genome

Dataset reference genome.

Methods

checkpoint

Write to path and then read from path.

from_merged_representation

Create a VariantDataset from a sparse MatrixTable containing variant and reference data.

n_samples

The number of samples present.

union_rows

Combine many VDSes with the same samples but disjoint variants.

validate

Eagerly checks necessary representational properties of the VDS.

write

Write to path.

checkpoint(path, **kwargs)[source]

Write to path and then read from path.
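As a rough sketch of the behaviour described above, the helper below writes a dataset and then reads it back. The names `write_then_read` and `read_fn` are hypothetical and not part of Hail. The `overwrite=True` keyword in the commented call is an assumption about what the underlying writer accepts. The real method may differ in detail.

```python
def write_then_read(vds, path, read_fn, **write_kwargs):
    """Write `vds` to `path`, then return whatever `read_fn` reads back from `path`.

    `vds` only needs a `write(path, **kwargs)` method; `read_fn` is any
    callable that takes a path (hypothetical helper, not a Hail function).
    """
    vds.write(path, **write_kwargs)
    return read_fn(path)


# With Hail installed, a call might look like this (not run here; the
# `overwrite=True` keyword is an assumption):
#   import hail as hl
#   vds = write_then_read(vds, 'out.vds', hl.vds.read_vds, overwrite=True)
```

Returning the re-read dataset means later operations start from the data stored at `path` rather than recomputing the pipeline that produced it.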

static from_merged_representation(mt, *, ref_block_fields=(), infer_ref_block_fields=True, is_split=False)[source]

Create a VariantDataset from a sparse MatrixTable containing variant and reference data.
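To illustrate the idea of separating a merged sparse representation into its two components, here is a toy, pure-Python sketch. It does not use Hail and is not how the method is implemented. The record layout, the `split_merged` helper, and the rule that an `'END'` key marks a reference block are all assumptions made for illustration.

```python
# Toy merged representation: one record per (locus, sample) entry.
# Assumption for illustration: entries carrying an 'END' key are reference blocks.
merged = [
    {'locus': 'chr1:100', 'sample': 'A', 'END': 180},
    {'locus': 'chr1:150', 'sample': 'B', 'GT': '0/1'},
    {'locus': 'chr1:181', 'sample': 'A', 'GT': '1/1'},
    {'locus': 'chr1:200', 'sample': 'B', 'END': 260},
]


def split_merged(entries):
    """Partition entries into (reference blocks, variant calls)."""
    reference = [e for e in entries if 'END' in e]
    variant = [e for e in entries if 'END' not in e]
    return reference, variant


reference_data, variant_data = split_merged(merged)
```

Each input entry ends up in exactly one of the two lists, mirroring the split between `reference_data` and `variant_data` described for the class.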

n_samples()[source]

The number of samples present.

ref_block_max_length_field = 'ref_block_max_length'

Name of global field that indicates max reference block length.
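One reason a bound on reference block length is useful: when looking for the block that covers a position, only blocks starting within that many bases upstream can qualify. The following is a conceptual pure-Python sketch, not Hail code. The block layout (inclusive start and end) and the helpers `max_block_length` and `covering_block` are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

# Toy reference blocks for one sample: (start, end), both inclusive, sorted by start.
blocks = [(1, 40), (41, 45), (60, 150), (151, 155)]
starts = [start for start, _ in blocks]


def max_block_length(blocks):
    """Length of the longest block, analogous to a stored max-length value."""
    return max(end - start + 1 for start, end in blocks)


def covering_block(pos, blocks, starts, max_len):
    """Return the block covering `pos`, scanning only plausible candidates.

    A block that covers `pos` must start in [pos - max_len + 1, pos],
    so blocks starting earlier than that window are never inspected.
    """
    lo = bisect_left(starts, pos - max_len + 1)
    hi = bisect_right(starts, pos)
    for start, end in blocks[lo:hi]:
        if start <= pos <= end:
            return (start, end)
    return None


max_len = max_block_length(blocks)
hit = covering_block(100, blocks, starts, max_len)
miss = covering_block(50, blocks, starts, max_len)
```

Without a known maximum length, a lookup would have to consider every block that starts before the query position.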

property reference_genome

Dataset reference genome.

Returns:

ReferenceGenome

union_rows()[source]

Combine many VDSes with the same samples but disjoint variants.

Examples

If a dataset is imported as a VDS in per-chromosome chunks, the following combines the chunks into one VDS:

>>> vds_paths = ['chr1.vds', 'chr2.vds']
>>> vds_per_chrom = [hl.vds.read_vds(path) for path in vds_paths]
>>> hl.vds.VariantDataset.union_rows(*vds_per_chrom)

validate(*, check_data=True)[source]

Eagerly checks necessary representational properties of the VDS.

write(path, **kwargs)[source]

Write to path.