VariantDataset
- class hail.vds.VariantDataset
Class for representing cohort-level genomic data.
This class facilitates a sparse, split representation of genomic data in which reference block data and variant data are contained in separate MatrixTable objects.
- Parameters:
reference_data (MatrixTable) – MatrixTable containing only reference block data.
variant_data (MatrixTable) – MatrixTable containing only variant data.
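The split representation can be pictured with a small pure-Python toy model. This is only a sketch of the idea, not Hail's implementation: `ToyVDS`, `call_at`, the inclusive `(start, end)` block tuples, and the rule that a variant entry takes precedence over a covering reference block are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVDS:
    """Hypothetical stand-in for the two-table layout (not a Hail class)."""

    # sample -> list of (start, end) reference blocks, ends inclusive
    reference_data: dict = field(default_factory=dict)
    # (position, sample) -> genotype string for non-reference calls
    variant_data: dict = field(default_factory=dict)

    def call_at(self, sample, pos):
        """Resolve one sample's call at one position from the two stores."""
        # Assumption: an explicit variant entry wins over any reference block.
        if (pos, sample) in self.variant_data:
            return self.variant_data[(pos, sample)]
        # Otherwise report homozygous reference if some block covers pos.
        for start, end in self.reference_data.get(sample, []):
            if start <= pos <= end:
                return "0/0"
        # No data at this position for this sample.
        return None


toy = ToyVDS(
    reference_data={"s1": [(100, 199), (201, 300)], "s2": [(100, 300)]},
    variant_data={(200, "s1"): "0/1"},
)
```

In this toy, `s2` needs a single block to describe 201 positions, which is the storage saving a sparse layout is after.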
Attributes
ref_block_max_length_field – Name of global field that indicates max reference block length.
reference_genome – Dataset reference genome.
Methods
checkpoint – Write to path and then read from path.
from_merged_representation – Create a VariantDataset from a sparse MatrixTable containing variant and reference data.
n_samples – The number of samples present.
union_rows – Combine many VDSes with the same samples but disjoint variants.
validate – Eagerly checks necessary representational properties of the VDS.
write – Write to path.
- static from_merged_representation(mt, *, ref_block_fields=(), infer_ref_block_fields=True, is_split=False)
Create a VariantDataset from a sparse MatrixTable containing variant and reference data.
- ref_block_max_length_field = 'ref_block_max_length'
Name of global field that indicates max reference block length.
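One plausible reason to record a maximum reference block length is that it bounds how far back a search must look to find every block covering a position. The sketch below illustrates that idea in pure Python. It is not taken from Hail's code: `blocks_covering` and the sorted list of inclusive `(start, end)` tuples are hypothetical.

```python
from bisect import bisect_left, bisect_right


def blocks_covering(blocks, pos, max_len):
    """Return the blocks that cover pos.

    blocks: list of (start, end) tuples sorted by start, ends inclusive.
    max_len: upper bound on (end - start + 1) over all blocks.
    """
    starts = [start for start, _ in blocks]
    # A block covering pos cannot start before pos - max_len + 1 ...
    lo = bisect_left(starts, pos - max_len + 1)
    # ... and cannot start after pos.
    hi = bisect_right(starts, pos)
    # Only the candidates in that window need their end checked.
    return [(s, e) for s, e in blocks[lo:hi] if e >= pos]


blocks = [(1, 50), (40, 120), (130, 140), (135, 200)]
max_len = max(end - start + 1 for start, end in blocks)
covering = blocks_covering(blocks, 138, max_len)
```

Without the bound, every block starting at or before the position would be a candidate, because any of them might extend far enough to cover it.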
- property reference_genome
Dataset reference genome.
- Returns: ReferenceGenome
- static union_rows(*vdses)
Combine many VDSes with the same samples but disjoint variants.
Examples
If a dataset is imported as VDS in chromosome-chunks, the following will combine them into one VDS:
>>> vds_paths = ['chr1.vds', 'chr2.vds']
>>> vds_per_chrom = [hl.vds.read_vds(path) for path in vds_paths]
>>> hl.vds.VariantDataset.union_rows(*vds_per_chrom)
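The stated precondition (same samples, disjoint variants) can be made concrete with a toy row union over plain dictionaries. This is a sketch only. `toy_union_rows` and the chunk layout are hypothetical, and the explicit checks are an assumption for illustration; this page does not say whether union_rows itself verifies them.

```python
def toy_union_rows(*chunks):
    """Combine chunks shaped like {'samples': [...], 'variants': {locus: row}}.

    Hypothetical helper: raises ValueError if the chunks do not share the
    same ordered sample list or if any locus appears in more than one chunk.
    """
    if not chunks:
        raise ValueError("need at least one chunk")
    samples = chunks[0]["samples"]
    combined = {}
    for chunk in chunks:
        if chunk["samples"] != samples:
            raise ValueError("chunks must have the same samples")
        overlap = combined.keys() & chunk["variants"].keys()
        if overlap:
            raise ValueError(f"variants are not disjoint: {sorted(overlap)}")
        combined.update(chunk["variants"])
    # Keep rows in locus order, as a row-keyed table would.
    return {"samples": list(samples), "variants": dict(sorted(combined.items()))}


chr1 = {"samples": ["s1", "s2"], "variants": {("chr1", 100): ["0/1", "0/0"]}}
chr2 = {"samples": ["s1", "s2"], "variants": {("chr2", 50): ["0/0", "1/1"]}}
merged = toy_union_rows(chr2, chr1)
```

Because the sample lists match, each combined row can be read column by column without any realignment, which is what makes a pure row union sufficient here.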