hail.vds.truncate_reference_blocks
- hail.vds.truncate_reference_blocks(ds, *, max_ref_block_base_pairs=None, ref_block_winsorize_fraction=None)
Cap reference blocks at a maximum length in order to permit faster interval filtering.
Examples
Truncate reference blocks to 5 kilobases:
>>> vds2 = hl.vds.truncate_reference_blocks(vds, max_ref_block_base_pairs=5000)
Truncate the longest 1% of reference blocks to the length of the 99th percentile block:
>>> vds2 = hl.vds.truncate_reference_blocks(vds, ref_block_winsorize_fraction=0.01)
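As a rough illustration of what winsorizing the block length distribution means, the pure-Python sketch below picks a cap at the (1 - fraction) quantile of a list of block lengths. The helper name `winsorized_cap` and the nearest-rank quantile are assumptions made for this example. Hail computes the cutoff internally and may use a different quantile method.

```python
import math


def winsorized_cap(lengths, fraction):
    """Return the block length at the (1 - fraction) quantile (nearest rank).

    Hypothetical helper for illustration only; not part of the Hail API.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    ordered = sorted(lengths)
    if not ordered:
        raise ValueError("lengths must not be empty")
    # Nearest-rank quantile: the smallest value such that at least
    # (1 - fraction) of the lengths are at or below it.
    rank = max(1, math.ceil((1 - fraction) * len(ordered)))
    return ordered[rank - 1]


# Toy block lengths in base pairs; one block is far longer than the rest.
lengths = [10, 20, 30, 40, 50, 60, 70, 1000]
cap = winsorized_cap(lengths, fraction=0.25)

# Every block longer than the cap would be limited to the cap.
capped = [min(length, cap) for length in lengths]
```

With a small fraction such as 0.01, only the longest tail of blocks is affected, while the resulting maximum length is set by the bulk of the distribution rather than by a few outliers.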
Notes
After this function has been run, the reference blocks have a known maximum length, ref_block_max_length, stored in the global fields. This permits vds.filter_intervals() to filter to intervals of the reference data by reading ref_block_max_length bases ahead of each interval. This allows narrow interval queries to run in roughly O(data kept) work rather than O(all reference data) work.
It is also possible to patch an existing VDS to store the max reference block length with vds.store_ref_block_max_length().
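The reasoning in these notes can be sketched in plain Python, without Hail. The sketch assumes inclusive coordinates on a single contig and assumes that an over-long block is split into consecutive pieces so that no reference coverage is lost. The function names are hypothetical, and this is not Hail's implementation. It only shows why a known maximum block length bounds how far a filter has to look outside the query interval.

```python
from bisect import bisect_left


def split_block(start, end, max_len):
    """Split an inclusive [start, end] block into pieces of at most max_len bases."""
    return [(s, min(s + max_len - 1, end)) for s in range(start, end + 1, max_len)]


def truncate_blocks(blocks, max_len):
    """Return all blocks, sorted by start, with none longer than max_len."""
    out = []
    for start, end in blocks:
        out.extend(split_block(start, end, max_len))
    return sorted(out)


def overlapping(blocks, query_start, query_end, max_len):
    """Find blocks overlapping [query_start, query_end].

    Requires blocks sorted by start, each at most max_len bases long.
    """
    # A block of length <= max_len that covers query_start cannot begin
    # before query_start - max_len + 1, so the scan can start there.
    first = bisect_left(blocks, (query_start - max_len + 1,))
    hits = []
    for start, end in blocks[first:]:
        if start > query_end:
            break  # sorted by start: nothing later can overlap
        if end >= query_start:
            hits.append((start, end))
    return hits


# Toy reference blocks; the first one is much longer than the 5000 bp cap.
raw_blocks = [(1, 12000), (12001, 12500), (20000, 20100)]
max_len = 5000
blocks = truncate_blocks(raw_blocks, max_len)
hits = overlapping(blocks, 9000, 9100, max_len)
```

Without the cap, a block starting arbitrarily far before the query could still overlap it, so a filter would have to consider all earlier reference data. With the cap, the scan is limited to a window of `max_len` bases around the query start.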
- Parameters:
ds (VariantDataset or MatrixTable)
max_ref_block_base_pairs – Maximum size of reference blocks, in base pairs.
ref_block_winsorize_fraction – Fraction of reference block length distribution to truncate / winsorize.
- Returns: