hail.vds.interval_coverage

hail.vds.interval_coverage(vds, intervals, gq_thresholds=(0, 10, 20), dp_thresholds=(0, 1, 10, 20, 30), dp_field=None)

Compute statistics about base coverage by interval.

Returns a MatrixTable with interval row keys and sample column keys.

Contains the following row fields:

interval (interval): Genomic interval of interest.
interval_size (int32): Size of interval, in bases.

Computes the following entry fields:

bases_over_gq_threshold (tuple of int64): Number of bases in the interval over each GQ threshold.

fraction_over_gq_threshold (tuple of float64): Fraction of interval (in bases) above each GQ threshold. Computed by dividing each member of bases_over_gq_threshold by interval_size.

bases_over_dp_threshold (tuple of int64): Number of bases in the interval over each DP threshold.

fraction_over_dp_threshold (tuple of float64): Fraction of interval (in bases) above each DP threshold. Computed by dividing each member of bases_over_dp_threshold by interval_size.

sum_dp (int64): Sum of depth values by base across the interval.

mean_dp (float64): Mean depth of bases across the interval. Computed by dividing sum_dp by interval_size.
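The relationship between the count fields and the derived fraction and mean fields can be illustrated in plain Python. This is only a sketch of the arithmetic described above, not Hail's implementation; the helper name summarize_interval and the numbers are hypothetical.

```python
def summarize_interval(bases_over_threshold, sum_dp, interval_size):
    """Derive the fraction and mean fields from per-interval counts.

    Illustrative only: mirrors the documented arithmetic, not Hail's code.
    """
    # One fraction per threshold: count of bases divided by interval size.
    fractions = tuple(n / interval_size for n in bases_over_threshold)
    # Mean depth: summed depth divided by interval size.
    mean_dp = sum_dp / interval_size
    return fractions, mean_dp


# Hypothetical 100-base interval, with counts for dp_thresholds = (0, 1, 10, 20, 30).
bases_over_dp_threshold = (98, 90, 80, 50, 25)
fraction_over_dp_threshold, mean_dp = summarize_interval(
    bases_over_dp_threshold, sum_dp=2200, interval_size=100
)
```

Each position in the fraction tuple lines up with the threshold at the same position in dp_thresholds (or gq_thresholds for the GQ fields).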

If the dp_field parameter is not specified, the DP field is used for depth if present. If no DP field is present, the MIN_DP field is used. If neither DP nor MIN_DP is present, no depth statistics are calculated.
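The fallback order described above can be sketched as a small plain-Python function. The name resolve_dp_field and the set of entry field names are hypothetical and serve only to show the priority; Hail's internal logic, including how it treats an explicitly named field that does not exist, may differ.

```python
def resolve_dp_field(entry_fields, dp_field=None):
    """Return the depth field name to use, or None if no depth field exists.

    Illustrative only: reproduces the documented priority, not Hail's code.
    """
    # An explicitly specified field always wins (assumed to exist here).
    if dp_field is not None:
        return dp_field
    # Otherwise prefer DP, then fall back to MIN_DP.
    for candidate in ("DP", "MIN_DP"):
        if candidate in entry_fields:
            return candidate
    # Neither field is present: depth statistics would be skipped.
    return None


with_both = resolve_dp_field({"GQ", "DP", "MIN_DP"})
min_dp_only = resolve_dp_field({"GQ", "MIN_DP"})
no_depth = resolve_dp_field({"GQ"})
```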

Note

This method computes its metrics from reference blocks only. Most variant callers produce data in which non-reference calls interrupt reference blocks, so the metrics computed here slightly underestimate the true values (which would include the quality and depth of non-reference calls). The difference is likely negligible, but be aware of it, especially because it varies between samples whose ancestral backgrounds yield more or fewer non-reference calls.

Parameters:

vds (VariantDataset)
intervals (Table) – Table of intervals. Must be start-inclusive, and cannot span contigs.
gq_thresholds (tuple of int) – GQ thresholds.
dp_thresholds (tuple of int) – DP thresholds.
dp_field (str, optional) – Field for depth calculation. Uses DP or MIN_DP by default (with priority for DP if present).

Returns:

MatrixTable – Interval-by-sample matrix
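A minimal usage sketch follows, assuming a VDS on disk and an interval list file. The wrapper name coverage_by_interval, both paths, the GRCh38 reference genome, and the frac_dp20 field are assumptions made for illustration; the reference genome must match that of the dataset.

```python
def coverage_by_interval(vds_path, intervals_path):
    """Compute per-interval, per-sample coverage for a VDS (illustrative wrapper)."""
    # Imported lazily so this sketch can be loaded without a Hail installation.
    import hail as hl

    # Hypothetical inputs: a VariantDataset and a table of locus intervals.
    vds = hl.vds.read_vds(vds_path)
    intervals = hl.import_locus_intervals(intervals_path, reference_genome="GRCh38")

    cov = hl.vds.interval_coverage(
        vds,
        intervals,
        gq_thresholds=(0, 10, 20),
        dp_thresholds=(0, 1, 10, 20, 30),
    )

    # Index 3 corresponds to the fourth DP threshold (20) in the tuple above.
    return cov.annotate_entries(frac_dp20=cov.fraction_over_dp_threshold[3])
```

The result is keyed by interval on rows and by sample on columns, so the usual MatrixTable operations, such as aggregating entries per sample, apply to it.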