hail.vds.interval_coverage

hail.vds.interval_coverage(vds, intervals, gq_thresholds=(0, 10, 20), dp_thresholds=(0, 1, 10, 20, 30), dp_field=None)

Compute statistics about base coverage by interval.
Returns a MatrixTable with interval row keys and sample column keys.

Contains the following row fields:
interval (interval): Genomic interval of interest.
interval_size (int32): Size of interval, in bases.
Computes the following entry fields:
bases_over_gq_threshold (tuple of int64): Number of bases in the interval over each GQ threshold.
fraction_over_gq_threshold (tuple of float64): Fraction of interval (in bases) above each GQ threshold. Computed by dividing each member of bases_over_gq_threshold by interval_size.
bases_over_dp_threshold (tuple of int64): Number of bases in the interval over each DP threshold.
fraction_over_dp_threshold (tuple of float64): Fraction of interval (in bases) above each DP threshold. Computed by dividing each member of bases_over_dp_threshold by interval_size.
sum_dp (int64): Sum of depth values by base across the interval.
mean_dp (float64): Mean depth of bases across the interval. Computed by dividing sum_dp by interval_size.
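As a rough illustration of how the entry fields above relate to one another, the following pure-Python sketch computes them for one sample and one interval. It is not Hail's implementation: `RefBlock` and `interval_metrics` are hypothetical names, the interval is assumed to be start-inclusive and end-exclusive, and "over" a threshold is assumed to mean greater than or equal to it, which the text above does not specify.

```python
from typing import Dict, NamedTuple, Sequence


class RefBlock(NamedTuple):
    """Hypothetical stand-in for one sample's reference block."""
    start: int  # first covered position (inclusive)
    end: int    # last covered position (inclusive)
    gq: int
    dp: int


def interval_metrics(
    blocks: Sequence[RefBlock],
    interval_start: int,
    interval_end: int,
    gq_thresholds: Sequence[int] = (0, 10, 20),
    dp_thresholds: Sequence[int] = (0, 1, 10, 20, 30),
) -> Dict[str, object]:
    """Compute the entry fields for one sample over [interval_start, interval_end)."""
    interval_size = interval_end - interval_start
    bases_gq = [0] * len(gq_thresholds)
    bases_dp = [0] * len(dp_thresholds)
    sum_dp = 0
    for block in blocks:
        # Count only the part of the block that falls inside the interval.
        overlap = min(block.end + 1, interval_end) - max(block.start, interval_start)
        if overlap <= 0:
            continue
        for i, threshold in enumerate(gq_thresholds):
            if block.gq >= threshold:  # assumption: "over" means >=
                bases_gq[i] += overlap
        for i, threshold in enumerate(dp_thresholds):
            if block.dp >= threshold:  # same assumption as for GQ
                bases_dp[i] += overlap
        sum_dp += overlap * block.dp
    return {
        "bases_over_gq_threshold": tuple(bases_gq),
        "fraction_over_gq_threshold": tuple(n / interval_size for n in bases_gq),
        "bases_over_dp_threshold": tuple(bases_dp),
        "fraction_over_dp_threshold": tuple(n / interval_size for n in bases_dp),
        "sum_dp": sum_dp,
        "mean_dp": sum_dp / interval_size,
    }


# Interval [100, 200): 100 bases, 20 of which are not covered by any reference block.
blocks = [RefBlock(100, 149, gq=30, dp=25), RefBlock(160, 189, gq=15, dp=8)]
metrics = interval_metrics(blocks, 100, 200)
```

In this made-up example the 20 uncovered bases count toward interval_size but not toward any numerator, so even the lowest thresholds yield a fraction of 0.8 rather than 1.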
If the dp_field parameter is not specified, the DP field is used for depth if present. If no DP field is present, the MIN_DP field is used. If no DP or MIN_DP field is present, no depth statistics will be calculated.

Note
The metrics computed by this method are computed only from reference blocks. Most variant callers produce data where nonreference calls interrupt reference blocks, and so the metrics computed here are slight underestimates of the true values (which would include the quality/depth of nonreference calls). This is likely a negligible difference, but is something to be aware of, especially when comparing samples whose ancestral backgrounds lead to more or fewer nonreference calls.
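The depth-field fallback described before the note can be sketched in plain Python. This is only an illustration of the stated priority order, not Hail's code: `resolve_dp_field` is a hypothetical helper, and returning None stands in for "no depth statistics will be calculated".

```python
from typing import Collection, Optional


def resolve_dp_field(
    entry_fields: Collection[str], dp_field: Optional[str] = None
) -> Optional[str]:
    """Pick the field used for depth, following the documented priority."""
    # An explicitly requested field always wins (this sketch does not validate it).
    if dp_field is not None:
        return dp_field
    # Otherwise prefer DP, then fall back to MIN_DP.
    for candidate in ("DP", "MIN_DP"):
        if candidate in entry_fields:
            return candidate
    # Neither field is present: no depth statistics would be computed.
    return None
```

For instance, a reference dataset carrying only GQ and MIN_DP entries would resolve to MIN_DP under this sketch.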
Parameters
vds (VariantDataset)
intervals (Table) – Table of intervals. Must be start-inclusive, and cannot span contigs.
gq_thresholds (tuple of int) – GQ thresholds.
dp_thresholds (tuple of int) – DP thresholds.
dp_field (str, optional) – Field for depth calculation. Uses DP or MIN_DP by default (with priority for DP if present).
Returns
MatrixTable – Interval-by-sample matrix
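A minimal usage sketch, under several assumptions: the wrapper name `coverage_by_interval`, both file paths, and the BED-based interval source are hypothetical, and the BED file's reference genome is assumed to match that of the dataset. It relies on `hl.vds.read_vds` and `hl.import_bed`, the latter of which yields a Table keyed by an `interval` field.

```python
def coverage_by_interval(vds_path: str, bed_path: str, reference_genome: str = "GRCh38"):
    """Return the interval-by-sample coverage MatrixTable for a VDS and a BED file."""
    # Imported inside the function so this sketch can be loaded without Hail installed.
    import hail as hl

    vds = hl.vds.read_vds(vds_path)
    # import_bed returns a Table keyed by an `interval` field.
    intervals = hl.import_bed(bed_path, reference_genome=reference_genome)
    coverage = hl.vds.interval_coverage(
        vds,
        intervals,
        gq_thresholds=(0, 10, 20),
        dp_thresholds=(0, 1, 10, 20, 30),
    )
    # Index 3 of the DP tuple corresponds to the threshold 20 in dp_thresholds above.
    return coverage.annotate_entries(
        fraction_dp_20=coverage.fraction_over_dp_threshold[3]
    )
```

Because the per-threshold entry fields are tuples ordered like the threshold arguments, pulling one element into its own named field, as done here, can make downstream filtering easier to read.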