hail.vds.interval_coverage
- hail.vds.interval_coverage(vds, intervals, gq_thresholds=(0, 10, 20), dp_thresholds=(0, 1, 10, 20, 30), dp_field=None)
Compute statistics about base coverage by interval.
Returns a MatrixTable with interval row keys and sample column keys.
Contains the following row fields:
interval (interval): Genomic interval of interest.
interval_size (int32): Size of interval, in bases.
Computes the following entry fields:
bases_over_gq_threshold (tuple of int64): Number of bases in the interval over each GQ threshold.
fraction_over_gq_threshold (tuple of float64): Fraction of interval (in bases) above each GQ threshold. Computed by dividing each member of bases_over_gq_threshold by interval_size.
bases_over_dp_threshold (tuple of int64): Number of bases in the interval over each DP threshold.
fraction_over_dp_threshold (tuple of float64): Fraction of interval (in bases) above each DP threshold. Computed by dividing each member of bases_over_dp_threshold by interval_size.
sum_dp (int64): Sum of depth values by base across the interval.
mean_dp (float64): Mean depth of bases across the interval. Computed by dividing sum_dp by interval_size.
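The derived entry fields above are simple ratios of the counted fields. The following is a minimal plain-Python sketch of that arithmetic, not Hail's implementation; the helper name `derive_coverage_fields` and the sample numbers are made up for illustration.

```python
from typing import Sequence, Tuple


def derive_coverage_fields(
    bases_over_dp_threshold: Sequence[int],
    sum_dp: int,
    interval_size: int,
) -> Tuple[Tuple[float, ...], float]:
    """Hypothetical helper mirroring the documented formulas for one
    (interval, sample) entry."""
    # fraction_over_dp_threshold: each count divided by interval_size
    fractions = tuple(n / interval_size for n in bases_over_dp_threshold)
    # mean_dp: sum_dp divided by interval_size
    mean_dp = sum_dp / interval_size
    return fractions, mean_dp


# Illustrative values for a 200-base interval and the default
# dp_thresholds=(0, 1, 10, 20, 30); these numbers are invented.
fractions, mean_dp = derive_coverage_fields(
    bases_over_dp_threshold=(200, 190, 150, 100, 50),
    sum_dp=4000,
    interval_size=200,
)
print(fractions)  # (1.0, 0.95, 0.75, 0.5, 0.25)
print(mean_dp)    # 20.0
```

The same division applies to bases_over_gq_threshold and fraction_over_gq_threshold. Note that the denominator is always interval_size, so bases not covered by any reference block lower both the fractions and the mean.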
If the dp_field parameter is not specified, the DP field is used for depth if present. If no DP field is present, the MIN_DP field is used. If neither a DP nor a MIN_DP field is present, no depth statistics will be calculated.
Note
The metrics computed by this method are computed only from reference blocks. Most variant callers produce data where non-reference calls interrupt reference blocks, and so the metrics computed here are slight underestimates of the true values (which would include the quality/depth of non-reference calls). This is likely a negligible difference, but is something to be aware of, especially as it interacts with samples of ancestral backgrounds with more or fewer non-reference calls.
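The depth-field selection rule described before the note can be sketched in plain Python as follows. This is an illustration of the documented priority order only; `choose_depth_field` is a hypothetical name, and what Hail does when an explicitly requested dp_field is absent is not described here, so the sketch simply returns the requested name unchanged.

```python
from typing import Iterable, Optional


def choose_depth_field(
    available_fields: Iterable[str],
    dp_field: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: return the entry field that would supply depth,
    or None when no depth statistics would be calculated."""
    if dp_field is not None:
        # An explicit dp_field takes precedence (assumption: no validation here).
        return dp_field
    fields = set(available_fields)
    # DP has priority over MIN_DP when both are present.
    for candidate in ("DP", "MIN_DP"):
        if candidate in fields:
            return candidate
    # Neither field is present: depth statistics are skipped.
    return None


print(choose_depth_field(["GQ", "DP", "MIN_DP"]))  # DP
print(choose_depth_field(["GQ", "MIN_DP"]))        # MIN_DP
print(choose_depth_field(["GQ"]))                  # None
```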
- Parameters:
vds (VariantDataset)
intervals (Table) – Table of intervals. Must be start-inclusive, and cannot span contigs.
gq_thresholds (tuple of int) – GQ thresholds.
dp_thresholds (tuple of int) – DP thresholds.
dp_field (str, optional) – Field for depth calculation. Uses DP or MIN_DP by default (with priority for DP if present).
- Returns:
MatrixTable – Interval-by-sample matrix
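A possible end-to-end call might look like the sketch below. It assumes a VDS already written to disk and a BED file of target intervals; the wrapper name `coverage_summary`, both paths, and the `frac_dp_20` field name are placeholders, and the reference genome is an assumption to adjust to your data. The sketch also assumes the dataset has a DP or MIN_DP field, since otherwise no depth fields are produced.

```python
def coverage_summary(vds_path, bed_path, reference_genome="GRCh38"):
    """Hypothetical wrapper: per-interval, per-sample mean depth and the
    fraction of bases over DP threshold 20."""
    # Imported inside the function so that defining the helper does not
    # require Hail to be initialized.
    import hail as hl

    vds = hl.vds.read_vds(vds_path)
    # import_bed returns a Table keyed by interval.
    intervals = hl.import_bed(bed_path, reference_genome=reference_genome)

    dp_thresholds = (0, 1, 10, 20, 30)
    cov = hl.vds.interval_coverage(vds, intervals, dp_thresholds=dp_thresholds)

    # The tuple fields are ordered like the thresholds, so look up the
    # position of the threshold of interest rather than hard-coding it.
    idx = dp_thresholds.index(20)
    cov = cov.annotate_entries(frac_dp_20=cov.fraction_over_dp_threshold[idx])
    return cov.select_entries("mean_dp", "frac_dp_20")


# Example usage with placeholder paths:
# cov = coverage_summary("path/to/dataset.vds", "path/to/targets.bed")
# cov.entries().show()
```

Because the result is an ordinary MatrixTable, it can be aggregated further, for example by column to summarize coverage per sample.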