- hail.vds.interval_coverage(vds, intervals, gq_thresholds=(0, 10, 20), dp_thresholds=(0, 1, 10, 20, 30), dp_field=None)
Compute statistics about base coverage by interval.
Returns a MatrixTable with interval row keys and sample column keys.
- Contains the following row fields:
interval (interval): Genomic interval of interest.
interval_size (int32): Size of interval, in bases.
Computes the following entry fields:
bases_over_gq_threshold (tuple of int64): Number of bases in the interval over each GQ threshold.
fraction_over_gq_threshold (tuple of float64): Fraction of interval (in bases) above each GQ threshold. Computed by dividing each member of bases_over_gq_threshold by interval_size.
bases_over_dp_threshold (tuple of int64): Number of bases in the interval over each DP threshold.
fraction_over_dp_threshold (tuple of float64): Fraction of interval (in bases) above each DP threshold. Computed by dividing each member of bases_over_dp_threshold by interval_size.
sum_dp (int64): Sum of depth values by base across the interval.
mean_dp (float64): Mean depth of bases across the interval. Computed by dividing sum_dp by interval_size.
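The relationships between these entry fields can be sketched in plain Python for a single interval and a single sample. This is an illustration only, not Hail's implementation: the RefBlock type and the coverage_stats helper are hypothetical, the blocks are assumed to be already clipped to the interval, and treating "over" a threshold as greater than or equal to it is an assumption.

```python
from typing import NamedTuple, Sequence


class RefBlock(NamedTuple):
    """Hypothetical stand-in for one reference block clipped to an interval."""
    length: int  # number of bases of the block that fall inside the interval
    gq: int      # genotype quality shared by every base in the block
    dp: int      # depth shared by every base in the block


def coverage_stats(blocks: Sequence[RefBlock], interval_size: int,
                   gq_thresholds=(0, 10, 20),
                   dp_thresholds=(0, 1, 10, 20, 30)) -> dict:
    # Assumption: a base counts toward a threshold when its value is >= it.
    bases_gq = tuple(sum(b.length for b in blocks if b.gq >= t)
                     for t in gq_thresholds)
    bases_dp = tuple(sum(b.length for b in blocks if b.dp >= t)
                     for t in dp_thresholds)
    # Each base in a block contributes the block's depth to the sum.
    sum_dp = sum(b.length * b.dp for b in blocks)
    return {
        "bases_over_gq_threshold": bases_gq,
        "fraction_over_gq_threshold": tuple(n / interval_size for n in bases_gq),
        "bases_over_dp_threshold": bases_dp,
        "fraction_over_dp_threshold": tuple(n / interval_size for n in bases_dp),
        "sum_dp": sum_dp,
        "mean_dp": sum_dp / interval_size,
    }


# A 100-base interval covered by two reference blocks (90 bases in total).
blocks = [RefBlock(length=40, gq=30, dp=25), RefBlock(length=50, gq=15, dp=8)]
stats = coverage_stats(blocks, interval_size=100)
print(stats["bases_over_gq_threshold"])  # (90, 90, 40)
print(stats["sum_dp"], stats["mean_dp"])  # 1400 14.0
```

Note that every fraction and the mean use interval_size as the denominator, so bases not covered by any reference block lower the result.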
If the dp_field parameter is not specified, the DP field is used for depth if present. If no DP field is present, the MIN_DP field is used. If no MIN_DP field is present, no depth statistics will be calculated.
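The fallback order for the depth field can be written as a small selection function. This is a sketch of the documented behavior only: pick_depth_field is a hypothetical helper, not part of Hail, and the set of entry field names is passed in by hand rather than read from a dataset.

```python
from typing import Iterable, Optional


def pick_depth_field(entry_fields: Iterable[str],
                     dp_field: Optional[str] = None) -> Optional[str]:
    """Return the entry field to use for depth, or None if none is available."""
    # An explicitly requested field always wins.
    if dp_field is not None:
        return dp_field
    available = set(entry_fields)
    # Otherwise prefer DP, then fall back to MIN_DP.
    for candidate in ("DP", "MIN_DP"):
        if candidate in available:
            return candidate
    # No depth field: depth statistics would be skipped.
    return None


print(pick_depth_field(["GQ", "DP", "MIN_DP"]))  # DP
print(pick_depth_field(["GQ", "MIN_DP"]))        # MIN_DP
print(pick_depth_field(["GQ"]))                  # None
```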
The metrics computed by this method are computed only from reference blocks. Most variant callers produce data where non-reference calls interrupt reference blocks, and so the metrics computed here are slight underestimates of the true values (which would include the quality/depth of non-reference calls). This is likely a negligible difference, but is something to be aware of, especially as it interacts with samples of ancestral backgrounds with more or fewer non-reference calls.
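A small arithmetic example shows the size of this effect. All numbers below are made up for illustration: a 100-base interval in which 3 bases carry non-reference calls, with every base sequenced at depth 30.

```python
# Hypothetical interval: 97 bases in reference blocks, 3 bases with
# non-reference calls, all at depth 30.
interval_size = 100
ref_block_bases = 97
non_ref_bases = 3
depth = 30

# Only reference-block bases contribute to the reported sum.
reported_sum_dp = ref_block_bases * depth
reported_mean_dp = reported_sum_dp / interval_size

# Mean depth if the non-reference bases were counted as well.
full_mean_dp = (reported_sum_dp + non_ref_bases * depth) / interval_size

print(reported_sum_dp)  # 2910
print(full_mean_dp)     # 30.0
```

Here the reported mean depth is about 29.1 rather than 30, a shortfall proportional to the fraction of bases with non-reference calls.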
Returns: MatrixTable – Interval-by-sample matrix.