Aggregation
For a full list of aggregators, see the aggregators section of the API reference.
Table Aggregations
Aggregate Over Rows Into A Local Value
One aggregation
- description:
Compute the fraction of rows where SEX == 'M' in a table.
- code:
>>> ht.aggregate(hl.agg.fraction(ht.SEX == 'M'))
0.5
- dependencies:
Table.aggregate(), aggregators.fraction()
Multiple aggregations
- description:
Compute two aggregation statistics, the fraction of rows where SEX == 'M' and the mean value of X, from the rows of a table.
- code:
>>> ht.aggregate(hl.struct(fraction_male=hl.agg.fraction(ht.SEX == 'M'),
...                        mean_x=hl.agg.mean(ht.X)))
Struct(fraction_male=0.5, mean_x=6.5)
- dependencies:
Table.aggregate(), aggregators.fraction(), aggregators.mean(), StructExpression
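As a rough mental model only (this is plain Python, not Hail code), wrapping several aggregators in a struct can be pictured as accumulating all of the statistics while walking the rows once. The `rows` list and the `aggregate_rows` helper below are hypothetical stand-ins for the table `ht`, with made-up values.

```python
# Hypothetical rows standing in for the table `ht`; the values are made up.
rows = [
    {"SEX": "M", "X": 5},
    {"SEX": "F", "X": 6},
    {"SEX": "M", "X": 7},
    {"SEX": "F", "X": 8},
]


def aggregate_rows(rows):
    """Accumulate both statistics in one pass over the rows."""
    n_rows = 0
    n_male = 0
    x_sum = 0.0
    for row in rows:
        n_rows += 1
        n_male += row["SEX"] == "M"  # True counts as 1, False as 0
        x_sum += row["X"]
    # One result object with one named field per aggregation.
    return {"fraction_male": n_male / n_rows, "mean_x": x_sum / n_rows}


result = aggregate_rows(rows)
print(result)
```

The returned dictionary plays the role of the Struct in the recipe above: each key corresponds to one keyword argument passed to hl.struct.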
Aggregate Per Group
- description:
Group the table ht by ID and compute the mean value of X per group.
- code:
>>> result_ht = ht.group_by(ht.ID).aggregate(mean_x=hl.agg.mean(ht.X))
- dependencies:
Table.group_by(), GroupedTable.aggregate(), aggregators.mean()
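The grouping semantics can be sketched in plain Python (not Hail code) as follows. The `rows` data and the `group_mean` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for the table `ht`; the values are made up.
rows = [
    {"ID": "a", "X": 4},
    {"ID": "b", "X": 10},
    {"ID": "a", "X": 6},
    {"ID": "b", "X": 2},
    {"ID": "c", "X": 7},
]


def group_mean(rows, key, value):
    """Return the mean of `value` for each distinct `key`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[value]
        counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}


mean_x_by_id = group_mean(rows, "ID", "X")
print(mean_x_by_id)
```

Unlike this sketch, which returns a local dictionary, the recipe above produces a new table (result_ht) with one row per ID and a mean_x field.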
Matrix Table Aggregations
Aggregate Entries Per Row (Over Columns)
- description:
Count the number of occurrences of each distinct value of the GT field per row, i.e. aggregate over the columns of the matrix table.
The methods MatrixTable.filter_rows(), MatrixTable.select_rows(), and MatrixTable.transmute_rows() also support aggregation over columns.
- code:
>>> result_mt = mt.annotate_rows(gt_counter=hl.agg.counter(mt.GT))
- dependencies:
MatrixTable.annotate_rows(), aggregators.counter()
Aggregate Entries Per Column (Over Rows)
- description:
Compute the mean of the GQ field per column, i.e. aggregate over the rows of the MatrixTable.
The methods MatrixTable.filter_cols(), MatrixTable.select_cols(), and MatrixTable.transmute_cols() also support aggregation over rows.
- code:
>>> result_mt = mt.annotate_cols(gq_mean=hl.agg.mean(mt.GQ))
- dependencies:
MatrixTable.annotate_cols(), aggregators.mean()
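The per-row and per-column recipes differ only in the axis that is collapsed. A plain-Python sketch (not Hail code) on a hypothetical 2-variant by 3-sample matrix makes the two directions concrete; genotypes are modeled as strings purely for illustration.

```python
from collections import Counter

# Hypothetical entry fields, indexed as [row][column]; the values are made up.
gt = [
    ["0/0", "0/1", "0/0"],
    ["0/1", "0/1", "1/1"],
]
gq = [
    [30, 50, 40],
    [20, 70, 60],
]

# Per row: collapse the columns, yielding one counter per row.
gt_counter = [Counter(row) for row in gt]

# Per column: collapse the rows, yielding one mean per column.
gq_mean = [sum(col) / len(col) for col in zip(*gq)]

print(gt_counter)
print(gq_mean)
```

An aggregation used in a row annotation therefore yields one value per row, and the same aggregator used in a column annotation yields one value per column.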
Aggregate Column Values Into a Local Value
One aggregation
- description:
Aggregate over the column-indexed field pheno.is_female to compute the fraction of female samples in the matrix table.
- code:
>>> mt.aggregate_cols(hl.agg.fraction(mt.pheno.is_female))
0.44
- dependencies:
MatrixTable.aggregate_cols(), aggregators.fraction()
Multiple aggregations
- description:
Perform multiple aggregations over column-indexed fields by using a struct expression. The result is a single struct containing two nested fields, fraction_female and case_ratio.
- code:
>>> mt.aggregate_cols(hl.struct(
...     fraction_female=hl.agg.fraction(mt.pheno.is_female),
...     case_ratio=hl.agg.count_where(mt.is_case) / hl.agg.count()))
Struct(fraction_female=0.44, case_ratio=1.0)
- dependencies:
MatrixTable.aggregate_cols(), aggregators.fraction(), aggregators.count_where(), StructExpression
Aggregate Row Values Into a Local Value
One aggregation
- description:
Compute the mean value of the row-indexed field qual.
- code:
>>> mt.aggregate_rows(hl.agg.mean(mt.qual))
140054.73333333334
- dependencies:
MatrixTable.aggregate_rows(), aggregators.mean()
Multiple aggregations
- description:
Perform two row aggregations: count the number of row values of qual that are greater than 40, and compute the mean value of qual. The result is a single struct containing two nested fields, n_high_quality and mean_qual.
- code:
>>> mt.aggregate_rows(
...     hl.struct(n_high_quality=hl.agg.count_where(mt.qual > 40),
...               mean_qual=hl.agg.mean(mt.qual)))
Struct(n_high_quality=9, mean_qual=140054.73333333334)
- dependencies:
MatrixTable.aggregate_rows(), aggregators.count_where(), aggregators.mean(), StructExpression
Aggregate Entry Values Into A Local Value
- description:
Compute the mean of the entry-indexed field GQ and the call rate of the entry-indexed field GT. The result is returned as a single struct with two nested fields.
- code:
>>> mt.aggregate_entries(
...     hl.struct(global_gq_mean=hl.agg.mean(mt.GQ),
...               call_rate=hl.agg.fraction(hl.is_defined(mt.GT))))
Struct(global_gq_mean=69.60514541387025, call_rate=0.9933333333333333)
- dependencies:
MatrixTable.aggregate_entries(), aggregators.mean(), aggregators.fraction(), StructExpression
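An entry aggregation collapses both axes at once. The plain-Python sketch below (not Hail code) uses a hypothetical 2 by 3 matrix in which None marks a missing entry. It assumes that the mean skips missing GQ values, while the call rate counts every entry in its denominator.

```python
# Hypothetical entry fields, indexed as [row][column]; None marks a missing value.
gq = [
    [30, 50, None],
    [20, 70, 60],
]
gt = [
    ["0/0", "0/1", None],
    ["0/1", "0/1", "1/1"],
]

# Flatten the matrix: every (row, column) pair contributes one record.
gq_values = [v for row in gq for v in row if v is not None]  # assumption: missing GQ is skipped
gt_values = [v for row in gt for v in row]

result = {
    "global_gq_mean": sum(gq_values) / len(gq_values),
    # Fraction of entries with a defined genotype.
    "call_rate": sum(v is not None for v in gt_values) / len(gt_values),
}
print(result)
```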
Aggregate Per Column Group
- description:
Group the columns of the matrix table by the column-indexed field cohort and compute the call rate per cohort.
- code:
>>> result_mt = (mt.group_cols_by(mt.cohort)
...              .aggregate(call_rate=hl.agg.fraction(hl.is_defined(mt.GT))))
- dependencies:
MatrixTable.group_cols_by(), GroupedMatrixTable, GroupedMatrixTable.aggregate()
- understanding:
Group the columns of the matrix table by the column-indexed field cohort using MatrixTable.group_cols_by(), which returns a GroupedMatrixTable. Then use GroupedMatrixTable.aggregate() to compute an aggregation per column group.
The result is a matrix table with an entry field call_rate that contains the result of the aggregation. The new matrix table has a row schema equal to the original row schema, a column schema equal to the fields passed to group_cols_by, and an entry schema determined by the expression passed to aggregate. Other column fields and entry fields are dropped.
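The shape of the grouped result can be sketched in plain Python (not Hail code). In this hypothetical example, three sample columns collapse into two cohort columns while the rows are left untouched; `call_rate_by_cohort` is an illustrative helper, not a library function.

```python
from collections import defaultdict

# Hypothetical column-indexed field: one cohort label per sample column.
cohort = ["A", "A", "B"]

# Hypothetical GT entries, indexed as [row][column]; None marks a missing call.
gt = [
    ["0/0", None, "0/1"],
    ["0/1", "1/1", None],
]


def call_rate_by_cohort(gt, cohort):
    """For every row, compute the fraction of defined calls per cohort."""
    groups = sorted(set(cohort))
    result = []
    for row in gt:
        defined = defaultdict(int)
        total = defaultdict(int)
        for call, group in zip(row, cohort):
            total[group] += 1
            defined[group] += call is not None
        # One entry per (row, cohort) pair replaces the per-sample entries.
        result.append({g: defined[g] / total[g] for g in groups})
    return result


call_rate = call_rate_by_cohort(gt, cohort)
print(call_rate)
```

The output has as many rows as the input, but its columns are now the cohorts: the per-sample GT values are gone and only the aggregated call_rate remains.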
Aggregate Per Row Group
- description:
Compute the number of calls with one or more non-reference alleles per gene group.
- code:
>>> result_mt = (mt.group_rows_by(mt.gene)
...              .aggregate(n_non_ref=hl.agg.count_where(mt.GT.is_non_ref())))
- dependencies:
MatrixTable.group_rows_by(), GroupedMatrixTable, GroupedMatrixTable.aggregate()
- understanding:
Group the rows of the matrix table by the row-indexed field gene using MatrixTable.group_rows_by(), which returns a GroupedMatrixTable. Then use GroupedMatrixTable.aggregate() to compute an aggregation per row group.
The result is a matrix table with an entry field n_non_ref that contains the result of the aggregation. This new matrix table has a row schema equal to the fields passed to group_rows_by, a column schema equal to the column schema of the original matrix table, and an entry schema determined by the expression passed to aggregate. Other row fields and entry fields are dropped.
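Row grouping mirrors column grouping along the other axis. In the plain-Python sketch below (not Hail code), three hypothetical variant rows collapse into two gene rows while the sample columns are preserved. Calls are modeled as strings, and `is_non_ref` is an illustrative stand-in for the call method of the same name.

```python
# Hypothetical row-indexed field: one gene label per variant row.
gene = ["G1", "G1", "G2"]

# Hypothetical GT entries, indexed as [row][column].
gt = [
    ["0/0", "0/1"],
    ["1/1", "0/1"],
    ["0/0", "0/0"],
]


def is_non_ref(call):
    """True if the call carries at least one non-reference allele."""
    return any(allele != "0" for allele in call.split("/"))


def n_non_ref_by_gene(gt, gene):
    """For every gene, count non-reference calls separately per sample."""
    n_samples = len(gt[0])
    counts = {}
    for row, g in zip(gt, gene):
        per_sample = counts.setdefault(g, [0] * n_samples)
        for j, call in enumerate(row):
            per_sample[j] += is_non_ref(call)
    return counts


n_non_ref = n_non_ref_by_gene(gt, gene)
print(n_non_ref)
```

Each gene becomes one row of the result, with one count per original sample column.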