Aggregation
For a full list of aggregators, see the aggregators section of the API reference.
Table Aggregations
Aggregate Over Rows Into A Local Value
One aggregation
- description:
Compute the fraction of rows where SEX == 'M' in a table.
- code:
>>> ht.aggregate(hl.agg.fraction(ht.SEX == 'M'))
0.5
- dependencies:
Table.aggregate(), aggregators.fraction()
Multiple aggregations
- description:
Compute two aggregation statistics, the fraction of rows where SEX == 'M' and the mean value of X, from the rows of a table.
- code:
>>> ht.aggregate(hl.struct(fraction_male=hl.agg.fraction(ht.SEX == 'M'),
...                        mean_x=hl.agg.mean(ht.X)))
Struct(fraction_male=0.5, mean_x=6.5)
- dependencies:
Table.aggregate(), aggregators.fraction(), aggregators.mean(), StructExpression
Aggregate Per Group
- description:
Group the table ht by ID and compute the mean value of X per group.
- code:
>>> result_ht = ht.group_by(ht.ID).aggregate(mean_x=hl.agg.mean(ht.X))
- dependencies:
Table.group_by(), GroupedTable.aggregate(), aggregators.mean()
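As a rough mental model of what the grouped aggregation above computes, the following plain-Python sketch groups a few rows by ID and averages X within each group. It does not use Hail; the sample rows and the helper name `group_mean` are made up for illustration.

```python
from collections import defaultdict

def group_mean(rows, key, field):
    """Return {key value: mean of `field`} over a list of dict rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[field]
        counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Hypothetical rows standing in for the table `ht`.
rows = [
    {"ID": 1, "X": 5},
    {"ID": 1, "X": 7},
    {"ID": 2, "X": 4},
]

result = group_mean(rows, "ID", "X")
print(result)
```

Unlike this sketch, which returns a local dictionary, the Hail call returns a new table with one row per group.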
Matrix Table Aggregations
Aggregate Entries Per Row (Over Columns)
- description:
Count the number of occurrences of each unique value of the GT field per row, i.e. aggregate over the columns of the matrix table.
Methods MatrixTable.filter_rows(), MatrixTable.select_rows(), and MatrixTable.transmute_rows() also support aggregation over columns.
- code:
>>> result_mt = mt.annotate_rows(gt_counter=hl.agg.counter(mt.GT))
- dependencies:
MatrixTable.annotate_rows(), aggregators.counter()
Aggregate Entries Per Column (Over Rows)
- description:
Compute the mean of the GQ field per column, i.e. aggregate over the rows of the MatrixTable.
Methods MatrixTable.filter_cols(), MatrixTable.select_cols(), and MatrixTable.transmute_cols() also support aggregation over rows.
- code:
>>> result_mt = mt.annotate_cols(gq_mean=hl.agg.mean(mt.GQ))
- dependencies:
MatrixTable.annotate_cols(), aggregators.mean()
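The direction of aggregation is easy to mix up, so here is a small plain-Python sketch of the two cases above: a per-row result is computed across that row's columns, and a per-column result is computed across that column's rows. The 2 x 3 grid of entries is invented for illustration and Hail itself is not used.

```python
from collections import Counter

# Hypothetical 2-row x 3-column entry grid; each entry has a genotype
# string (GT) and a genotype quality (GQ).
entries = [
    [{"GT": "0/0", "GQ": 30}, {"GT": "0/1", "GQ": 50}, {"GT": "0/0", "GQ": 40}],
    [{"GT": "1/1", "GQ": 20}, {"GT": "0/1", "GQ": 60}, {"GT": "0/1", "GQ": 10}],
]

# Per row: aggregate across that row's columns (one Counter per row),
# analogous to annotate_rows with a counter aggregation.
gt_counter = [Counter(e["GT"] for e in row) for row in entries]

# Per column: aggregate across that column's rows (one mean per column),
# analogous to annotate_cols with a mean aggregation.
n_rows = len(entries)
n_cols = len(entries[0])
gq_mean = [sum(row[j]["GQ"] for row in entries) / n_rows for j in range(n_cols)]

print(gt_counter)
print(gq_mean)
```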
Aggregate Column Values Into a Local Value
One aggregation
- description:
Aggregate over the column-indexed field pheno.is_female to compute the fraction of female samples in the matrix table.
- code:
>>> mt.aggregate_cols(hl.agg.fraction(mt.pheno.is_female))
0.44
- dependencies:
MatrixTable.aggregate_cols(), aggregators.fraction()
Multiple aggregations
- description:
Perform multiple aggregations over column-indexed fields by using a struct expression. The result is a single struct containing two nested fields, fraction_female and case_ratio.
- code:
>>> mt.aggregate_cols(hl.struct(
...         fraction_female=hl.agg.fraction(mt.pheno.is_female),
...         case_ratio=hl.agg.count_where(mt.is_case) / hl.agg.count()))
Struct(fraction_female=0.44, case_ratio=1.0)
- dependencies:
MatrixTable.aggregate_cols(), aggregators.fraction(), aggregators.count_where(), StructExpression
Aggregate Row Values Into a Local Value
One aggregation
- description:
Compute the mean value of the row-indexed field qual.
- code:
>>> mt.aggregate_rows(hl.agg.mean(mt.qual))
140054.73333333334
- dependencies:
MatrixTable.aggregate_rows(), aggregators.mean()
Multiple aggregations
- description:
Perform two row aggregations: count the number of row values of qual that are greater than 40, and compute the mean value of qual. The result is a single struct containing two nested fields, n_high_quality and mean_qual.
- code:
>>> mt.aggregate_rows(
...     hl.struct(n_high_quality=hl.agg.count_where(mt.qual > 40),
...               mean_qual=hl.agg.mean(mt.qual)))
Struct(n_high_quality=9, mean_qual=140054.73333333334)
- dependencies:
MatrixTable.aggregate_rows(), aggregators.count_where(), aggregators.mean(), StructExpression
Aggregate Entry Values Into A Local Value
- description:
Compute the mean of the entry-indexed field GQ and the call rate of the entry-indexed field GT. The result is returned as a single struct with two nested fields.
- code:
>>> mt.aggregate_entries(
...     hl.struct(global_gq_mean=hl.agg.mean(mt.GQ),
...               call_rate=hl.agg.fraction(hl.is_defined(mt.GT))))
Struct(global_gq_mean=69.60514541387025, call_rate=0.9933333333333333)
- dependencies:
MatrixTable.aggregate_entries(), aggregators.mean(), aggregators.fraction(), StructExpression
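To make the two statistics above concrete, the plain-Python sketch below computes them over a handful of invented entries, with None standing in for a missing value. It assumes that the mean skips missing GQ values while the call rate counts every entry in its denominator; consult the aggregator reference to confirm how Hail treats missing values. Hail itself is not used here.

```python
# Hypothetical entries flattened over all rows and columns of a matrix
# table; None marks a missing value.
entries = [
    {"GT": "0/0", "GQ": 40},
    {"GT": "0/1", "GQ": 60},
    {"GT": None, "GQ": None},
    {"GT": "1/1", "GQ": 80},
]

# Mean of GQ over the entries where GQ is defined (assumption: missing
# values are skipped rather than counted as zero).
defined_gq = [e["GQ"] for e in entries if e["GQ"] is not None]
global_gq_mean = sum(defined_gq) / len(defined_gq)

# Call rate: fraction of all entries whose GT is defined.
call_rate = sum(e["GT"] is not None for e in entries) / len(entries)

print(global_gq_mean, call_rate)
```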
Aggregate Per Column Group
- description:
Group the columns of the matrix table by the column-indexed field cohort and compute the call rate per cohort.
- code:
>>> result_mt = (mt.group_cols_by(mt.cohort)
...              .aggregate(call_rate=hl.agg.fraction(hl.is_defined(mt.GT))))
- dependencies:
MatrixTable.group_cols_by(), GroupedMatrixTable, GroupedMatrixTable.aggregate()
- understanding:
Group the columns of the matrix table by the column-indexed field cohort using MatrixTable.group_cols_by(), which returns a GroupedMatrixTable. Then use GroupedMatrixTable.aggregate() to compute an aggregation per column group.
The result is a matrix table with an entry field call_rate that contains the result of the aggregation. The new matrix table has a row schema equal to the original row schema, a column schema equal to the fields passed to group_cols_by, and an entry schema determined by the expression passed to aggregate. Other column fields and entry fields are dropped.
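The shape of the result described above can be sketched in plain Python: rows are left as they are, columns collapse to one per cohort, and each (row, cohort) cell holds the aggregated value. The data and the helper name `call_rate_by_cohort` are hypothetical, and this is only a model of the semantics, not Hail's implementation.

```python
from collections import defaultdict

# Hypothetical data: 3 columns (samples), each with a cohort label, and
# 2 rows. None marks a missing genotype call.
cohorts = ["A", "A", "B"]
gt = [
    ["0/0", None, "0/1"],   # row 0
    ["0/1", "1/1", None],   # row 1
]

def call_rate_by_cohort(gt_row, cohorts):
    """Fraction of defined GT values in one row, per cohort."""
    defined = defaultdict(int)
    total = defaultdict(int)
    for value, cohort in zip(gt_row, cohorts):
        total[cohort] += 1
        defined[cohort] += value is not None
    return {c: defined[c] / total[c] for c in total}

# One result per (row, cohort): the rows are unchanged, and the columns
# are now the distinct cohorts.
call_rate = [call_rate_by_cohort(row, cohorts) for row in gt]
print(call_rate)
```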
Aggregate Per Row Group
- description:
Compute the number of calls with one or more non-reference alleles per gene group.
- code:
>>> result_mt = (mt.group_rows_by(mt.gene)
...              .aggregate(n_non_ref=hl.agg.count_where(mt.GT.is_non_ref())))
- dependencies:
MatrixTable.group_rows_by(), GroupedMatrixTable, GroupedMatrixTable.aggregate()
- understanding:
Group the rows of the matrix table by the row-indexed field gene using MatrixTable.group_rows_by(), which returns a GroupedMatrixTable. Then use GroupedMatrixTable.aggregate() to compute an aggregation per grouped row.
The result is a matrix table with an entry field n_non_ref that contains the result of the aggregation. This new matrix table has a row schema equal to the fields passed to group_rows_by, a column schema equal to the column schema of the original matrix table, and an entry schema determined by the expression passed to aggregate. Other row fields and entry fields are dropped.
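The row-grouped case mirrors the column-grouped one: columns are left as they are, rows collapse to one per gene, and each (gene, column) cell holds the count. The plain-Python sketch below models this with invented data; the helper `is_non_ref` is a stand-in written for this example, and treating a missing call as not counted is an assumption of the sketch.

```python
from collections import defaultdict

# Hypothetical data: 3 rows (variants), each with a gene label, and
# 2 columns (samples). Each GT is a pair of allele indices
# (0 = reference), or None if the call is missing.
genes = ["G1", "G1", "G2"]
gt = [
    [(0, 1), (0, 0)],
    [(1, 1), None],
    [(0, 0), (0, 1)],
]

def is_non_ref(call):
    """True if the call is defined and has at least one non-reference allele."""
    return call is not None and any(allele != 0 for allele in call)

# One counter per (gene, column): the columns are unchanged, and the
# rows are now the distinct genes.
n_cols = len(gt[0])
counts = defaultdict(lambda: [0] * n_cols)
for gene, row in zip(genes, gt):
    for j, call in enumerate(row):
        counts[gene][j] += is_non_ref(call)
n_non_ref = dict(counts)
print(n_non_ref)
```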