GroupedTable
- class hail.GroupedTable
- Bases: BaseTable
- Table grouped by row that can be aggregated into a new table.
- There are only two operations on a grouped table: GroupedTable.partition_hint() and GroupedTable.aggregate().
- Methods
- aggregate(**named_exprs) – Aggregate by group, used after Table.group_by().
- partition_hint(n) – Set the target number of partitions for aggregation.
- aggregate(**named_exprs)
- Aggregate by group, used after Table.group_by().
- Examples
- Compute the mean value of X and the sum of Z per unique ID:
  >>> table_result = (table1.group_by(table1.ID)
  ...     .aggregate(meanX = hl.agg.mean(table1.X), sumZ = hl.agg.sum(table1.Z)))
- Group by a height bin and compute the fraction of females per bin:
  >>> table_result = (table1.group_by(height_bin = table1.HT // 20)
  ...     .aggregate(fraction_female = hl.agg.fraction(table1.SEX == 'F')))
- Notes
- The resulting table has a key field for each group and a value field for each aggregation. The names of the aggregation expressions must be distinct from the names of the groups.
- Parameters:
- named_exprs (varargs of Expression) – Aggregation expressions.
- Returns:
- Table – Aggregated table.
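The shape of the aggregated table described in the Notes can be illustrated without Hail. The following is a minimal plain-Python sketch of the grouping semantics only; `rows`, `group_and_aggregate`, and the lambda aggregators are hypothetical stand-ins and are not part of the Hail API.

```python
from collections import defaultdict

# Hypothetical stand-in rows for a table with fields ID, X and Z.
rows = [
    {"ID": 1, "X": 2.0, "Z": 10},
    {"ID": 1, "X": 4.0, "Z": 5},
    {"ID": 2, "X": 6.0, "Z": 1},
]


def group_and_aggregate(rows, key, /, **aggregators):
    """Group rows by `key`, then apply each named aggregator to every group."""
    if key in aggregators:
        # Mirrors the rule that aggregation names must differ from group names.
        raise ValueError(f"aggregation name {key!r} collides with the group key")
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # One output row per group: the key field plus one field per aggregation.
    return [
        {key: k, **{name: fn(members) for name, fn in aggregators.items()}}
        for k, members in sorted(groups.items())
    ]


result = group_and_aggregate(
    rows,
    "ID",
    meanX=lambda g: sum(r["X"] for r in g) / len(g),
    sumZ=lambda g: sum(r["Z"] for r in g),
)
print(result)
# [{'ID': 1, 'meanX': 3.0, 'sumZ': 15}, {'ID': 2, 'meanX': 6.0, 'sumZ': 1}]
```

Each output row carries the group key (ID) plus one value field per named aggregation (meanX, sumZ), which is the layout the Notes describe for the table returned by GroupedTable.aggregate().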
 
- partition_hint(n)
- Set the target number of partitions for aggregation.
- Examples
- Use partition_hint in a Table.group_by() / GroupedTable.aggregate() pipeline:
  >>> table_result = (table1.group_by(table1.ID)
  ...     .partition_hint(5)
  ...     .aggregate(meanX = hl.agg.mean(table1.X), sumZ = hl.agg.sum(table1.Z)))
- Notes
- Until Hail’s query optimizer is intelligent enough to sample records at all stages of a pipeline, it can be necessary in some places to provide some explicit hints.
- The default number of partitions for GroupedTable.aggregate() is the number of partitions in the upstream table. If the aggregation greatly reduces the size of the table, providing a hint for the target number of partitions can accelerate downstream operations.
- Parameters:
- n (int) – Number of partitions. 
- Returns:
- GroupedTable – Same grouped table with a partition hint.
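To see why a hint can help when aggregation shrinks a table, consider a rough back-of-the-envelope calculation. This is a plain-Python sketch with made-up numbers; `partition_stats` is a hypothetical helper, and the model (one output row per group, spread over a fixed number of partitions) is a simplification, not a description of Hail's internals.

```python
def partition_stats(n_groups, n_partitions):
    """Return (average rows per partition, minimum number of empty partitions)
    for an aggregated table holding one row per group."""
    avg_rows = n_groups / n_partitions
    # Each row lives in exactly one partition, so at most n_groups are non-empty.
    min_empty = max(0, n_partitions - n_groups)
    return avg_rows, min_empty


upstream_partitions = 200  # hypothetical partition count of the upstream table
n_groups = 50              # hypothetical number of distinct group keys

# Default: the aggregated table keeps the upstream partition count.
default_stats = partition_stats(n_groups, upstream_partitions)
# With a hint of 5 partitions, as in partition_hint(5).
hinted_stats = partition_stats(n_groups, 5)

print(default_stats)  # (0.25, 150)
print(hinted_stats)   # (10.0, 0)
```

Under these assumed numbers, the default leaves at least 150 of 200 partitions empty, so downstream operations would spend most of their per-partition overhead on partitions that hold no rows. A smaller target avoids that.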