topk k: int64, [score: expression]?, [groupby: col storable]?

Retains the rows that belong to the k highest-ranked groups in the current query window and adds an int64 _c_rank column that gives each kept row the rank of its group.

k must be a non-negative int64 constant known at compile time. Group membership follows the grouping columns in effect for the stage (typically from group_by(...) in the pipeline). With k == 0, no rows are emitted (the rank column is still part of the schema). With an empty grouping set and k >= 1, every row belongs to a single group, so all rows pass through unchanged with _c_rank set to 1.

You may pass a score expression after k. It must evaluate to a scalar storable type; aggregate functions inside the score are evaluated per group, and nested aggregates are rejected. Columns referenced in the score must be either grouping columns or inside aggregates.
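The grouping and scoring rules above can be modeled with a small Python sketch. It is not the engine's implementation: the function name topk_model, the tie-breaking rule (first-seen group wins), the consecutive rank numbering, and the output row order are all assumptions made for illustration. Only the k == 0 and empty-grouping behavior, and rank 1 for a single group, come from the text above.

```python
from collections import defaultdict

def topk_model(rows, k, group_cols, score_fn):
    """Toy model: keep rows of the k highest-scoring groups and tag each with _c_rank."""
    if k < 0:
        raise ValueError("k must be non-negative")
    groups = defaultdict(list)
    for row in rows:
        # An empty group_cols yields the key () for every row, i.e. a single group.
        key = tuple(row[c] for c in group_cols)
        groups[key].append(row)
    # The score is evaluated once per group. sorted() is stable, so tied groups
    # keep first-seen order (an assumption, not documented behavior).
    ranked = sorted(groups.items(), key=lambda kv: score_fn(kv[1]), reverse=True)
    out = []
    for rank, (_, members) in enumerate(ranked[:k], start=1):
        for row in members:
            out.append({**row, "_c_rank": rank})
    return out

# Hypothetical sample data, mirroring the station_id / reading names used in the examples.
rows = [
    {"station_id": "a", "reading": 3},
    {"station_id": "b", "reading": 9},
    {"station_id": "a", "reading": 7},
    {"station_id": "c", "reading": 1},
]

def peak(members):
    """Per-group aggregate score, playing the role of max(reading)."""
    return max(r["reading"] for r in members)

top2 = topk_model(rows, 2, ["station_id"], peak)
```

Here station b (peak 9) receives rank 1 and both rows of station a (peak 7) receive rank 2, while station c is dropped.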

If you omit score, groups are ordered by the first non–grouping-key column whose type is strictly numeric or duration and that is not a reserved name. If no such column exists, ordering uses all grouping keys.
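The default-score rule can be pictured as a scan over the schema in column order. The following Python sketch is only an illustration: the function name default_score_column, the type names in NUMERIC_OR_DURATION, and the way reserved names are passed in are hypothetical, since the reserved names and the exact type names are not listed here.

```python
# Hypothetical type names standing in for "strictly numeric or duration".
NUMERIC_OR_DURATION = {"int64", "float64", "duration"}

def default_score_column(schema, group_cols, reserved=frozenset()):
    """Return the first eligible column name, or None to signal the grouping-key fallback."""
    for name, type_name in schema:
        # Grouping keys and reserved names are never used as the default score.
        if name in group_cols or name in reserved:
            continue
        if type_name in NUMERIC_OR_DURATION:
            return name
    return None

# Hypothetical schema, listed in column order.
schema = [
    ("station_id", "string"),
    ("label", "string"),
    ("reading", "float64"),
    ("uptime", "duration"),
]
chosen = default_score_column(schema, {"station_id"})
fallback = default_score_column([("station_id", "string")], {"station_id"})
```

In this sketch, chosen is "reading" because it is the first non-key numeric column, and fallback is None, which corresponds to ordering by all grouping keys.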

For the lowest-ranked groups instead of the highest, use bottomk.

Categories

Accelerable

topk is never accelerable. A dataset that only uses accelerable verbs can be accelerated, making queries on the dataset respond faster.

Examples

topk 5

Keeps the five best-ranked groups, grouping by the dataset primary key and scoring by default on the first non-key column that is numeric or a duration.

topk 3, group_by(station_id)

Groups rows by station_id instead of the dataset primary key, then keeps the top three groups under default scoring.

topk 10, max(reading)

Selects the ten groups whose maximum reading in the window is largest, using an aggregate score expression.

topk 1, group_by()

With an empty grouping set every row shares one group, so topk passes the input through and assigns rank 1 to each row.

topk 5, max(reading), group_by(station_id)

Combines an explicit group_by with max(reading) so only the five stations with the highest peak readings remain.