Release Notes August 25, 2023
OPAL Language Updates
topk_agg
Description
Returns an approximation of the top K most frequent values in the input, along with their approximate frequencies.
The output is an array of arrays. In each inner array, the first entry is a value from the input and the second entry is its approximate frequency. The outer array contains up to k elements, sorted by frequency in descending order.
Return type
array
Domain
This is an aggregate function (aggregates rows over a group in aggregate verbs).
This is a window function (calculates over a group of multiple input rows using windowing).
Categories
Usage
topk_agg( expr, k )
Argument | Type | Required | Multiple |
---|---|---|---|
expr | any | Required | Only one |
k | int64 | Required | Only one |
Examples
statsby top_names:topk_agg(name, 2), group_by(class)
Given the following input:
name | class |
---|---|
Jack | A |
Joe | A |
Alice | A |
Alice | A |
Tom | B |
Joe | B |
Kathy | B |
Mike | A |
Tom | B |
It returns the following output:
class | top_names |
---|---|
A | [["Alice", 2], ["Jack", 1]] |
B | [["Tom", 2], ["Kathy", 1]] |
Note that if there is a tie for the last position, the result can be non-deterministic: any of the values with the same frequency may be included in that position.
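To make the output shape and the tie behavior concrete, the following Python sketch computes exact per-group top-k counts over the same example rows. It is only an illustration under stated assumptions: it is not OPAL, it does not reproduce whatever approximation topk_agg uses internally, and the helper name exact_topk_by_group is hypothetical.

```python
from collections import Counter, defaultdict

# Rows from the example input, as (name, class) pairs.
rows = [
    ("Jack", "A"), ("Joe", "A"), ("Alice", "A"), ("Alice", "A"),
    ("Tom", "B"), ("Joe", "B"), ("Kathy", "B"), ("Mike", "A"), ("Tom", "B"),
]


def exact_topk_by_group(rows, k):
    """Hypothetical helper: exact (not approximate) top-k value counts per group."""
    counts = defaultdict(Counter)
    for name, group in rows:
        counts[group][name] += 1
    # Each group maps to a list of [value, frequency] pairs, most frequent
    # first, mirroring the array-of-arrays shape described for topk_agg.
    return {
        group: [[value, freq] for value, freq in counter.most_common(k)]
        for group, counter in counts.items()
    }


top_names = exact_topk_by_group(rows, 2)
for group in sorted(top_names):
    print(group, top_names[group])
```

In both classes the first entry is unambiguous (Alice and Tom each appear twice), while the second entry is drawn from a set of tied values: Jack, Joe, or Mike for class A, and Joe or Kathy for class B. This sketch happens to break ties by first appearance, which is why its second entry for class B may differ from the one in the output table; either result is consistent with the tie rule described above.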