dedup
Aliases: distinct.
dedup [columnname: expression]*
Collapses duplicate rows while preserving the input column layout and dataset kind.
No arguments
With no arguments, any two rows that match in every column are merged into one row.
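The argumentless behavior can be modeled outside the query language. The following is a minimal Python sketch, not OPAL and not the engine's implementation; the function name `dedup_all_columns` and the sample rows are hypothetical, and it keeps the first row of each duplicate set only for simplicity.

```python
def dedup_all_columns(rows):
    """Keep one row per distinct combination of values across all columns."""
    seen = set()
    result = []
    for row in rows:
        # Every column takes part in the key, so any difference keeps both rows.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [
    {"month_number": 1, "month_name": "January"},
    {"month_number": 1, "month_name": "January"},  # exact duplicate, merged
    {"month_number": 1, "month_name": "Jan"},      # differs in one column, kept
]
deduped = dedup_all_columns(rows)
```

In this model the two identical rows collapse into one, while the row with a different `month_name` survives.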
Grouping columns
With arguments, each must be a plain column reference on the default dataset (not an expression, not a path into a structured column). Rows that agree on all listed columns are merged into one row.
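To illustrate how the listed columns form the collapse key, here is a small Python model, offered as a sketch rather than the engine's actual logic. The helper `group_by_columns`, the `note` column, and the sample data are hypothetical. Each group it returns corresponds to one output row of the verb.

```python
from collections import defaultdict


def group_by_columns(rows, columns):
    """Group rows by the tuple of values in the listed columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in columns)  # the key is the whole tuple
        groups[key].append(row)
    return dict(groups)


rows = [
    {"month_number": 1, "month_name": "January", "note": "a"},
    {"month_number": 1, "month_name": "Jan", "note": "b"},
    {"month_number": 2, "month_name": "February", "note": "c"},
    {"month_number": 1, "month_name": "January", "note": "d"},
]

by_number = group_by_columns(rows, ["month_number"])
by_number_and_name = group_by_columns(rows, ["month_number", "month_name"])
```

With one grouping column the sample yields two groups; adding `month_name` to the key splits the first group, giving three.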
Event and interval time columns
On event or interval inputs, the active valid_from or valid_to column is always part of the grouping key, even if you omit it from the argument list, so rows at different times stay distinct.
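The effect on the grouping key can be sketched in Python as follows. This is an illustrative model under stated assumptions: it assumes an event input has one active time column and an interval input has two, and it uses the placeholder names `valid_from` and `valid_to` where a real dataset would have its own column names. The function `effective_key_columns` and the `host` column are hypothetical.

```python
def effective_key_columns(listed, dataset_kind,
                          valid_from="valid_from", valid_to="valid_to"):
    """Return the listed columns plus any active time columns not listed."""
    if dataset_kind == "event":
        time_columns = [valid_from]            # assumption: one time column
    elif dataset_kind == "interval":
        time_columns = [valid_from, valid_to]  # assumption: two time columns
    else:
        raise ValueError("this sketch only models event and interval inputs")

    key = list(listed)
    for column in time_columns:
        if column not in key:
            key.append(column)  # added implicitly so times stay distinct
    return key


event_key = effective_key_columns(["host"], "event")
interval_key = effective_key_columns(["valid_from", "host"], "interval")
```

Here `event_key` gains the time column that was not listed, and `interval_key` gains only `valid_to` because `valid_from` was already listed.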
Values in other columns
The active time columns are never merged from competing values; they only act as grouping keys when required. Every other column outside the grouping key is reduced to a single value per group. The merge prefers non-null values, but when several non-null values disagree it does not guarantee which one is kept.
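A rough Python model of this per-column merge is shown below. It is a sketch only: the function name `merge_column` is hypothetical, and returning the first non-null value is an arbitrary choice made by this model, not something the verb promises.

```python
def merge_column(values):
    """Return some non-null value if one exists, otherwise None."""
    for value in values:
        if value is not None:
            # This model picks the first non-null value it sees; the verb
            # itself does not guarantee which non-null value is kept.
            return value
    return None


only_one_non_null = merge_column([None, "January", None])
all_null = merge_column([None, None])
disagreeing = merge_column(["January", "Jan"])
```

Queries should therefore not rely on a particular value surviving when a group holds several different non-null values in a column that is not part of the key.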
Resources
On resource inputs, only argumentless dedup is allowed; providing grouping columns is rejected at compile time.
The alias distinct is the same verb.
Categories
Accelerable
dedup is always accelerable if the input is accelerable. A dataset that only uses accelerable verbs can be accelerated, making queries on the dataset respond faster.
Examples
dedup month_number
Demonstrates keyed dedup with a single grouping column: rows that agree on that column collapse to one row, and each other column keeps a single value per group (a non-null value where one exists, with no guarantee of which one when several differ).
dedup
Demonstrates argumentless dedup, which merges only rows that agree on every column; any column difference keeps both rows.
dedup month_number, month_name
Demonstrates keyed dedup with several grouping columns so the collapse key is the tuple of those columns, not each column in isolation.