array_union_agg¶
Description¶
Returns an array containing the multiset union of input arrays.
array_union_agg
is an aggregate function which combines input arrays into a single output array. The output array is the multiset union of the input arrays. The multiset union differs from the standard set union in that it allows duplicates: If any of the input arrays contain duplicates, the output array will also contain duplicates. The number of times an element appears in the output array is equal to the maximum number of times it appears in individual input arrays.
This function ignores NULL values for input arrays inside the column as well as NULL values inside input arrays. If the column contains only NULL values or there are no rows, the function returns an empty array. The ordering of output arrays is nondeterministic.
Return type¶
arr
Domain¶
This is an aggregate function (aggregates rows over a group in aggregate verbs.)
This is a window function (calculates over a group of multiple input rows using windowing.)
Categories¶
Usage¶
array_union_agg(arr)
Argument |
Type |
Optional |
Repeatable |
Restrictions |
---|---|---|---|---|
arr |
array |
no |
no |
none |
Examples¶
statsby aua:array_union_agg(arr), group_by(X)
Assume we have the following schema and log rows:
year |
X |
arr |
---|---|---|
1991 |
a |
[ 0, 0, 1, 2] |
1992 |
b |
[ 0, 0, 1, 2, 3] |
1993 |
c |
NULL |
1994 |
a |
[ 0, 4] |
1995 |
b |
[ 0, 0, 0, NULL] |
Calculate the multiset union on arr
grouping on X
:
X |
aua |
---|---|
a |
[ 0, 0, 1, 2, 4] |
b |
[ 0, 0, 0, 1, 2, 3] |
c |
[] |
In this example the output array associated with X
value a
contains values
0, 1, 2, 4
. This is because these values appear in the input arrays in the
a
group (years 1991
and 1994
). Value 0
appears twice because it appears
twice in at least one input array from the a
group (year 1994
). Similarly,
0
appears three times in the output array for the b
group as it appears a
maximal 3 times in the input arrays for b
group (year 1995). As the only
input array for c
group is a NULL
array, the output array for c
group is
an empty array. Furthermore, the NULL
value inside the row from year 1995 is
ignored and does not appear in the output array for b
group.
Note that ordering within the output arrays is not guaranteed through multiset
semantics. However, it is possible to achieve sorted arrays in the output by
composing with array_sort
:
statsby aua:array_sort(array_union_agg(arr)), group_by(X)