Release Notes¶
No new features for Observe instances.
OPAL Language Updates¶
array_union_agg
¶
Description¶
Returns an array that contains the multiset union of input arrays.
array_union_agg
is an aggregate function that combines input arrays into a single output array. The output array is the multiset union of the input arrays. The multiset union differs from the standard set union in that it allows duplicates: If any of the input arrays contain duplicates, the output array will also contain duplicates. The number of times an element appears in the output array equals the maximum number of times it appears in individual input arrays.
This function ignores NULL values for input arrays inside the column as well as NULL values inside input arrays. If the column contains only NULL values or there are no rows, the function returns an empty array. The ordering of output arrays is nondeterministic.
Return type¶
array
Domain¶
This is an aggregate function (aggregates rows over a group in aggregate verbs.)
This is a window function (calculates over a group of multiple input rows using windowing.)
Categories¶
Aggregate
Semistructured
Window
Usage¶
array_union_agg( arr )
Argument |
Type |
Required |
Multiple |
Constant |
---|---|---|---|---|
arr |
array |
Required |
Only one |
Variable |
Examples¶
statsby aua:array_union_agg(arr), group_by(X)
Assume that you have the following schema and log rows:
year |
X |
arr |
---|---|---|
1991 |
a |
[ 0, 0, 1, 2] |
1992 |
b |
[ 0, 0, 1, 2, 3] |
1993 |
c |
NULL |
1994 |
a |
[ 0, 4] |
1995 |
b |
[ 0, 0, 0, NULL] |
Calculate the multiset union on arr
grouping on X
:
X |
aua |
---|---|
a |
[ 0, 0, 1, 2, 4] |
b |
[ 0, 0, 0, 1, 2, 3] |
c |
[] |
In this example, the output array associated with X
value a
contains values 0, 1, 2, 4
. This is because these values appear in the input arrays in the a
group (years 1991
and 1994
). Value 0
appears twice because it appears twice in at least one input array from the a
group (year 1994
). Similarly, 0
appears three times in the output array for the b
group as it appears a maximal 3 times in the input arrays for b
group (year 1995). As the only input array for c
group is a NULL
array, the output array for c
group is an empty array. Furthermore, the NULL
value inside the row from year 1995 is ignored and does not appear in the output array for b
group.
Note that ordering within the output arrays is not guaranteed through multiset semantics. However, it is possible to achieve sorted arrays in the output by composing with array_sort
:
statsby aua:array_sort(array_union_agg(arr)), group_by(X)