array_union_agg#

Description#

Returns an array containing the multiset union of input arrays.

array_union_agg is an aggregate function which combines input arrays into a single output array. The output array is the multiset union of the input arrays. The multiset union differs from the standard set union in that it allows duplicates: If any of the input arrays contain duplicates, the output array will also contain duplicates. The number of times an element appears in the output array is equal to the maximum number of times it appears in individual input arrays.

This function ignores NULL values for input arrays inside the column as well as NULL values inside input arrays. If the column contains only NULL values or there are no rows, the function returns an empty array. The ordering of output arrays is nondeterministic.

Return type#

array

Domain#

This is an aggregate function (aggregates rows over a group in aggregate verbs.)

This is a window function (calculates over a group of multiple input rows using windowing.)

Categories#

Usage#

array_union_agg(arr)

Argument

Type

Optional

Repeatable

Restrictions

arr

array

no

no

none

Examples#

statsby aua:array_union_agg(arr), group_by(X)

Assume we have the following schema and log rows:

year

X

arr

1991

a

[ 0, 0, 1, 2]

1992

b

[ 0, 0, 1, 2, 3]

1993

c

NULL

1994

a

[ 0, 4]

1995

b

[ 0, 0, 0, NULL]

Calculate the multiset union on arr grouping on X:

X

aua

a

[ 0, 0, 1, 2, 4]

b

[ 0, 0, 0, 1, 2, 3]

c

[]

In this example the output array associated with X value a contains values 0, 1, 2, 4. This is because these values appear in the input arrays in the a group (years 1991 and 1994). Value 0 appears twice because it appears twice in at least one input array from the a group (year 1994). Similarly, 0 appears three times in the output array for the b group as it appears a maximal 3 times in the input arrays for b group (year 1995). As the only input array for c group is a NULL array, the output array for c group is an empty array. Furthermore, the NULL value inside the row from year 1995 is ignored and does not appear in the output array for b group.

Note that ordering within the output arrays is not guaranteed through multiset semantics. However, it is possible to achieve sorted arrays in the output by composing with array_sort: statsby aua:array_sort(array_union_agg(arr)), group_by(X)