pivot

pivot [options(...)]?, name: col storable, value: col storable, [name_value: expression]+, [group_by(...)]?

The pivot verb rotates a table by transforming rows into columns.

This verb can be used to transform a narrow table (e.g. timestamp, year, product, sales) into a wider table (e.g. timestamp, year, laptop_sales, tablet_sales, smartphone_sales).

The arguments are as follows:

options(align:x): Optional argument to specify alignment, which denotes the period for aggregating nearby data points together onto a time grid
name: Expression used to select rows for generating the name_value rows
value: Expression used to populate the name_value columns
name_value_0, ..., name_value_n: Columns to generate. These arguments can be specified either using the name:expression syntax or by providing only an expression. If the name:expression syntax is used, the generated column is named name. If only an expression is passed, pivot tries to auto-generate a name. The rows of each name_value column are set to:
- Any non-null result of evaluating the value argument, if the value of name argument equals name_value and no alignment is specified
- Last non-null result of evaluating the value argument, if the value of name argument equals name_value and an alignment is specified using options(align:x)
- Null, otherwise
group_by: Optional grouping. Default grouping is all columns not referenced in the preceding arguments. When no alignment is specified, there is also an implicit grouping on the timestamp column, similar to how the timestats verb treats timestamp columns.
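
The rules above for the case with no alignment can be sketched in Python. This is only an illustrative emulation, not how pivot is implemented: the function name pivot, its parameters, and the dict-based row format are all hypothetical, the timestamp column is left out, and "any non-null result" is modeled as "the first non-null result seen", which is just one of the permitted choices. The input is the 2021 and 2022 rows of table A from the Examples section.

```python
def pivot(rows, name, value, name_values, group_by):
    """Hypothetical emulation of pivot without alignment.

    rows        -- list of dicts, one per input row
    name        -- column compared against each name_value
    value       -- column whose values fill the generated columns
    name_values -- {generated column name: value of `name` to match}
    group_by    -- list of grouping columns
    """
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in group_by)
        if key not in groups:
            # Start every generated column as null (None) for a new group.
            out = dict(zip(group_by, key))
            out.update({col: None for col in name_values})
            groups[key] = out
        out = groups[key]
        for col, match in name_values.items():
            # Keep the first non-null value whose name matches; this is
            # one valid interpretation of "any non-null result".
            if row[name] == match and row[value] is not None and out[col] is None:
                out[col] = row[value]
    return list(groups.values())


rows = [
    {"year": 2021, "product": "laptop", "sales": 100},
    {"year": 2021, "product": "tablet", "sales": 200},
    {"year": 2022, "product": "laptop", "sales": 150},
    {"year": 2022, "product": "tablet", "sales": 250},
    {"year": 2022, "product": "smartphone", "sales": 300},
]
result = pivot(
    rows,
    name="product",
    value="sales",
    name_values={
        "laptop_sales": "laptop",
        "tablet_sales": "tablet",
        "smartphone_sales": "smartphone",
    },
    group_by=["year"],
)
for record in result:
    print(record)
# {'year': 2021, 'laptop_sales': 100, 'tablet_sales': 200, 'smartphone_sales': None}
# {'year': 2022, 'laptop_sales': 150, 'tablet_sales': 250, 'smartphone_sales': 300}
```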

Additional details:

None of the arguments can contain an aggregation function
pivot is not supported for interval and resource datasets
order_by specification is not allowed

Options

Option	Type	Meaning
align	duration	Specifies the period for aggregating nearby data points together onto a time grid

Accelerable

pivot is sometimes accelerable, depending on options used. A dataset that only uses accelerable verbs can be accelerated, making queries on the dataset respond faster.

Examples

Consider the following input table A:

year	product	sales
2021	"laptop"	100
2021	"tablet"	200
2022	"laptop"	150
2022	"tablet"	250
2022	"smartphone"	300
2023	"smartphone"	350

The above statement transforms table A to:

timestamp	year	laptop_sales	tablet_sales	smartphone_sales
10/31/2023 13:51:27	2021	100	200	NULL
10/31/2023 15:57:03	2022	150	250	300
10/31/2023 16:00:12	2023	NULL	NULL	350

The above statement transforms table A to:

year	laptop	tablet	smartphone
2021	100	200	NULL
2022	150	250	300
2023	NULL	NULL	350

pivot in this case auto-generates the column names.

The above statement transforms table A to:

year	laptop	tablet	smartphone
2021	100	200	NULL
2022	150	250	300
2023	NULL	NULL	350

The group_by argument is optional and, in this case, is inferred to be group_by(year).

The above statement transforms table A to:

year	laptop_item	tablet_item	smartphone_item
2021	200	400	NULL
2022	300	500	600
2023	NULL	NULL	700

pivot accepts any scalar expression not containing an aggregate function as an argument.

Consider the following input table B:

year	product	sales
2021	"laptop"	100
2021	"laptop"	500
2021	"tablet"	200
2022	"laptop"	150
2022	"tablet"	250
2022	"tablet"	375
2022	"smartphone"	300
2023	"smartphone"	350

The result of the above statement for this input is non-deterministic because two groups (namely "laptop" for the year 2021 and "tablet" for the year 2022) contain multiple non-null value results. In such cases, pivot picks any one of the non-null results.

One possible output is:

year	laptop_sales	tablet_sales	smartphone_sales
2021	100	200	NULL
2022	150	250	300
2023	NULL	NULL	350
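
To see which groups make the result for table B non-deterministic, the input can be checked for (group, name) pairs that carry more than one non-null value. The following Python sketch is only an illustration. The helper conflicting_groups and the dict-based row format are hypothetical and not part of pivot.

```python
from collections import defaultdict


def conflicting_groups(rows, name, value, group_by):
    """Return {(group key..., name): [values]} for pairs with more than one non-null value."""
    seen = defaultdict(list)
    for row in rows:
        if row[value] is not None:
            key = tuple(row[col] for col in group_by) + (row[name],)
            seen[key].append(row[value])
    return {key: values for key, values in seen.items() if len(values) > 1}


table_b = [
    {"year": 2021, "product": "laptop", "sales": 100},
    {"year": 2021, "product": "laptop", "sales": 500},
    {"year": 2021, "product": "tablet", "sales": 200},
    {"year": 2022, "product": "laptop", "sales": 150},
    {"year": 2022, "product": "tablet", "sales": 250},
    {"year": 2022, "product": "tablet", "sales": 375},
    {"year": 2022, "product": "smartphone", "sales": 300},
    {"year": 2023, "product": "smartphone", "sales": 350},
]
conflicts = conflicting_groups(table_b, "product", "sales", ["year"])
print(conflicts)
# {(2021, 'laptop'): [100, 500], (2022, 'tablet'): [250, 375]}
```

Any value in each list is a legitimate result for the corresponding cell, so 2021 laptop_sales may come out as 100 or 500.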

As another example, consider input table C:

timestamp	year	product	sales
10/31/2023 13:51:27	2021	"laptop"	100
10/31/2023 13:52:29	2021	"tablet"	200
10/31/2023 15:57:03	2022	"laptop"	150
10/31/2023 15:58:13	2022	"tablet"	250
10/31/2023 15:59:35	2022	"smartphone"	300
10/31/2023 16:00:12	2023	"smartphone"	350

The result of the above statement for this input, however, is deterministic, even though the rows within each year have different timestamp values. This is because the timestamp column is treated specially: it is implicitly used as a grouping column, similar to how the timestats verb treats timestamp columns.

The output is:

timestamp	year	laptop_sales	tablet_sales	smartphone_sales
10/31/2023 13:51:27	2021	100	NULL	NULL
10/31/2023 13:52:29	2021	NULL	200	NULL
10/31/2023 15:57:03	2022	150	NULL	NULL
10/31/2023 15:58:13	2022	NULL	250	NULL
10/31/2023 15:59:35	2022	NULL	NULL	300
10/31/2023 16:00:12	2023	NULL	NULL	350
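
The six output rows follow from the default grouping. The Python sketch below is one reading of the group_by description, not the actual implementation, and the helper default_group_by is hypothetical. It takes every column not referenced by the other arguments, adds the timestamp column when no alignment is given, and then counts the distinct groups in table C.

```python
def default_group_by(columns, referenced, aligned=False):
    """Assumed default grouping: unreferenced columns, plus timestamp when unaligned."""
    keys = [col for col in columns if col not in referenced and col != "timestamp"]
    if not aligned and "timestamp" in columns:
        keys.insert(0, "timestamp")
    return keys


table_c = [
    {"timestamp": "10/31/2023 13:51:27", "year": 2021, "product": "laptop", "sales": 100},
    {"timestamp": "10/31/2023 13:52:29", "year": 2021, "product": "tablet", "sales": 200},
    {"timestamp": "10/31/2023 15:57:03", "year": 2022, "product": "laptop", "sales": 150},
    {"timestamp": "10/31/2023 15:58:13", "year": 2022, "product": "tablet", "sales": 250},
    {"timestamp": "10/31/2023 15:59:35", "year": 2022, "product": "smartphone", "sales": 300},
    {"timestamp": "10/31/2023 16:00:12", "year": 2023, "product": "smartphone", "sales": 350},
]

keys = default_group_by(["timestamp", "year", "product", "sales"], {"product", "sales"})
groups = {tuple(row[k] for k in keys) for row in table_c}
print(keys, len(groups))
# ['timestamp', 'year'] 6

# Grouping by year alone would collapse the input to three groups.
print(len({row["year"] for row in table_c}))
# 3
```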

For inputs with conflicting timestamps, it is recommended that you use the options(align:x) argument, as demonstrated in the next example.

The above statement, for the window 10/31/2023 15:00:00 to 10/31/2023 17:00:00, transforms table C to:

_c_bucket	_c_valid_from	_c_valid_to	year	laptop	tablet	smartphone
9437697	10/31/2023 13:51:00	10/31/2023 13:54:00	2021	100	200	NULL
9437739	10/31/2023 15:57:00	10/31/2023 16:00:00	2022	150	250	300
9437740	10/31/2023 16:00:00	10/31/2023 16:03:00	2023	NULL	NULL	350
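
The aligned output can be reproduced with a short Python sketch. It makes several assumptions that this page does not state: the alignment period is three minutes (inferred from the gap between _c_valid_from and _c_valid_to), the bucket number is the Unix epoch time divided by the period, and the timestamps are displayed in UTC-7, an offset chosen only because it reproduces the _c_bucket values shown. The function names and the dict-based row format are hypothetical.

```python
from datetime import datetime, timedelta, timezone

ALIGN = timedelta(minutes=3)              # assumed alignment period
TZ = timezone(timedelta(hours=-7))        # assumed display time zone


def bucket_of(ts, align=ALIGN):
    """Index of the time-grid cell containing ts (epoch seconds // period)."""
    return int(ts.timestamp() // align.total_seconds())


def pivot_aligned(rows, name_values, align=ALIGN):
    """Group by (bucket, year) and keep the last non-null value per name."""
    out = {}
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        bucket = bucket_of(row["timestamp"], align)
        key = (bucket, row["year"])
        if key not in out:
            start = datetime.fromtimestamp(bucket * align.total_seconds(), TZ)
            out[key] = {"bucket": bucket, "valid_from": start,
                        "valid_to": start + align, "year": row["year"],
                        **{n: None for n in name_values}}
        if row["product"] in name_values and row["sales"] is not None:
            # Rows are visited in time order, so later values overwrite earlier ones.
            out[key][row["product"]] = row["sales"]
    return list(out.values())


def ts(hour, minute, second):
    return datetime(2023, 10, 31, hour, minute, second, tzinfo=TZ)


table_c = [
    {"timestamp": ts(13, 51, 27), "year": 2021, "product": "laptop", "sales": 100},
    {"timestamp": ts(13, 52, 29), "year": 2021, "product": "tablet", "sales": 200},
    {"timestamp": ts(15, 57, 3), "year": 2022, "product": "laptop", "sales": 150},
    {"timestamp": ts(15, 58, 13), "year": 2022, "product": "tablet", "sales": 250},
    {"timestamp": ts(15, 59, 35), "year": 2022, "product": "smartphone", "sales": 300},
    {"timestamp": ts(16, 0, 12), "year": 2023, "product": "smartphone", "sales": 350},
]
aligned = pivot_aligned(table_c, ["laptop", "tablet", "smartphone"])
for rec in aligned:
    print(rec["bucket"], rec["valid_from"].strftime("%H:%M:%S"),
          rec["year"], rec["laptop"], rec["tablet"], rec["smartphone"])
# 9437697 13:51:00 2021 100 200 None
# 9437739 15:57:00 2022 150 250 300
# 9437740 16:00:00 2023 None None 350
```

Because all rows for a given year fall into the same three-minute bucket, the conflicting timestamps no longer split them into separate output rows.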
