fill¶
Type of operation: Aggregate
Description¶
Fills all the missing data in the query window, or fills the missing data ahead of the last bucket by the given frame. `fill` can optionally take a `frame` argument with the `ahead` parameter set, and can optionally bind values to columns of interest.
To use the `fill` verb over an input:

- The input must be aligned over a time grid. A typical way to align the input over a time grid is to apply `timechart` or `align` before using this verb.
- The column selected to fill must not be one of the valid-from, valid-to, or grouping columns. These three types of columns are always filled appropriately by the verb.
- The value type must be coercible to the column type; for example, a value of type string can't be provided to a column that stores type int64.
- If no value is specified for a column outside of the valid-from, valid-to, and grouping columns, it is filled with a null value.
When a `frame` is not specified, this verb is non-accelerable and fills all the empty buckets within the time grid with nulls or with the values specified.
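The no-frame behavior can be sketched in Python. This is a simplified model, not an OPAL implementation: `fill_window`, `rows`, and `fills` are hypothetical names, and rows are modeled as a dict keyed by bucket start time.

```python
# Sketch of the no-frame fill behavior: every empty bucket in the
# query window is materialized, with bound columns taking their fill
# value and unbound columns taking null (None). Hypothetical model,
# not an OPAL implementation.

def fill_window(rows, window_start, window_end, bucket_size, fills):
    """rows: {bucket_start: {column: value}}; fills: {column: fill_value}."""
    columns = set()
    for r in rows.values():
        columns.update(r)
    out = {}
    t = window_start
    while t < window_end:
        if t in rows:
            out[t] = rows[t]
        else:
            # Missing bucket: bound columns get their fill value,
            # every other column gets null.
            out[t] = {c: fills.get(c) for c in columns}
        t += bucket_size
    return out

# 1-minute grid (60s buckets) over a 5-minute window, with data
# present only in two buckets; no values bound, so gaps become null.
rows = {0: {"util": 0.5}, 180: {"util": 0.7}}
filled = fill_window(rows, 0, 300, 60, fills={})
print([filled[t]["util"] for t in sorted(filled)])
# → [0.5, None, None, 0.7, None]
```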
When a `frame` is provided, the `ahead` parameter must be greater than or equal to the bucket size on the time grid. If the input has a bucket size of 5 minutes from `align 5m, A_4XXError_sum:sum(m("4XXError"))`, the `ahead` parameter must be set to 5 minutes or greater, so that at least one new time bucket is filled by this verb. The `ahead` parameter within the `frame` also has an upper bound of 72 hours. So the lower bound for the verb is the bucket size on the time grid, while the upper bound is 72 hours.
The number of buckets filled by this verb always equals the integer floor division of the frame by the bucket size. If the bucket size is 2 minutes and the frame is 5 minutes, it fills 2 time buckets, as 5 floor-divided by 2 equals 2.
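The frame arithmetic above can be checked with a short Python sketch. The helper name `buckets_filled` is an assumption for illustration, not part of OPAL; it enforces the lower bound (bucket size) and upper bound (72 hours) on `ahead`, then applies the floor division.

```python
# Sketch of the frame arithmetic: `ahead` must be at least the bucket
# size and at most 72 hours, and the number of buckets filled is the
# floor division of the frame by the bucket size. Hypothetical helper,
# not part of OPAL.

MAX_AHEAD_S = 72 * 3600  # 72-hour upper bound on `ahead`

def buckets_filled(ahead_s: int, bucket_s: int) -> int:
    if not bucket_s <= ahead_s <= MAX_AHEAD_S:
        raise ValueError("ahead must be between the bucket size and 72h")
    return ahead_s // bucket_s

print(buckets_filled(5 * 60, 2 * 60))     # frame 5m, 2m buckets  → 2
print(buckets_filled(10 * 60, 60))        # frame 10m, 1m buckets → 10
print(buckets_filled(2 * 3600, 10 * 60))  # frame 2h, 10m buckets → 12
```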
One caveat: if the input has a `_c_bucket` column from a `timechart` or `align`, the `_c_bucket` column will also be filled appropriately by this verb.
Usage¶
fill [ frame ], [ columnbinding_1, columnbinding_2, ... ]
Argument | Type | Optional | Repeatable | Restrictions
---|---|---|---|---
frame | frame | yes | no | constant
columnbinding | expression | yes | yes | none
Accelerable¶
fill is sometimes accelerable, depending on options used. A dataset that only uses accelerable verbs can be accelerated, making queries on the dataset respond faster.
Examples¶
align 1m,
memory_used: avg(m("container_memory_usage_bytes")),
memory_requested: avg(m("kube_pod_container_resource_requests_memory_bytes"))
aggregate
pod_memory_utilization: sum(memory_used) / sum(memory_requested),
group_by(cluster)
fill
From the input metric dataset, the `align` verb aligns the metric points to a time grid of 1 minute. For each 1-minute time bucket, it calculates the average of the "container_memory_usage_bytes" metric into the `memory_used` column and the average of the "kube_pod_container_resource_requests_memory_bytes" metric into the `memory_requested` column.
Afterwards, the `aggregate` verb creates the `pod_memory_utilization` column, which represents the memory utilization ratio, by dividing the sum of `memory_used` by the sum of `memory_requested`, grouped by `cluster`. `aggregate` preserves the time grid, so the produced dataset still has a time grid of 1 minute.
From the aggregated metric aligned over a time grid, it fills the `pod_memory_utilization` column with a null value, while the `valid-from`, `valid-to`, and `cluster` columns are filled appropriately. If the `cluster` column only had the values "production" and "test", every time bucket will have the `pod_memory_utilization` column filled with a null value for each of those two groups.
All the missing time buckets within the query window are filled, as no `frame` argument is provided.
The above OPAL is non-accelerable and can’t be published into a dataset. It only works under the context of a fixed query window.
timechart 1m, frame(back:10m), min_cpu_utilization: min(cpu_utilization), max_memory_usage: max(memory_usage), any_service: any(service), group_by(host)
fill min_cpu_utilization:0, max_memory_usage:100
From an arbitrary input dataset, the `timechart` verb evaluates the rows over the last 10 minutes every 1 minute. For every minute, it calculates the minimum value of the `cpu_utilization` column into `min_cpu_utilization`, the maximum value of the `memory_usage` column into `max_memory_usage`, and any value of the `service` column into `any_service`, for every `host`.
From the aligned dataset over a time grid, it fills `min_cpu_utilization` with 0 and `max_memory_usage` with 100. Since no value has been specified for `any_service`, it is filled with nulls. Every time bucket over the time grid will have all the permutations of `host`.
All the missing time buckets within the query window are filled, as no `frame` argument is provided.
The above OPAL is non-accelerable and can’t be published into a dataset. It only works under the context of a fixed query window.
align 1m,
memory_used: avg(m("container_memory_usage_bytes")),
memory_requested: avg(m("kube_pod_container_resource_requests_memory_bytes"))
aggregate
pod_memory_utilization: sum(memory_used) / sum(memory_requested),
group_by(cluster)
fill frame(ahead: 10m)
From the input metric dataset, the `align` verb aligns the metric points to a time grid of 1 minute. For each 1-minute time bucket, it calculates the average of the "container_memory_usage_bytes" metric into the `memory_used` column and the average of the "kube_pod_container_resource_requests_memory_bytes" metric into the `memory_requested` column.
Afterwards, the `aggregate` verb creates the `pod_memory_utilization` column, which represents the memory utilization ratio, by dividing the sum of `memory_used` by the sum of `memory_requested`, grouped by `cluster`. `aggregate` preserves the time grid, so the produced dataset still has a time grid of 1 minute.
From the aggregated metric aligned over a time grid, it fills the `pod_memory_utilization` column with a null value, while the `valid-from`, `valid-to`, and `cluster` columns are filled appropriately. If the `cluster` column only had the values "production" and "test", every time bucket will have the `pod_memory_utilization` column filled with a null value for each of those two groups.
Only the 10 time buckets after the last appearing time bucket are filled, as 10 minutes floor-divided by 1 minute gives 10 time buckets to fill.
The above OPAL is accelerable and can be published into a dataset. It also works under the context of a fixed query window.
timechart 10m, frame(back:10m), min_cpu_utilization: min(cpu_utilization), max_memory_usage: max(memory_usage), any_service: any(service), group_by(host)
fill frame(ahead:2h), min_cpu_utilization:0, max_memory_usage:100
From an arbitrary input dataset, the `timechart` verb evaluates the rows over the last 10 minutes every 10 minutes. For every 10-minute bucket, it calculates the minimum value of the `cpu_utilization` column into `min_cpu_utilization`, the maximum value of the `memory_usage` column into `max_memory_usage`, and any value of the `service` column into `any_service`, for every `host`.
From the aligned dataset over a time grid, it fills `min_cpu_utilization` with 0 and `max_memory_usage` with 100. Since no value has been specified for `any_service`, it is filled with nulls. Every time bucket over the time grid will have all the permutations of the grouping column `host`.
Only the 12 time buckets after the last appearing time bucket are filled, as 2 hours (120 minutes) floor-divided by 10 minutes gives 12 time buckets to fill.
The above OPAL is accelerable and can be published into a dataset. It also works under the context of a fixed query window.