fill¶
Type of operation: Aggregate
Description¶
Fills all the missing data in the query window, or fills the missing data ahead of the last bucket by the given frame. `fill` can optionally take a `frame` argument with the `ahead` parameter set, and can optionally bind values to columns of interest.
To use the `fill` verb over an input:

- The input must be aligned over a time grid. A typical way to align the input over a time grid is to apply `timechart` or `align` before using this verb.
- The column selected to fill must not be one of the valid-from, valid-to, or grouping columns. These three types of columns are always filled appropriately by the verb.
- The value type must be coercible to the column type; for example, a value of type string can't be provided to a column that stores type int64.
- If no value is specified for a column outside of the valid-from, valid-to, and grouping columns, it is filled with a null value.
When a `frame` is not specified, this verb is non-accelerable and fills all the empty buckets within the time grid with nulls or with the values specified.
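The no-frame behavior can be sketched in Python. This is a simplified model, not an OPAL implementation: `fill_window`, `rows`, and `fills` are hypothetical names, and rows are modeled as a dict keyed by bucket start time.

```python
# Sketch of the no-frame fill behavior: every empty bucket in the
# query window is materialized, with bound columns taking their fill
# value and unbound columns taking null (None). Hypothetical model,
# not an OPAL implementation.

def fill_window(rows, window_start, window_end, bucket_size, fills):
    """rows: {bucket_start: {column: value}}; fills: {column: fill_value}."""
    columns = set()
    for r in rows.values():
        columns.update(r)
    out = {}
    t = window_start
    while t < window_end:
        if t in rows:
            out[t] = rows[t]
        else:
            # Missing bucket: bound columns get their fill value,
            # every other column gets null.
            out[t] = {c: fills.get(c) for c in columns}
        t += bucket_size
    return out

# 1-minute grid (60s buckets) over a 5-minute window, with data
# present only in two buckets; no values bound, so gaps become null.
rows = {0: {"util": 0.5}, 180: {"util": 0.7}}
filled = fill_window(rows, 0, 300, 60, fills={})
print([filled[t]["util"] for t in sorted(filled)])
# → [0.5, None, None, 0.7, None]
```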
When a `frame` is provided, the `ahead` parameter must be greater than or equal to the bucket size on the time grid. If the input has a bucket size of 5 minutes from `align 5m, A_4XXError_sum:sum(m("4XXError"))`, the `ahead` parameter must be set to 5 minutes or greater, so that at least one new time bucket is filled by this verb. The `ahead` parameter within the `frame` also has an upper bound of 72 hours. So the lower bound for the verb is the bucket size on the time grid, while the upper bound is 72 hours.
The number of buckets filled by this verb always equals the integer floor division of the frame by the bucket size. If the bucket size is 2 minutes and the frame is 5 minutes, it fills 2 time buckets, as 5 floor-divided by 2 equals 2.
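The frame arithmetic above can be checked with a short Python sketch. The helper name `buckets_filled` is an assumption for illustration, not part of OPAL; it enforces the lower bound (bucket size) and upper bound (72 hours) on `ahead`, then applies the floor division.

```python
# Sketch of the frame arithmetic: `ahead` must be at least the bucket
# size and at most 72 hours, and the number of buckets filled is the
# floor division of the frame by the bucket size. Hypothetical helper,
# not part of OPAL.

MAX_AHEAD_S = 72 * 3600  # 72-hour upper bound on `ahead`

def buckets_filled(ahead_s: int, bucket_s: int) -> int:
    if not bucket_s <= ahead_s <= MAX_AHEAD_S:
        raise ValueError("ahead must be between the bucket size and 72h")
    return ahead_s // bucket_s

print(buckets_filled(5 * 60, 2 * 60))     # frame 5m, 2m buckets  → 2
print(buckets_filled(10 * 60, 60))        # frame 10m, 1m buckets → 10
print(buckets_filled(2 * 3600, 10 * 60))  # frame 2h, 10m buckets → 12
```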
One caveat: if the input has a `_c_bucket` column from a `timechart` or `align`, the `_c_bucket` column will also be filled appropriately by this verb.
Usage¶
fill [ frame ], [ columnbinding_1, columnbinding_2, ... ]
Argument | Type | Optional | Repeatable | Restrictions
---|---|---|---|---
frame | frame | yes | no | constant
columnbinding | expression | yes | yes | none
Accelerable¶
fill is sometimes accelerable, depending on options used. A dataset that only uses accelerable verbs can be accelerated, making queries on the dataset respond faster.
Examples¶
align 1m,
memory_used: avg(m("container_memory_usage_bytes")),
memory_requested: avg(m("kube_pod_container_resource_requests_memory_bytes"))
aggregate
pod_memory_utilization: sum(memory_used) / sum(memory_requested),
group_by(cluster)
fill
From the input metric dataset, the `align` verb aligns the metric points to a time grid of 1 minute. For each 1-minute time bucket, it calculates the average of the "container_memory_usage_bytes" metric into the `memory_used` column and the average of the "kube_pod_container_resource_requests_memory_bytes" metric into the `memory_requested` column.
Afterwards, the `aggregate` verb creates the `pod_memory_utilization` column, which represents the memory utilization ratio, by dividing the sum of `memory_used` by the sum of `memory_requested`, grouped by `cluster`. `aggregate` preserves the time grid, so the produced dataset still has a time grid of 1 minute.
From the aggregated metric aligned over a time grid, it fills the `pod_memory_utilization` column with a null value, while the `valid-from`, `valid-to`, and `cluster` columns are filled appropriately. If the `cluster` column only had the values "production" and "test", every time bucket will have the `pod_memory_utilization` column filled with a null value for each of those two groups.
All the missing time buckets within the query window are filled, as no `frame` argument is provided.
The above OPAL is non-accelerable and can’t be published into a dataset. It only works under the context of a fixed query window.
timechart 1m, frame(back:10m), min_cpu_utilization: min(cpu_utilization), max_memory_usage: max(memory_usage), any_service: any(service), group_by(host)
fill min_cpu_utilization:0, max_memory_usage:100
From an arbitrary input dataset, the `timechart` verb evaluates the rows over the last 10 minutes every 1 minute. For every minute, it calculates the minimum value of the `cpu_utilization` column into `min_cpu_utilization`, the maximum value of the `memory_usage` column into `max_memory_usage`, and any value of the `service` column into `any_service`, for every `host`.
From the aligned dataset over a time grid, it fills `min_cpu_utilization` with 0 and `max_memory_usage` with 100. Since no value has been specified for `any_service`, it is filled with nulls. Every time bucket over the time grid will have all the permutations of `host`.
All the missing time buckets within the query window are filled, as no `frame` argument is provided.
The above OPAL is non-accelerable and can’t be published into a dataset. It only works under the context of a fixed query window.
align 1m,
memory_used: avg(m("container_memory_usage_bytes")),
memory_requested: avg(m("kube_pod_container_resource_requests_memory_bytes"))
aggregate
pod_memory_utilization: sum(memory_used) / sum(memory_requested),
group_by(cluster)
fill frame(ahead: 10m)
From the input metric dataset, the `align` verb aligns the metric points to a time grid of 1 minute. For each 1-minute time bucket, it calculates the average of the "container_memory_usage_bytes" metric into the `memory_used` column and the average of the "kube_pod_container_resource_requests_memory_bytes" metric into the `memory_requested` column.
Afterwards, the `aggregate` verb creates the `pod_memory_utilization` column, which represents the memory utilization ratio, by dividing the sum of `memory_used` by the sum of `memory_requested`, grouped by `cluster`. `aggregate` preserves the time grid, so the produced dataset still has a time grid of 1 minute.
From the aggregated metric aligned over a time grid, it fills the `pod_memory_utilization` column with a null value, while the `valid-from`, `valid-to`, and `cluster` columns are filled appropriately. If the `cluster` column only had the values "production" and "test", every time bucket will have the `pod_memory_utilization` column filled with a null value for each of those two groups.
Only the 10 time buckets after the last appearing time bucket are filled, as 10 minutes floor-divided by 1 minute gives 10 time buckets to fill.
The above OPAL is accelerable and can be published into a dataset. It also works under the context of a fixed query window.
timechart 10m, frame(back:10m), min_cpu_utilization: min(cpu_utilization), max_memory_usage: max(memory_usage), any_service: any(service), group_by(host)
fill frame(ahead:2h), min_cpu_utilization:0, max_memory_usage:100
From an arbitrary input dataset, the `timechart` verb evaluates the rows over the last 10 minutes every 10 minutes. For every 10-minute bucket, it calculates the minimum value of the `cpu_utilization` column into `min_cpu_utilization`, the maximum value of the `memory_usage` column into `max_memory_usage`, and any value of the `service` column into `any_service`, for every `host`.
From the aligned dataset over a time grid, it fills `min_cpu_utilization` with 0 and `max_memory_usage` with 100. Since no value has been specified for `any_service`, it is filled with nulls. Every time bucket over the time grid will have all the permutations of the grouping column `host`.
Only the 12 time buckets after the last appearing time bucket are filled, as 2 hours (120 minutes) floor-divided by 10 minutes gives 12 time buckets to fill.
The above OPAL is accelerable and can be published into a dataset. It also works under the context of a fixed query window.