prom_quantile

Description

Calculates an approximate percentile value of the distribution in a histogram metric generated by a Prometheus data source.
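The sketch below illustrates how a percentile can be approximated from histogram buckets. It assumes that prom_quantile uses the same linear interpolation within a bucket as Prometheus's histogram_quantile. This page only says the result is approximate, so treat the code as an illustration, not as the function's implementation. The name estimate_quantile and the (upper_bound, cumulative_count) input shape are hypothetical.

```python
import math

def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs, where
    upper_bound is the numeric value of the le tag (math.inf for "+Inf").
    Assumption: Prometheus-style linear interpolation inside a bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)  # order by upper bound; +Inf sorts last
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least one finite bucket plus a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in this window
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    i = next(idx for idx, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[i]
    if upper == math.inf:
        # Nothing to interpolate toward: report the highest finite bound.
        return buckets[-2][0]
    if i == 0:
        if upper <= 0:
            return upper
        lower, prev_count = 0.0, 0.0  # assume the first bucket starts at 0
    else:
        lower, prev_count = buckets[i - 1]
    if count == prev_count:
        return lower
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)
```

For example, with buckets [(0.1, 10), (0.5, 30), (1.0, 40), (math.inf, 40)], the median rank (20 of 40 observations) falls in the 0.1 to 0.5 bucket, halfway between its lower and upper cumulative counts, so the estimate is about 0.3. The true value can be anywhere in that bucket, which is why the result is only approximate.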

This function is generally used in an aggregate verb, after the metric has been time-aligned with rate. Aggregation across key dimensions is done internally with sum and should not be specified separately.
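To illustrate why no separate sum is needed, the sketch below shows what summing across dimensions conceptually means for a histogram: bucket rates that share the same le value are added together across every dimension that is not in the group_by, and only then is a quantile estimated. The function name sum_buckets_by_le and the (tags, rate) input shape are hypothetical; this is not how the function is implemented internally.

```python
from collections import defaultdict

def sum_buckets_by_le(series, group_key):
    """Sum per-bucket rates across all series that share a group value.

    series: iterable of (tags, rate) pairs; tags is a dict with an "le"
    entry plus any other dimensions (for example pod or host).
    group_key: the tag to keep, like group_by(tags.service); every other
    dimension except le is summed away.
    Returns {group_value: [(le, summed_rate), ...]} sorted by le.
    """
    sums = defaultdict(lambda: defaultdict(float))
    for tags, rate in series:
        le = float(tags["le"])  # float("+Inf") parses to infinity
        sums[tags[group_key]][le] += rate
    return {group: sorted(per_le.items()) for group, per_le in sums.items()}
```

Summing per le keeps the buckets cumulative, so the summed result is still a valid histogram. Summing quantiles computed separately per series would not give a meaningful percentile.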

Because of how Prometheus histograms are generated, the metric name must end in _bucket, and there must be a tag named le that holds the bucket boundaries. Additionally, the Grafana collection agent must be configured not to drop these metrics. Dropping them is a common default, because each Prometheus histogram produces one series per bucket and is therefore much larger than a regular counter.
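The input requirements can be expressed as a small check. The sketch below, using a hypothetical bucket_boundaries helper, verifies the _bucket name suffix and the presence of an le tag on every series, then returns the distinct bucket boundaries in numeric order. It checks only the two requirements stated above; any further validation the function performs is not described here.

```python
def bucket_boundaries(metric_name, series_tags):
    """Return the sorted, de-duplicated le boundaries of a histogram metric.

    Raises ValueError if the metric name does not end in _bucket or if
    any series is missing the le tag.
    """
    if not metric_name.endswith("_bucket"):
        raise ValueError(f"{metric_name!r} does not end in _bucket")
    bounds = set()
    for tags in series_tags:
        if "le" not in tags:
            raise ValueError(f"series {tags!r} has no le tag")
        bounds.add(float(tags["le"]))  # "+Inf" becomes float infinity
    return sorted(bounds)
```

Parsing le as a number matters because the tag values are strings. Sorting them as text would place "10" before "2".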

An additional restriction is that prom_quantile must be at the top level of an expression (i.e., its result must be the output column value directly). To do further operations on this value, put it in a column, and do the additional calculations in a subsequent step.

Return type

float64

Domain

This is an aggregate function (it aggregates rows over a group in aggregate verbs).

Usage

prom_quantile(prom_bucket, quantile, [ le_val ])

Argument      Type      Optional   Repeatable   Restrictions
prom_bucket   numeric   no         no           none
quantile      numeric   no         no           constant
le_val        float64   yes        no           none

Examples

align 5m, rate(m("request_latency_bucket"))
aggregate
  p95:prom_quantile(request_latency_bucket, 0.95, tags.le),
  group_by(tags.service)

The request latency histogram is aligned to 5-minute time bins, and rate is applied to get the per-bucket rate of samples, because the histogram counters are cumulative. The 95th percentile is then estimated with prom_quantile() for each value of the service tag, using the le bucket tags generated by the histogram source.
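Because each bucket is a cumulative counter, its raw value only grows over time, and the distribution within a window comes from how much each bucket increased. The sketch below, with a hypothetical bucket_rates helper, turns the counter values at the start and end of one 5-minute (300-second) window into per-second rates. It is a simplification that treats a drop in the counter as a reset from zero and does no extrapolation; it does not claim to reproduce how rate is implemented.

```python
def bucket_rates(samples, window_seconds):
    """Per-second rate of each cumulative bucket counter over one window.

    samples: {le: (count_at_window_start, count_at_window_end)}
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    rates = {}
    for le, (start, end) in samples.items():
        # A drop in the counter means it was reset (e.g. the process
        # restarted); count from zero in that case, a simplification.
        increase = end - start if end >= start else end
        rates[le] = increase / window_seconds
    return rates
```

For example, if the le="0.1" bucket goes from 100 to 160 and the le="+Inf" bucket from 200 to 320 over 300 seconds, the rates are 0.2 and 0.4 requests per second, so about half of the requests in that window finished within 0.1. The rates stay cumulative across le, which is what a quantile estimate over the buckets needs.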

align 5m, rate(m("request_latency_bucket")), rate(m("request_throughput_bucket"))
aggregate
  p95_lat:prom_quantile(request_latency_bucket, 0.95),
  p95_thru:prom_quantile(request_throughput_bucket, 0.95),
  group_by(tags.service)
make_col bandwidth95_latency95_product:p95_lat * p95_thru

This example uses the le tag in the “label” object column (or another suitably named column) to estimate the p95 of both request latency and request throughput from the predefined Prometheus histogram buckets. It then calculates the “bandwidth delay product” of these two estimates. Because prom_quantile must be at the top level of an expression, this multiplication must be done in its own subsequent make_col step.