Introduction to Metrics
A metric is any sort of value you can measure over time. It could be blocks used on a filesystem, the number of nodes in a cluster, or a temperature reading. Metrics are reported in the form of a time series: a set of values in time order. Each point in a time series represents a measurement from a single resource, with its name, value, and tags.
Observe links metrics to Resource Sets, so you can view relevant metrics on a Resource Landing Page.

This page describes the process of shaping raw metrics data for Resources. There are several considerations and decisions to make in the modeling process. Please contact us if you have questions about modeling your specific data.
Note
Metrics use OPAL in a worksheet to transform the raw data, add metadata, and create relationships between datasets. If you are not familiar with OPAL, please see OPAL — Observe Processing and Analysis Language
What Is a Metric Dataset?
An Observe metric dataset contains both metrics data and metadata that provides additional context. Observe uses two different forms for metrics, called narrow and wide.
Narrow Metrics
Narrow metrics contain one metric per row: a single data point containing a timestamp, name, value, and zero or more tags. For example, the following table contains values for two metrics in narrow form:
valid_from | metric_name | metric_value | metric_tags |
---|---|---|---|
00:00:00 | disk_used_bytes | 20000000 | {"device":"sda1"} |
00:00:00 | disk_total_bytes | 50000000 | {"device":"sda1"} |
00:01:00 | disk_used_bytes | 10000000 | {"device":"sda1"} |
00:01:00 | disk_total_bytes | 50000000 | {"device":"sda1"} |
00:02:00 | disk_used_bytes | 40000000 | {"device":"sda1"} |
00:02:00 | disk_total_bytes | 50000000 | {"device":"sda1"} |
Some systems generate narrow metrics by default. For others, you can shape the data into the correct form with OPAL.
Note
Metric values must be float64. If you need to convert from another type, see the float64 function.
Narrow metrics are easier to manage at ingest time and as events. With one metric per row, it is clear which value and tags belong to which metric. The interface verb registers a dataset as a metric dataset, and set_metric specifies the details of the individual metrics it contains.
Wide Metrics
Wide metrics contain several, often related, metrics per row. This form is easier for calculations, such as percent usage, because the needed values can be available in the same row. Wide metrics are created by the rollup and aggregate verbs. rollup defines how each narrow metric should be aggregated over time, and aggregate determines how wide metrics from different sources are aggregated by tags.
The table below is a wide format rollup based on the example above. It includes valid_from and valid_to timestamps, which indicate the time period over which the average is calculated.
valid_from | valid_to | disk_used_bytes_avg | disk_total_bytes_avg | metric_tags |
---|---|---|---|---|
00:00:00 | 00:01:00 | 15000000 | 50000000 | {"device":"sda1"} |
00:01:00 | 00:02:00 | 25000000 | 50000000 | {"device":"sda1"} |
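To make the relationship between the two forms concrete, here is a minimal Python sketch (not OPAL, and not a description of how Observe implements rollup) that averages the narrow example points into wide rows. It assumes each window includes the points at both of its endpoints, an assumption chosen only because it reproduces the averages in the table. The names rollup_avg and percent_used are hypothetical helpers for this illustration.

```python
from collections import defaultdict

# Narrow-form points from the example table: (minute, metric_name, value).
# Times are written as whole minutes to keep the sketch small.
narrow_points = [
    (0, "disk_used_bytes", 20_000_000.0),
    (0, "disk_total_bytes", 50_000_000.0),
    (1, "disk_used_bytes", 10_000_000.0),
    (1, "disk_total_bytes", 50_000_000.0),
    (2, "disk_used_bytes", 40_000_000.0),
    (2, "disk_total_bytes", 50_000_000.0),
]


def rollup_avg(points, windows):
    """Average each metric over each window, producing one wide row per window.

    Assumption: a window includes the points at both of its endpoints.
    """
    rows = []
    for start, end in windows:
        values = defaultdict(list)
        for minute, name, value in points:
            if start <= minute <= end:
                values[name].append(value)
        row = {"valid_from": start, "valid_to": end}
        for name, vals in sorted(values.items()):
            row[f"{name}_avg"] = sum(vals) / len(vals)
        rows.append(row)
    return rows


def percent_used(row):
    """Percent usage is a single-row calculation once the data is in wide form."""
    return 100.0 * row["disk_used_bytes_avg"] / row["disk_total_bytes_avg"]


wide_rows = rollup_avg(narrow_points, [(0, 1), (1, 2)])
for row in wide_rows:
    print(row, percent_used(row))
```

Because both averages sit in the same row, percent usage needs no join or lookup, which is the practical advantage of the wide form described above.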
Create an Interface
The interface verb maps fields to a metric interface so subsequent operations know which fields contain the metric names and values. This metadata-only operation prepares a dataset for use as metrics.
Example:
interface "metric", metric:metricNameColumn, value:metricValueColumn
Registering, or “implementing the metric interface,” establishes the following conditions:
This dataset contains narrow metrics
Each row represents one point in a time series
The metricNameColumn column contains the metric names
The metricValueColumn column contains the metric values
Define Individual Metrics
Once the dataset is set up for metrics, use set_metric to define the metadata for each metric. If you have many metrics, you can register some and add others later by updating the Event Stream definition.
Example:
set_metric options(label:"Ingress Bytes", type:"cumulativeCounter", unit:"bytes", description:"Ingress reported from somewhere", rollup:"rate", aggregate:"sum"), "ingress_bytes"
This statement registers the metric ingress_bytes as a cumulativeCounter, which is aggregated over time as a rate, and across multiple tags as a sum. For more about allowed values for the rollup and aggregate options, please see the OPAL verb documentation for set_metric and the example walkthrough below.
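As a rough illustration of what a rate rollup and a sum aggregate mean for a cumulative counter, the following Python sketch computes per-second rates from hypothetical ingress_bytes readings and then sums them across tags. The sample data, the tag values eth0 and eth1, and the function names are assumptions made for this example, and the sketch does not handle counter resets. It is not a description of Observe's implementation.

```python
def counter_rate(samples):
    """Per-second rate between consecutive (timestamp_seconds, cumulative_value) samples."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


def aggregate_sum(series_by_tag):
    """Sum the values of several series at each shared timestamp."""
    totals = {}
    for series in series_by_tag.values():
        for timestamp, value in series:
            totals[timestamp] = totals.get(timestamp, 0.0) + value
    return sorted(totals.items())


# Hypothetical cumulative readings from two sources, sampled every 10 seconds.
ingress_bytes = {
    "eth0": [(0, 1000.0), (10, 1500.0), (20, 2500.0)],
    "eth1": [(0, 200.0), (10, 400.0), (20, 400.0)],
}

# Rollup: each counter becomes a rate over time.
rates = {tag: counter_rate(samples) for tag, samples in ingress_bytes.items()}

# Aggregate: rates from different tags are summed at each timestamp.
total_rate = aggregate_sum(rates)
print(total_rate)
```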
Note
set_metric units use standard SI unit names from the math.js library, with the exceptions noted below. They may be combined for compound units like rates and ratios. Other units may not scale appropriately in charts. Please contact us if you have trouble with an unusual or custom unit.
You may use either the unit names or abbreviations, and most names can be either singular (hour) or plural (hours). Please see the math.js docs for details.
We recommend using full names for clarity. Note that both names and abbreviations are case-sensitive. For a unitless measurement, either omit unit: or use unit:"".
Examples of data units:
Name |
Abbreviation |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Exceptions:
m is minutes, use meter for length
C is degrees Celsius, use coulomb for electric charge
F is degrees Fahrenheit, use farad for capacitance
Units based on B scale by a factor of 1000 in board cards: the metric value displays with larger units as its value increases. For example, 1,000 B is displayed as 1 kB.
To scale by 1024, use By units: By, KiB, MiB, GiB, or TiB.
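The difference between the two scaling factors can be sketched in a few lines of Python. This is only an illustration of decimal versus binary scaling, not the code Observe uses to render cards. The function name scale_bytes is hypothetical, and the decimal prefixes beyond kB are the standard SI ones rather than names taken from this page.

```python
def scale_bytes(value, base=1000):
    """Format a byte count using decimal (base 1000) or binary (base 1024) prefixes."""
    if base == 1000:
        units = ["B", "kB", "MB", "GB", "TB"]
    elif base == 1024:
        units = ["By", "KiB", "MiB", "GiB", "TiB"]
    else:
        raise ValueError("base must be 1000 or 1024")
    scaled = float(value)
    index = 0
    # Move to the next larger unit while the value is at least one full step.
    while scaled >= base and index < len(units) - 1:
        scaled /= base
        index += 1
    return f"{scaled:g} {units[index]}"


print(scale_bytes(1000))        # decimal scaling
print(scale_bytes(1024, 1024))  # binary scaling
print(scale_bytes(1000, 1024))  # below 1024, so the base unit is kept
```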
Link Metrics to Resources
To show metrics on a Resource Landing Page, link the metric dataset to the Resource Set’s primary key.
From the metric dataset worksheet, select Link To Resource Set from the column heading menu for the same key. Save the updated Event Stream definition to link the two datasets. Reload the Resource Landing Page to see the new metrics.
Walkthrough: Putting It All Together
To show how this works, here is an example of creating metrics from process data. We have a shell script that sends data from ps to Observe every five seconds. Before it’s converted to JSON, the original output looks like this:
PID RSS TIME %CPU COMMAND
1 12752 1 2.0 systemd
2 0 0 0.0 kthreadd
3 0 0 0.0 rcu_gp
ps reports several pieces of information for each process, so the first step is to shape the data into narrow form with OPAL.
Open a new worksheet based on the Firehose, also called the Observation Event Stream. Then filter to the desired observations and extract needed fields:
// The script used a unique path for HTTP ingestion
// Filter on it to get the desired data
filter OBSERVATION_KIND="http" and string(EXTRA.path)="/metricquickstart"

// flatten_leaves creates a new row for each set of process data,
// corresponding to one row in the original output
// Creates _c_FIELDS_stdout_value containing each string
// and _c_FIELDS_stdout_path for its position in the JSON object (which we don't need)
flatten_leaves FIELDS.stdout

// Select the field that contains the data we want
pick_col BUNDLE_TIMESTAMP, ps:string(_c_FIELDS_stdout_value)

// Extract fields from the ps string output with a regex
// The decimal point in pcpu is escaped so it matches only a literal "."
extract_regex ps, /^\s+(?P<pid>\d+)\s+(?P<rss>\d+)\s+(?P<cputimes>\d+)\s+(?P<pcpu>\d+\.\d+)\s+(?P<command>\S+)\s*$/
The reformatted data now looks like this:
BUNDLE_TIMESTAMP | ps | command | pcpu | cputimes | rss | pid |
---|---|---|---|---|---|---|
02/24/21 16:14:03.151 | 1 12752 1 2.0 systemd | systemd | 2.0 | 1 | 12752 | 1 |
02/24/21 16:14:03.151 | 2 0 0 0.0 kthreadd | kthreadd | 0.0 | 0 | 0 | 2 |
02/24/21 16:14:03.151 | 3 0 0 0.0 rcu_gp | rcu_gp | 0.0 | 0 | 0 | 3 |
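To check the extraction pattern outside of Observe, the same named groups can be tried against the sample ps output in Python, whose re module accepts the same (?P<name>...) syntax. This is a sketch for local experimentation only. It assumes leading whitespace is optional (the OPAL pattern requires it) and escapes the decimal point in pcpu.

```python
import re

# Same named groups as the extract_regex pattern. The decimal point is escaped,
# and leading whitespace is optional so left-aligned lines also match.
PS_LINE = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<rss>\d+)\s+(?P<cputimes>\d+)"
    r"\s+(?P<pcpu>\d+\.\d+)\s+(?P<command>\S+)\s*$"
)

ps_output = """PID RSS TIME %CPU COMMAND
1 12752 1 2.0 systemd
2 0 0 0.0 kthreadd
3 0 0 0.0 rcu_gp"""

parsed = []
for line in ps_output.splitlines():
    match = PS_LINE.match(line)
    if match:  # the header line does not match and is skipped
        parsed.append(match.groupdict())

for row in parsed:
    print(row)
```

All captured values are strings at this point, which is why the later shaping step converts the metric values to float64.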
Note
If your desired data is already part of an existing Resource Set, start from there instead of beginning with Observation. See Performance for more.
Shape into narrow metrics:
// Create a new object containing the desired values,
// along with more verbose metric names
make_col metrics:makeobject("resident_set_size":rss, "cumulative_cpu_time":cputimes, "cpu_utilization":pcpu)

// Flatten that metrics object to create one row for each value
flatten_leaves metrics

// Select the desired fields, renaming in the process
// Also convert value to float64, as currently required for metric values
pick_col valid_from:BUNDLE_TIMESTAMP, pid, command, metric_name:string(_c_metrics_path), metric_value:float64(_c_metrics_value)
After shaping, it looks like this:
valid_from | pid | command | metric_name | metric_value |
---|---|---|---|---|
02/24/21 16:14:03.151 | 1 | systemd | cpu_utilization | 2.0 |
02/24/21 16:14:03.151 | 1 | systemd | resident_set_size | 12752 |
02/24/21 16:14:03.151 | 1 | systemd | cumulative_cpu_time | 1 |
02/24/21 16:14:03.151 | 2 | kthreadd | cpu_utilization | 0.0 |
02/24/21 16:14:03.151 | 2 | kthreadd | resident_set_size | 0 |
02/24/21 16:14:03.151 | 2 | kthreadd | cumulative_cpu_time | 0 |
02/24/21 16:14:03.151 | 3 | rcu_gp | cpu_utilization | 0.0 |
02/24/21 16:14:03.151 | 3 | rcu_gp | resident_set_size | 0 |
02/24/21 16:14:03.151 | 3 | rcu_gp | cumulative_cpu_time | 0 |
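The shaping step amounts to turning each process row into one row per metric. The following Python sketch shows that transformation on two of the example rows. It is an illustration of the data shape only, not of how OPAL's flatten_leaves works internally, and the names METRIC_NAMES, to_narrow, and the timestamp key are hypothetical.

```python
# Maps the extracted ps fields to the more verbose metric names used in the walkthrough.
METRIC_NAMES = {
    "rss": "resident_set_size",
    "cputimes": "cumulative_cpu_time",
    "pcpu": "cpu_utilization",
}


def to_narrow(rows):
    """Emit one narrow point per metric for each parsed ps row."""
    points = []
    for row in rows:
        for field, metric_name in METRIC_NAMES.items():
            points.append({
                "valid_from": row["timestamp"],
                "pid": row["pid"],
                "command": row["command"],
                "metric_name": metric_name,
                # Extracted values are strings, so convert them to floats.
                "metric_value": float(row[field]),
            })
    return points


parsed_rows = [
    {"timestamp": "02/24/21 16:14:03.151", "pid": "1", "command": "systemd",
     "rss": "12752", "cputimes": "1", "pcpu": "2.0"},
    {"timestamp": "02/24/21 16:14:03.151", "pid": "2", "command": "kthreadd",
     "rss": "0", "cputimes": "0", "pcpu": "0.0"},
]

narrow = to_narrow(parsed_rows)
for point in narrow:
    print(point)
```

The pid and command values are repeated on every point, which is what later allows them to act as tags.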
Register an interface to identify this dataset as containing metrics data.
// This interface statement specifies that the names of our metrics are in
// metric_name, and their values in metric_value
interface "metric", metric:metric_name, value:metric_value
The interface verb adds metadata, so there’s no visible effect on the data yet. The metric keyword indicates that we want a metrics interface.
This operation defines several important pieces of information about this dataset. Some are directly specified, and some are inferred from the dataset’s definition, or schema.
This is a narrow metric dataset, where each row represents one metric point
The values in metric_name are the metric names
The values in metric_value are the metric values
The values in valid_from are the time of the observation
The other fields (pid and command) are tags, used later for linking to a Resource Set
Define individual metrics
Now we have a metrics-ready dataset. It contains raw metrics data, and we have told Observe which fields contain the names and values. To use it, we need additional metadata about the individual values. Create this for each metric using set_metric.
// RSS is a gauge, a measurement at a point in time
// rollup type "avg" means when a metric's value is tracked over time, we want the average
// aggregate "sum" means when these values are tracked across multiple processes, we want the total sum
// The name of this metric is resident_set_size, linking it to identically named values in the metric_name field
set_metric options(label:"Memory Usage: RSS", unit:"kilobytes", description:"Resident set size of the process", type:"gauge", rollup:"avg", aggregate:"sum"), "resident_set_size"

// Cumulative CPU Time is a cumulativeCounter, a monotonically increasing total
// rollup "rate" gives the rate at which a particular metric's value increases over time
set_metric options(label:"Cumulative CPU Time", unit:"s", description:"The cumulative CPU time spent by the process", type:"cumulativeCounter", rollup:"rate", aggregate:"sum"), "cumulative_cpu_time"

// CPU Utilization is also a gauge measurement, with rollup "avg" and aggregate "sum"
// This measurement is unitless, so unit: is omitted
set_metric options(label:"CPU Utilization", description:"CPU utilization of the process, expressed as a percentage", type:"gauge", rollup:"avg", aggregate:"sum"), "cpu_utilization"
This defines what we want to track and how to treat it in subsequent rollup and aggregation operations. There is also another metric type, delta, the change from the previous measurement.
Create a new dataset by publishing this worksheet as a new Event Stream.
Link the metrics dataset to a related Resource Set
To view metrics on a Resource Landing Page, first we need a Resource Set. Start from the Event Stream we just created, and open it as a worksheet. The pid and command fields contain additional tags for the metric data in the metric_name and metric_value fields we created earlier.
Select these two fields (cmd-click or ctrl-click on the column headers), right click to open the menu, and choose Create New Resource Set. Check the pid and command fields, and then specify pid as the primary key. This allows us to link the new Resource Set to the metric dataset. Click Create to save.
Now you have a second stage in your worksheet, for the pid Resource Set. Click Publish New Resource Set in the right rail to make it available as a Resource Set.
pid isn’t that descriptive of a name, so call it Process and click Publish to save.
Now we need to tell the metrics dataset about the Resource Set’s primary key. Open a new tab and edit the walkthrough-metrics-quickstart Event Stream definition. Select the pid field and choose Link To Resource Set from the menu and then Process in the sub-menu.
Click Apply, and then Save to save changes to the Event Stream definition.
View metrics on the Resource Landing Page
Open the Process Landing Page in a new tab to see the metrics in new cards.