Introduction to Metrics

A metric is any sort of value you can measure over time. It could be blocks used on a filesystem, the number of nodes in a cluster, or a temperature reading. They are reported in the form of a time series: a set of values in time order. Each point in a time series represents a measurement from a single resource, with its name, value, and tags.


Figure 1 - Pod memory usage metrics on the Pod dashboard

This topic describes the process of shaping raw metrics data. There are several considerations and decisions to make in the modeling process; contact us if you have questions about modeling your specific data.

Note

Metrics use OPAL in a worksheet to transform the raw data, add metadata, and create relationships between datasets. If you are not familiar with OPAL, please see OPAL — Observe Processing and Analysis Language.

Metrics Video Tour

Figure 2 - Video tour of Observe Metrics Landing page

What Is a Metric Dataset?

An Observe metric dataset contains both metrics data and metadata that provides additional context. Observe uses two different forms for metrics, called narrow and wide.

Narrow Metrics

Narrow metrics contain one metric per row: a single data point containing a timestamp, name, value, and zero or more tags. For example, the following table contains values for two metrics in narrow form:

valid_from    metric              value       tags
00:00:00      disk_used_bytes     20000000    {"device":"sda1"}
00:00:00      disk_total_bytes    50000000    {"device":"sda1"}
00:01:00      disk_used_bytes     10000000    {"device":"sda1"}
00:01:00      disk_total_bytes    50000000    {"device":"sda1"}
00:02:00      disk_used_bytes     40000000    {"device":"sda1"}
00:02:00      disk_total_bytes    50000000    {"device":"sda1"}

Some systems generate this by default, or you can shape other data into the correct form with OPAL.

Note

Metric values must be float64. If you need to convert from another type, see the float64() function.
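
For example, if your values arrived as strings, you could convert them while shaping the dataset. A sketch, where raw_value is a hypothetical field name:

make_col value:float64(raw_value)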

Narrow metrics are easier to manage at ingest time and to work with as individual events. With one metric per row, it is clear which value and tags belong to which metric. The interface verb registers a narrow metric dataset as containing metrics.

Wide Metrics

Wide metrics contain several, often related, metrics in each row. This form is easier for calculations, such as percent usage, because the needed values are available in the same row. This page mainly covers narrow metrics, but it’s good to be aware of the differences between the two forms.

Some log files already contain a type of wide metric format, such as key=value pairs emitted by a process. They may also be created in Observe by the rollup and aggregate verbs. rollup defines how each narrow metric should be aggregated over time, and aggregate determines how wide metrics from different sources are aggregated by tags.

The table below is a wide-format rollup based on the example above. It includes valid_from and valid_to timestamps that indicate the time period over which the average is calculated.

valid_from    valid_to    disk_used_bytes_avg    disk_total_bytes_avg    tags
00:00:00      00:01:00    15000000               50000000                {"device":"sda1"}
00:01:00      00:02:00    25000000               50000000                {"device":"sda1"}
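
With both averages in the same row, a derived value such as percent usage becomes a single expression. A sketch, assuming the column names from the rollup above:

make_col disk_used_pct:100 * disk_used_bytes_avg / disk_total_bytes_avg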

Metric interfaces

The interface verb maps fields to a metric interface so subsequent operations know which fields contain the metric names and values. This metadata-only operation prepares a dataset for use as metrics.

Example:

interface "metric"

The data you see doesn’t change, but registering, or “implementing the metric interface,” establishes the following conditions:

  • This dataset contains narrow metrics

  • Each row represents one point in a time series

  • A field named metric contains the metric names

  • A field named value contains the metric values

If the metric names and values are already in fields called metric and value, interface discovers them automatically. See the docs for interface for more about fields with nonstandard names.
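
If your dataset uses other names, one option is to rename the fields to the defaults before registering. A sketch, where m_name, m_value, and host are hypothetical field names:

// Keep the timestamp and any tag fields alongside the renamed columns
pick_col valid_from, host, metric:m_name, value:m_value
interface "metric"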

Walkthrough: Putting It All Together

To show how this works, here is an example of creating metrics from process data. Since most of this shaping is done with OPAL, we’ll focus on verbs and functions rather than UI actions.

We have a shell script that sends ps output to Observe every five seconds, to a data stream called “metrics-test.” Before it’s converted to JSON, the original ps output looks like this:

PID   RSS     TIME %CPU COMMAND
  1 12752        1  2.0 systemd
  2     0        0  0.0 kthreadd
  3     0        0  0.0 rcu_gp

Field      Description
PID        Process ID
RSS        Resident set size (memory used, KB)
TIME       Accumulated CPU time
%CPU       Percent CPU utilization
COMMAND    Process name

As it’s ingested, Observe adds a timestamp, an ingest type, and metadata about the data stream. In this example, the process data is in FIELDS, as a JSON object:

The metrics-test event dataset, opened in a new worksheet. Fields shown in the data table include BUNDLE_TIMESTAMP, FIELDS, and EXTRA. The Inspect tab of the OPAL console shows part of the value for the highlighted FIELDS row. It is a large JSON object containing the process ID, Resident Set Size, and name for each process sampled.

Figure 3 - Process data in a worksheet.

The first step in converting this to metrics is to shape the data into narrow form using OPAL.

  1. Open a new worksheet for the existing metrics-test event dataset.

  2. In the OPAL console, extract the needed fields with flatten_leaves, pick_col, and extract_regex.

    // flatten_leaves creates a new row for each set of process data,
    // corresponding to one row in the original output.
    // It creates _c_FIELDS_stdout_value containing each string
    // and _c_FIELDS_stdout_path for its position in the JSON object (which we don't need).
    flatten_leaves FIELDS.stdout
    
    // Select the field that contains the data we want. Rename it too.
    // pick_col must include a timestamp, even if we aren't explicitly using it
    pick_col BUNDLE_TIMESTAMP, ps:string(_c_FIELDS_stdout_value)
    
    // Extract fields from the ps string output with a regex
    extract_regex ps, /^\s+(?P<pid>\d+)\s+(?P<rss>\d+)\s+(?P<cputimes>\d+)\s+(?P<pcpu>\d+\.\d+)\s+(?P<command>\S+)\s*$/
    

    The reformatted data now looks like this:

    BUNDLE_TIMESTAMP         ps                       command     pcpu    cputimes    rss      pid
    02/24/21 16:14:03.151    1 12752 1 2.0 systemd    systemd     2.0     1           12752    1
    02/24/21 16:14:03.151    2 0 0 0.0 kthreadd       kthreadd    0.0     0           0        2
    02/24/21 16:14:03.151    3 0 0 0.0 rcu_gp         rcu_gp      0.0     0           0        3

    Note that you could also extract fields with a regex from the UI, by selecting Extract from text from the column menu and using the Custom regular expression method, although the other steps still require writing OPAL statements.

  3. Shape into narrow metrics:

    // Create a new object containing the desired values,
    // along with more verbose metric names
    make_col metrics:make_object("resident_set_size":rss, "cumulative_cpu_time":cputimes, "cpu_utilization":pcpu)
    
    // Flatten that metrics object to create one row for each value
    flatten_leaves metrics
    
    // Select the desired fields, renaming in the process
    // Also convert value to float64, necessary for metric values
    pick_col valid_from:BUNDLE_TIMESTAMP,
      pid, command,
      metric:string(_c_metrics_path), value:float64(_c_metrics_value)
    

    After shaping, it looks like this:

    valid_from               pid    command     metric                 value
    02/24/21 16:14:03.151    1      systemd     cpu_utilization        2.0
    02/24/21 16:14:03.151    1      systemd     resident_set_size      12752
    02/24/21 16:14:03.151    1      systemd     cumulative_cpu_time    1
    02/24/21 16:14:03.151    2      kthreadd    cpu_utilization        0.0
    02/24/21 16:14:03.151    2      kthreadd    resident_set_size      0
    02/24/21 16:14:03.151    2      kthreadd    cumulative_cpu_time    0
    02/24/21 16:14:03.151    3      rcu_gp      cpu_utilization        0.0
    02/24/21 16:14:03.151    3      rcu_gp      resident_set_size      0
    02/24/21 16:14:03.151    3      rcu_gp      cumulative_cpu_time    0

  4. Register an interface to identify this dataset as containing metrics data:

    // Metric names are in field "metric", values in "value"
    interface "metric"
    

    An interface "metric" statement tells Observe several important things about a dataset:

    • This is a narrow metric dataset, where each row represents one metric point

    • The values in field metric are the metric names, such as cpu_utilization

    • The values in field value are the metric values, such as 2.0

    • The values in valid_from are the time of the observation

    • The other fields (pid and command) are tags that provide additional context

  5. Save the shaped data as a new dataset

    With the data shaping work done, save the results by publishing a new event stream. This creates a new dataset containing the metric events, allowing them to be used by other datasets and worksheets.

    In the right rail, click Publish New Event Stream and give the new dataset a name. For this example, we’ve named it “process/linux-process-metrics” to create the dataset in a new “process” package. Click Publish to save.

    Right rail with the Publish Event Stream dialog open.

    Figure 4 - The Publish Event Stream dialog.

  6. View metrics in Observe

    Now that we have identified this dataset as containing metrics, Observe discovers the individual metrics without any further shaping. This process takes a few minutes, after which you can find the new metrics in the Metrics tab of the Explore page.

    Search for package “process” to view only the metrics for this package. Click a metric to see its details:

    The Explore page, Metrics tab, viewing the resident_set_size metric. The summary card on the left shows the name, type, the dataset this metric belongs to, and the description "Auto Detected Metric." On the right is additional information about this metric, including a chart of values.

    Figure 5 - The Explore page Metrics tab.

Advanced metric shaping

If auto-detected metrics don’t correctly handle your data, you can explicitly define the metrics of interest with the set_metric verb.

Example:

set_metric options(label:"Ingress Bytes", type:"cumulativeCounter", unit:"bytes", description:"Ingress reported from somewhere", rollup:"rate", aggregate:"sum"), "ingress_bytes"

This statement registers the metric ingress_bytes as a cumulativeCounter, which is aggregated over time as a rate, and across multiple tags as a sum. For more about allowed values for the rollup and aggregate options, please see the OPAL verb documentation for set_metric.
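
As another sketch, a point-in-time measurement such as the walkthrough’s cpu_utilization might be registered as a gauge and averaged both over time and across tags. The option values below are illustrative; check the set_metric documentation for the allowed ones:

set_metric options(label:"CPU Utilization", type:"gauge", description:"Percent CPU from ps", rollup:"avg", aggregate:"avg"), "cpu_utilization"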

Note

set_metric units use standard SI unit names from the math.js library, with the exceptions noted below. They may be combined for compound units such as rates and ratios. Other units may not scale appropriately in charts; please contact us if you have trouble with an unusual or custom unit.

You may use either the unit names or abbreviations, and most names can be either singular (hour) or plural (hours). Please see the math.js docs for details.

We recommend using full names for clarity. Note that both names and abbreviations are case-sensitive. If unit: is omitted, the metric is unitless. You may also use unit:"" to indicate unitless values.

Examples of data units:

Name               Abbreviation
bits               b
bytes              B
kilobytes          kB
gigabytes          GB
terabytes          TB
bytes/second       B/s
megabits/second    Mb/s

Exceptions:

  • m is minutes; use meter for length

  • C is degrees Celsius; use coulomb for electric charge

  • F is degrees Fahrenheit; use farad for capacitance

Units based on B scale by a factor of 1000 in board cards: the metric value displays with larger units as its value increases. For example, 1,000 B displays as 1 kB.

To scale by a factor of 1024 instead, use By-based units: By, KiB, MiB, GiB, or TiB.
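
For example, the walkthrough’s resident_set_size comes from ps in kilobytes, so binary scaling in charts fits it well. A sketch with illustrative rollup and aggregate values:

set_metric options(label:"Resident Set Size", type:"gauge", unit:"KiB", description:"Per-process memory used", rollup:"avg", aggregate:"sum"), "resident_set_size"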