Introduction to Metrics

A metric is any value you can measure over time: blocks used on a filesystem, the number of nodes in a cluster, or a temperature reading. Metrics are reported in the form of a time series: a set of values in time order. Each point in a time series represents a measurement from a single resource, with its name, value, and tags.

Observe links metrics to Resource Sets, so you can view relevant metrics on a Resource Landing Page.

Container metrics on the Container Resource Landing page.

This page describes the process of shaping raw metrics data for Resources. There are several considerations and decisions to make in the modeling process. Please contact us if you have questions about modeling your specific data.

Note

Metrics use OPAL in a worksheet to transform the raw data, add metadata, and create relationships between datasets. If you are not familiar with OPAL, please see OPAL — Observe Processing and Analysis Language.

What Is a Metric Dataset?

An Observe metric dataset contains both metrics data and metadata that provides additional context. Observe uses two different forms for metrics, called narrow and wide.

Narrow Metrics

Narrow metrics contain one metric per row: a single data point containing a timestamp, name, value, and zero or more tags. For example, the following table contains values for two metrics in narrow form:

valid_from   metric_name        metric_value   metric_tags
00:00:00     disk_used_bytes    20000000       {"device":"sda1"}
00:00:00     disk_total_bytes   50000000       {"device":"sda1"}
00:01:00     disk_used_bytes    10000000       {"device":"sda1"}
00:01:00     disk_total_bytes   50000000       {"device":"sda1"}
00:02:00     disk_used_bytes    40000000       {"device":"sda1"}
00:02:00     disk_total_bytes   50000000       {"device":"sda1"}

Some systems generate metrics in this form by default, or you can shape other data into it with OPAL.

Note

Metric values must be float64. If you need to convert from another type, see the float64 function.
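
For example, if your raw data reports the value as an integer or string, convert it with float64 while shaping the dataset. The field name raw_value in this sketch is illustrative:

make_col metric_value:float64(raw_value)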

Narrow metrics are easier to manage at ingest time and as events. With one metric per row, it is clear which value and tags belong to which metric. The interface verb registers a dataset as a metric dataset, and set_metric specifies the details of the individual metrics it contains.

Wide Metrics

Wide metrics contain several, often related, metrics in each row. This form is easier for calculations, such as percent usage, because the needed values are available in the same row. Wide metrics are created by the rollup and aggregate verbs. rollup defines how each narrow metric should be aggregated over time, and aggregate determines how wide metrics from different sources are aggregated by tags.

The table below is a wide-format rollup based on the example above. It includes valid_from and valid_to timestamps that indicate the time period over which the average is calculated.

valid_from   valid_to   disk_used_bytes_avg   disk_total_bytes_avg   metric_tags
00:00:00     00:01:00   15000000              50000000               {"device":"sda1"}
00:01:00     00:02:00   25000000              50000000               {"device":"sda1"}
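
Because both values appear in the same row, a derived column can compute percent usage directly. As a sketch (the column name disk_used_pct is our own), the first row above works out to 100 * 15000000 / 50000000 = 30:

make_col disk_used_pct:100 * disk_used_bytes_avg / disk_total_bytes_avg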

Create an Interface

interface maps fields to a metric interface so subsequent operations know which fields contain the metric names and values. This metadata-only operation prepares a dataset for use as metrics.

Example:

interface "metric", metric:metricNameColumn, value:metricValueColumn

Registering, or “implementing the metric interface,” establishes the following conditions:

  • This dataset contains narrow metrics

  • Each row represents one point in a time series

  • The metricNameColumn column contains the metric names

  • The metricValueColumn column contains the metric values

Define Individual Metrics

Once the dataset is set up for metrics, use set_metric to define the metadata for each metric. If you have many metrics, you can register some and add others later by updating the Event Stream definition.

Example:

set_metric options(label:"Ingress Bytes", type:"cumulativeCounter", unit:"bytes", description:"Ingress reported from somewhere", rollup:"rate", aggregate:"sum"), "ingress_bytes"

This statement registers the metric ingress_bytes as a cumulativeCounter, which is aggregated over time as a rate, and across multiple tags as a sum. For more about allowed values for the rollup and aggregate options, please see the OPAL verb documentation for set_metric and the example walkthrough below.

Note

set_metric units use standard SI unit names from the math.js library, with the exceptions noted below. They may be combined for compound units like rates and ratios. Other units may not scale appropriately in charts; please contact us if you have trouble with an unusual or custom unit.

You may use either the unit names or abbreviations, and most names can be either singular (hour) or plural (hours). Please see the math.js docs for details.

We recommend using full names for clarity. Note that both names and abbreviations are case-sensitive. For a unitless measurement, either omit unit: or use unit:"".

Examples of data units:

Name               Abbreviation
bits               b
bytes              B
kilobytes          kB
gigabytes          GB
terabytes          TB
bytes/second       B/s
megabits/second    Mb/s

Exceptions:

  • m is minutes, use meter for length

  • C is degrees celsius, use coulomb for electric charge

  • F is degrees fahrenheit, use farad for capacitance

Walkthrough: Putting It All Together

To show how this works, here is an example of creating metrics from process data. We have a shell script that sends data from ps to Observe every five seconds. Before it’s converted to JSON, the original output looks like this:

PID   RSS     TIME %CPU COMMAND
  1 12752        1  2.0 systemd
  2     0        0  0.0 kthreadd
  3     0        0  0.0 rcu_gp

ps reports several pieces of information for each process, so the first step is to shape the data into narrow form with OPAL.

  1. Open a new worksheet based on the Firehose, also called the Observation Event Stream. Then filter to the desired observations and extract needed fields:

    // The script used a unique path for HTTP ingestion
    // Filter on it to get the desired data
    filter OBSERVATION_KIND="http" and string(EXTRA.path)="/metricquickstart"
    
    // flatten_leaves creates a new row for each set of process data,
    // corresponding to one row in the original output
    // It creates _c_FIELDS_stdout_value containing each string
    // and _c_FIELDS_stdout_path for its position in the JSON object (which we don't need).
    flatten_leaves FIELDS.stdout
    
    // Select the field that contains the data we want
    pick_col BUNDLE_TIMESTAMP, ps:string(_c_FIELDS_stdout_value)
    
    // Extract fields from the ps string output with a regex
    extract_regex ps, /^\s+(?P<pid>\d+)\s+(?P<rss>\d+)\s+(?P<cputimes>\d+)\s+(?P<pcpu>\d+\.\d+)\s+(?P<command>\S+)\s*$/
    

    The reformatted data now looks like this:

    BUNDLE_TIMESTAMP        ps                      command    pcpu   cputimes   rss     pid
    02/24/21 16:14:03.151   1 12752 1 2.0 systemd   systemd    2.0    1          12752   1
    02/24/21 16:14:03.151   2 0 0 0.0 kthreadd     kthreadd   0.0    0          0       2
    02/24/21 16:14:03.151   3 0 0 0.0 rcu_gp       rcu_gp     0.0    0          0       3

    Note

    If your desired data is already part of an existing Resource Set, start from there instead of beginning with Observation. See Performance for more.

  2. Shape into narrow metrics:

    // Create a new object containing the desired values,
    // along with more verbose metric names
    make_col metrics:makeobject("resident_set_size":rss, "cumulative_cpu_time":cputimes, "cpu_utilization":pcpu)
    
    // Flatten that metrics object to create one row for each value
    flatten_leaves metrics
    
    // Select the desired fields, renaming in the process
    // Also convert value to float64, as currently required for metric values
    pick_col valid_from:BUNDLE_TIMESTAMP,
      pid, command,
      metric_name:string(_c_metrics_path), metric_value:float64(_c_metrics_value)
    

    After shaping, it looks like this:

    valid_from              pid   command    metric_name           metric_value
    02/24/21 16:14:03.151   1     systemd    cpu_utilization       2.0
    02/24/21 16:14:03.151   1     systemd    resident_set_size     12752
    02/24/21 16:14:03.151   1     systemd    cumulative_cpu_time   1
    02/24/21 16:14:03.151   2     kthreadd   cpu_utilization       0.0
    02/24/21 16:14:03.151   2     kthreadd   resident_set_size     0
    02/24/21 16:14:03.151   2     kthreadd   cumulative_cpu_time   0
    02/24/21 16:14:03.151   3     rcu_gp     cpu_utilization       0.0
    02/24/21 16:14:03.151   3     rcu_gp     resident_set_size     0
    02/24/21 16:14:03.151   3     rcu_gp     cumulative_cpu_time   0

  3. Register an interface to identify this dataset as containing metrics data.

    // This interface statement specifies that the names of our metrics are in
    // metric_name, and their values in metric_value
    interface "metric", metric:metric_name, value:metric_value
    

    The interface verb adds metadata, so there’s no visible effect on the data yet. The metric keyword indicates that we want a metrics interface.

    This operation defines several important pieces of information about this dataset. Some are directly specified, and some are inferred from the dataset’s definition, or schema.

    • This is a narrow metric dataset, where each row represents one metric point

    • The values in metric_name are the metric names

    • The values in metric_value are the metric values

    • The values in valid_from are the time of the observation

    • The other fields (pid and command) are tags, used later for linking to a Resource Set

  4. Define individual metrics

    Now we have a metrics-ready dataset. It contains raw metrics data, and we have told Observe which fields contain the names and values. To use it, we need additional metadata about the individual values. Create this for each metric using set_metric.

    // RSS is a gauge, a measurement at a point in time
    // rollup type "avg" means when a metric's value is tracked over time, we want the average
    // aggregate "sum" means when these values are tracked across multiple processes, we want the total sum
    // The name of this metric is resident_set_size, linking it to identically named values in the metric_name field
    set_metric options(label:"Memory Usage: RSS", unit:"kilobytes",
      description:"Resident set size of the process",
      type:"gauge", rollup:"avg", aggregate:"sum"),
      "resident_set_size"
    
    // Cumulative CPU Time is a cumulativeCounter, a monotonically increasing total
    // rollup "rate" gives the rate at which a particular metric's value increases over time
    set_metric options(label:"Cumulative CPU Time", unit:"s",
      description:"The cumulative CPU time spent by the process",
      type:"cumulativeCounter", rollup:"rate", aggregate:"sum"),
      "cumulative_cpu_time"
    
    // CPU Utilization is also a gauge measurement, with rollup "avg" and aggregate "sum"
    // This measurement is unitless, so unit: is omitted
    set_metric options(label:"CPU Utilization",
      description:"CPU utilization of the process, expressed as a percentage",
      type:"gauge", rollup:"avg", aggregate:"sum"),
      "cpu_utilization"
    

    This defines what we want to track and how to treat it in subsequent rollup and aggregation operations. There is also a third metric type, delta, which records the change from the previous measurement.
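
    As a sketch, a delta metric is registered the same way. The metric name here is illustrative, and the rollup and aggregate choices are assumptions; check the set_metric documentation for the values appropriate to your data:

    // Illustrative delta metric: each point records the change since the last measurement
    set_metric options(label:"Bytes Written", unit:"bytes",
      description:"Bytes written since the previous measurement",
      type:"delta", rollup:"sum", aggregate:"sum"),
      "bytes_written"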

    Create a new dataset by publishing this worksheet as a new Event Stream.

    Publish New Event Stream dialog
  5. Link the metrics dataset to a related Resource Set

    To view metrics on a Resource Landing Page, first we need a Resource Set. Start from the Event Stream we just created, and open it as a worksheet. The pid and command fields contain additional tags for the metric data in the metric_name and metric_value fields we created earlier.

    Select these two fields (cmd-click or ctrl-click on the column headers), right-click to open the menu, and choose Create New Resource Set. Check the pid and command fields, and then specify pid as the primary key. This allows us to link the new Resource Set to the metric dataset. Click Create to save.

    pid and command columns highlighted, Create New Resource Set menu item selected

    Now you have a second stage in your worksheet, for the pid Resource Set. Click Publish New Resource Set in the right rail to make it available as a Resource Set. pid isn’t that descriptive of a name, so call it Process and click Publish to save.

    Now we need to tell the metrics dataset about the Resource Set’s primary key. Open a new tab and edit the walkthrough-metrics-quickstart Event Stream definition. Select the pid field and choose Link To Resource Set from the menu and then Process in the sub-menu.

    pid column highlighted, Process selected from the Link To Resource Set and Resource Sets menus

    Click Apply, and then Save to save changes to the Event Stream definition.

  6. View metrics on the Resource Landing Page

    Open the Process Landing Page in a new tab to see the metrics in new cards.

    Process Resource Landing Page, showing cards for CPU Utilization, Cumulative CPU Time, and Memory Usage: RSS