Working with the Metrics Explorer

A metric is any value you can measure over time, such as blocks used on a filesystem, the number of nodes in a cluster, or a temperature reading. Observe reports Metrics as time series: sets of values in time order. Each point in a time series represents a measurement from a single resource, with its name, value, and tags.
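The points-and-series model described above can be sketched in Python. The MetricPoint class, the to_time_series helper, and the integer epoch-second timestamps are hypothetical choices for illustration. They are not Observe's internal representation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPoint:
    """One measurement from a single resource (hypothetical model)."""
    name: str        # metric name, e.g. "cpu_utilization"
    timestamp: int   # seconds since the epoch (assumed unit)
    value: float     # the measured value
    tags: tuple      # sorted (key, value) pairs identifying the resource


def to_time_series(points):
    """Group points into one series per (name, tags), each sorted by time."""
    series = defaultdict(list)
    for point in points:
        series[(point.name, point.tags)].append(point)
    return {key: sorted(pts, key=lambda p: p.timestamp) for key, pts in series.items()}


web1 = (("host", "web-1"),)
web2 = (("host", "web-2"),)
points = [
    MetricPoint("cpu_utilization", 20, 41.0, web1),
    MetricPoint("cpu_utilization", 10, 35.5, web1),
    MetricPoint("cpu_utilization", 10, 80.2, web2),
]
series = to_time_series(points)
for (name, tags), pts in series.items():
    print(name, dict(tags), [(p.timestamp, p.value) for p in pts])
```

Keying each series on both the metric name and its tags is what keeps measurements from different resources (here, two hosts) in separate series.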

For more information about Metrics, see Introduction to Metrics.

Metrics Explorer Overview

Before you begin looking at metrics, take a few minutes to understand the various capabilities of Metrics Explorer.

Figure 1 - The Metrics Explorer Landing Page

Search

Use the Search all metrics field to locate any available Metrics Dataset.

Time Range and Zone

Use Select Time Range to choose the time range for your metrics. You can also select a Time Zone from a list of available time zones. Observe uses PDT (Pacific Daylight Time) as the default time zone.
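Because charts display in PDT by default, timestamps recorded in UTC elsewhere (for example, in your own application logs) run seven hours ahead of the times on the chart, since PDT is UTC-7. A minimal Python sketch of the conversion follows. The to_pdt helper is hypothetical, and it uses a fixed UTC-7 offset rather than a full time zone database.

```python
from datetime import datetime, timedelta, timezone

# PDT is UTC-7. This fixed offset does not switch to PST (UTC-8) in winter,
# unlike a real time zone such as America/Los_Angeles.
PDT = timezone(timedelta(hours=-7), "PDT")


def to_pdt(utc_iso):
    """Convert a naive ISO 8601 timestamp, assumed to be UTC, to PDT."""
    utc_time = datetime.fromisoformat(utc_iso).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(PDT)


print(to_pdt("2024-07-01T18:30:00").isoformat())  # 2024-07-01T11:30:00-07:00
```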

Zoom

Use the Magnifying Glass icons to zoom in and out of your data.

Query Overview

The Query Overview icon displays the status of your current query. You can also refresh the query using the Refresh icon.

Favorites and Sharing

Add the current Metrics to your list of Favorites, or share a link to them with anyone who can access your Observe instance.

Actions

Once you have your Metrics Dataset, you can select from the following Actions:

  • Create monitor - add a Monitor to keep track of anomalies or targets of interest in your Metrics Dataset. Receive Alerts when events trigger the Monitor.

  • Add to dashboard - add the visualization to a Dashboard that you reference “at-a-glance” for details on activities.

  • Open in worksheet - perform further modeling of the Metrics Dataset.

In the Query Builder panel, you can optionally override the Metrics Dataset name, duplicate the Dataset, or delete the Query Expression.

Using Metrics Explorer to Analyze Metrics

Metrics Explorer lets you immediately visualize any Metrics Dataset. To open Metrics Explorer, click Metrics in the left Navigation bar.

To locate a specific Metrics Dataset, enter its full or partial name in the Search all metrics field. For example, type cpu to see all Metrics Datasets with cpu in the name.

Figure 2 - Searching for a Metrics Dataset

Highlighting a Metrics Dataset displays a card with information about the content of the Dataset. The information includes the following fields:

  • The name of the Metrics Dataset and a brief description of the content

  • Related resources for the Dataset

  • Fields included in the Metrics Dataset

  • Properties of the Metrics Dataset

Figure 3 - Metrics Dataset Content and Properties

In this example, you have servers using Telegraf to forward data to Observe, and you want to review the CPU Utilization for any spikes in usage.

Selecting cpu_utilization (Server/Telegraf CPU Metrics) displays the visualization of the Metrics Dataset.

Figure 4 - Visualizing the CPU Utilization Metrics Dataset as a Timeseries

To select a different time range for your Metrics Dataset, click the down arrow next to the current time and choose a time range from the list.

Configuring a Query using Query Builder

Use the Query Builder to drill down into a Metrics Dataset, filtering for specific information in the Where field.

In this example, leave the Where field empty so you can review CPU Utilization on each host. Change the aggregation from Sum to Average, and then group by Host.

Figure 5 - View All Hosts and CPU Utilization
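The effect of choosing Average and grouping by Host can be sketched in Python. The sample data is hypothetical and assumes each host reports one CPU Utilization series per core (cpu0, cpu1) on already-aligned timestamps; the average_by_host helper is illustrative and is not an Observe API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical aligned samples: (timestamp, host, cpu core, utilization %).
samples = [
    (0, "web-1", "cpu0", 20.0),
    (0, "web-1", "cpu1", 40.0),
    (0, "web-2", "cpu0", 90.0),
    (10, "web-1", "cpu0", 30.0),
    (10, "web-1", "cpu1", 50.0),
    (10, "web-2", "cpu0", 70.0),
]


def average_by_host(samples):
    """Collapse all series that share a host into one series per host.

    At each timestamp, the values of every series for that host are averaged.
    """
    groups = defaultdict(list)
    for ts, host, _cpu, value in samples:
        groups[(host, ts)].append(value)
    result = defaultdict(list)
    for (host, ts), values in sorted(groups.items()):
        result[host].append((ts, mean(values)))
    return dict(result)


per_host = average_by_host(samples)
print(per_host)
# {'web-1': [(0, 30.0), (10, 40.0)], 'web-2': [(0, 90.0), (10, 70.0)]}
```

With Sum instead of Average, web-1 would show 60.0 and 80.0: the totals across its cores rather than a typical per-core utilization.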

You can see from the Time Series visualization that several Hosts had high CPU utilization at nearly the same time. Hover over each line to identify the hosts:

Figure 6 - Identifying Hosts with High CPU Utilization

Right-click on a graph line to display options available for further investigation.

Figure 7 - Options for Investigating a Graph Line

You can select from the following options:

  • Show this data only - display only this graph line in the visualization.

  • Exclude this data - remove the data from the visualization.

  • Copy - copy the graph line.

  • Inspect - inspect the data for the graph line.

  • For selected resource - display the related resource, which you can open in a new window.

  • View related - view the following related data in new windows:

    • Dashboard

    • Metrics

    • Logs

Double-click the graph line to return to the full visualization. To further investigate the cause of the high CPU Utilization, click Actions, and then Open in worksheet. Creating a Worksheet allows you to view the Metrics Dataset and perform modeling of the dataset.

Once you have the visualization with the desired Metrics information, perform one of the following Actions:

  • Create monitor - create a monitor to alert you about high CPU Utilization.

  • Add to dashboard - add the visualization to a Dashboard that you reference “at-a-glance” for details on CPU Utilization.

  • Open in worksheet - perform further modeling of the Metrics Dataset.

In Query Builder, you can select from the following types of visualization:

  • Time Series

  • Bar Chart

  • Stacked Area

  • Single Stat

  • Pie Chart

  • Value Over Time

Customize your Visualization further by changing these parameters:

  • Settings - use the dropdown menus to customize the X and Y axes.

  • Axes - select units and customize the X and Y axis labels.

  • Color - change the colors used in the Visualization.

  • Chart Style - change the shape of the graphed line.

  • Legend - change the legend position and presentation.

  • Thresholds - toggle displaying thresholds on and off.

The visualization Alignment defaults to Over Time using Average (look back 10s). To change the Alignment type, click Edit. Select from the following to model your Metrics Dataset:

  • Sum - by default, the OPAL aggregation function is Sum.
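Alignment can be modeled as resampling each series onto a regular grid. The following Python sketch averages the raw points in a 10-second lookback window ending at each grid time, mirroring the default Over Time using Average (look back 10s). The grid placement, the skipping of empty windows, and the align_average helper are assumptions for illustration, not Observe's implementation.

```python
from statistics import mean


def align_average(points, step=10, lookback=10):
    """Resample one raw series of (timestamp, value) pairs onto a grid.

    Grid times are multiples of `step` seconds. For each grid time t, the
    result holds the average of raw values with t - lookback < timestamp <= t.
    Grid times whose window contains no raw points are skipped.
    """
    if not points:
        return []
    points = sorted(points)
    # Round the first and last raw timestamps up to the next multiple of step.
    first = -(-points[0][0] // step) * step
    last = -(-points[-1][0] // step) * step
    aligned = []
    for t in range(first, last + 1, step):
        window = [value for ts, value in points if t - lookback < ts <= t]
        if window:
            aligned.append((t, mean(window)))
    return aligned


raw = [(1, 10.0), (4, 20.0), (9, 30.0), (12, 40.0), (18, 50.0)]
print(align_average(raw))  # [(10, 20.0), (20, 45.0)]
```

In this model, alignment works on each series separately; the aggregation function you choose then combines the aligned series across each group.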

You can select from the following list of available OPAL functions:

  • Any - Return any value of one column across a group.

  • Any not null - Return any non-null value of one column across a group. Can still return null if all values in the group are null.

  • Average - Calculate the arithmetic average of the input expression across the group.

  • Count Values - Count the number of non-null items in the group.

  • Count Distinct Fast - Estimate the approximate number of distinct values in the input using the HyperLogLog algorithm.

  • Count Distinct Exact - Count the exact number of distinct values in the input using complete enumeration.

  • Maximum - Compute the maximum of one column across a group (with one argument) or the scalar greatest value of its arguments (with more than one argument).

  • Median - Return the fast approximate median value of one column.

  • Median Exact - Return the exact median value of one column.

  • Minimum - Compute the minimum of one column across a group (with one argument) or the scalar least value of its arguments (with more than one argument).

  • Percentile(99) - Returns an approximated value for the 99th percentile of the input expression across the group: percentile(@."metric", .99)

  • Percentile(95) - Returns an approximated value for the 95th percentile of the input expression across the group: percentile(@."metric", .95)

  • Percentile(90) - Returns an approximated value for the 90th percentile of the input expression across the group: percentile(@."metric", .90)

  • Percentile(75) - Returns an approximated value for the 75th percentile of the input expression across the group: percentile(@."metric", .75)

  • Percentile(50) - Returns an approximated value for the 50th percentile of the input expression across the group: percentile(@."metric", .50)

  • Prometheus Quantile(99) - Returns the value at the 99th percentile of the distribution.

  • Prometheus Quantile(95) - Returns the value at the 95th percentile of the distribution.

  • Prometheus Quantile(90) - Returns the value at the 90th percentile of the distribution.

  • Prometheus Quantile(75) - Returns the value at the 75th percentile of the distribution.

  • Prometheus Quantile(50) - Returns the value at the 50th percentile of the distribution.

  • Standard Deviation - Calculate the standard deviation across the group.

  • Sum - Calculate the sum of one column across a group (with one argument) or the scalar sum of its arguments (with more than one argument).

  • Don’t Aggregate - Do not aggregate metrics.
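To see how several of these functions differ on the same group, here is a Python sketch that applies exact versions of them to one group of values, with None standing in for null. It is an illustration only. The aggregate and percentile_linear helpers are hypothetical. The percentile here is exact (linear interpolation between closest ranks) rather than approximated, Count Distinct Fast (HyperLogLog) is omitted, and Standard Deviation is assumed to be the sample standard deviation.

```python
import math
import statistics


def percentile_linear(values, q):
    """Exact percentile (0 <= q <= 1) using linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def aggregate(values):
    """Apply several group aggregations to one group; None represents null.

    Assumes the group has at least two non-null values (stdev needs two).
    """
    non_null = [v for v in values if v is not None]
    return {
        "any_not_null": non_null[0],               # any non-null value; here the first
        "average": statistics.mean(non_null),
        "count_values": len(non_null),             # non-null items only
        "count_distinct_exact": len(set(non_null)),
        "maximum": max(non_null),
        "median_exact": statistics.median(non_null),
        "minimum": min(non_null),
        "percentile_90": percentile_linear(non_null, 0.90),
        "standard_deviation": statistics.stdev(non_null),  # sample std. dev. (assumed)
        "sum": sum(non_null),
    }


group = [4.0, None, 8.0, 6.0, 6.0, 2.0]
for name, result in aggregate(group).items():
    print(f"{name}: {result}")
```

Note that Count Values reports 5 for this six-item group because the null is skipped, while Count Distinct Exact reports 4 because 6.0 appears twice.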

You can also build a query in the OPAL console using the OPAL language.

To monitor these metrics and create an alert about increased CPU Utilization, see Configuring Threshold Metrics Monitors.

Live Mode

Note

Only customers with usage-based pricing can access this feature.

Click Live Mode while viewing Metrics to watch your Metrics stream into Observe. You can filter your Metrics and generate visualizations that continuously update with new data. As soon as you click Live Mode, ingest and transform pipelines run at the highest possible rate. As new data arrives, it is transformed and the query reruns.

Figure 8 - Enabling Live Mode

Note

Live Mode functions as a temporary freshness boost: enabling it starts a one-time materialization of your data.

Because Live Mode increases your credit usage, consider disabling it unless you are actively troubleshooting an ongoing issue. Live Mode automatically turns off after 15 minutes. Using the Time Scrubber feature also turns off Live Mode automatically.

For Metrics Explorer, you can set the query length to 5, 10, or 15 minutes.

Figure 9 - Selecting Query Length

When Live Mode is enabled and you click the Query icon, you see details about the query similar to the following image.

Figure 10 - Live Mode Query Details

Latest data received - the most recent time at which data required for the query arrived on the Observe instance. This data may not yet have been processed.

Latest data available to query - the latest system time at which new data was processed and became available to query. Live Mode users can typically expect between 30 and 90 seconds of latency from source to screen, depending on data rate and agent configuration.

These two times may differ slightly, because data that has arrived on the Observe instance must be processed before Observe can query it.
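The relationship between the two status times can be expressed as a simple difference. In this Python sketch, the processing_backlog helper and the example times are hypothetical; the result is the span of received data that is not yet available to query.

```python
from datetime import datetime, timedelta


def processing_backlog(latest_received, latest_available):
    """Time span of received data that is not yet available to query."""
    # Clamp at zero in case the two readings are momentarily out of order.
    return max(latest_received - latest_available, timedelta(0))


received = datetime(2024, 7, 1, 12, 0, 45)   # hypothetical "Latest data received"
available = datetime(2024, 7, 1, 12, 0, 15)  # hypothetical "Latest data available to query"
print(processing_backlog(received, available))  # 0:00:30
```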

Exporting Data

To download the data displayed in Metrics Explorer, click Actions in the top right, and then Open in worksheet. In the Worksheet, switch the visualization to Table, and then click Export. Select CSV or JSON format and a maximum size limit (one thousand, ten thousand, or one hundred thousand rows). The export includes hidden fields, so use the pick_col OPAL verb to reduce the width of the downloaded data.
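If you have already downloaded a wide CSV export, you can also drop columns afterward. This Python sketch uses the standard csv module. The sample export, its column names (including hidden_field), and the keep_columns helper are hypothetical; pick_col in the Worksheet remains the way to shrink the export itself.

```python
import csv
import io

# Hypothetical CSV export; real column names depend on your Worksheet.
exported = """timestamp,host,cpu_utilization,hidden_field
2024-01-01T00:00:00Z,web-1,35.5,x
2024-01-01T00:00:10Z,web-2,91.0,y
"""


def keep_columns(csv_text, columns):
    """Return rows that contain only the requested columns, in the given order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: row[name] for name in columns} for row in reader]


rows = keep_columns(exported, ["timestamp", "host", "cpu_utilization"])
print(rows[1])
# {'timestamp': '2024-01-01T00:00:10Z', 'host': 'web-2', 'cpu_utilization': '91.0'}
```

All values come back as strings, so convert numeric columns (for example with float) before doing arithmetic on them.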