Monitors and Alerts

Observe Monitors provide a flexible way to alert when conditions are matched in your data. You can use multiple rules and conditional tests to determine the severity of a condition, route to different destinations based on severity, and reactively or proactively mute alerting.

Note

The Monitors v2 engine is currently in private preview; work with your Observe team to enable this feature flag. See the documentation for Monitors v1.

The Version 2 Upgrade

The upgrade to the Observe Monitoring engine adds many valuable capabilities to your Observability Cloud experience, but changing monitoring systems safely requires caution. When Monitors v2 is enabled via feature flag, your existing monitoring system continues to run and can still be managed independently of Monitors v2. To access Monitors v1 from a Monitors v2-enabled instance, go to Monitors and click Legacy Monitors at the top right. All features of the v1 system are available there.

What does a Monitor do?

A Monitor watches a dataset for a particular condition, such as a count of events or a specific text value. When you create a Monitor, Observe builds a new dataset from the contents of the source dataset and your conditions, so multiple Monitors built on the same dataset remain independent of each other.

There are three monitoring types available:

  • Threshold - Send an alert when a value crosses a threshold over a period of time. A Threshold monitor is ideal for metrics correlations (CPU is high and transactions are low), logging alerts (errors are high), or negative monitors (service data is no longer arriving).

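  • Count - Send an alert when the count of matching records crosses a threshold over a period of time. A Count monitor is ideal for measuring the volume of matching data rather than its contents, such as the number of log records that match an error condition.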

  • Promote - Send the data as an alert when new matching data arrives.

When does a Monitor execute?

Observe Monitors execute as soon as data is available. Incoming data is ingested, shaped into a Dataset, and then ready for use. A Monitor is itself a Dataset and evaluates whenever data is added to it.

Some data sources cannot produce data in a timely and ordered fashion. If your monitor relies on data that remains incomplete for a few minutes after it arrives, you can use any of the following approaches:

  • Evaluation delay: In the Monitor Query section of the monitor, under Advanced options, introduce a delay before evaluation.

  • Freshness goal adjustment: Use Acceleration Manager to reduce the Monitor’s freshness goal.

  • OPAL editing: Use OPAL to exclude the “ragged right edge” (the most recent, still-incomplete data) from evaluation. The window and frame functions can be used together to filter the evaluated dataset.
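The idea behind an evaluation delay, and behind filtering out the ragged right edge, can be modeled in a few lines of Python. This is a conceptual sketch, not OPAL and not Observe's implementation: the function name `complete_windows`, integer-second timestamps, and fixed-size windows are assumptions for illustration. Only windows that closed at least `delay` seconds before `now` are kept for evaluation.

```python
from collections import defaultdict


def complete_windows(events, now, window, delay):
    """Bucket (timestamp, value) pairs into fixed windows and return only the
    windows that closed at least `delay` seconds before `now`.

    Hypothetical helper: timestamps and sizes are integer seconds."""
    cutoff = now - delay
    buckets = defaultdict(list)
    for ts, value in events:
        start = ts - ts % window  # align each event to the start of its window
        buckets[start].append(value)
    # Windows that end after the cutoff may still receive late data
    # (the "ragged right edge"), so they are left out of evaluation.
    return {start: values for start, values in sorted(buckets.items())
            if start + window <= cutoff}


events = [(0, 1), (30, 2), (60, 3), (125, 4)]
print(complete_windows(events, now=130, window=60, delay=5))  # {0: [1, 2], 60: [3]}
```

A longer delay trades alert latency for completeness: with `delay=20` the window starting at 60 is also held back until more time has passed.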

Monitors Overview

When you select Monitors from the left-hand rail, Observe displays a list of existing Monitors configured on your instance.

You can filter and search monitors by any attribute, including the Description field. Use Created By and Modified By to search by people who have worked on a Monitor.

Click the Last Triggered value to see Alerts generated by this Monitor.

Clicking a Monitor opens it in read-only mode; if your account has access, click the Edit button at top right to change its definition. The read-only page for a Monitor includes that Monitor’s logs and insight into its metrics, so you can evaluate its performance.

There are five Status options for monitors:

  • Running - the Monitor is active and healthy

  • Triggering - the Monitor is active and has currently active Alerts

  • Degraded - the Monitor has recognized data issues or has been unable to send notifications

  • Error - the Monitor has not been able to execute

  • Inactive - the Monitor is disabled by an administrator

You can access Legacy Monitors, Shared Actions, Mute Windows, and the New Monitor creation tool from the top right.

Types of Monitors

There are three types of monitors in Monitors v2:

Threshold

The threshold monitor alerts when a value crosses a threshold over a period of time. Thresholds are ideal for metrics data, where a numeric value is set as part of the dataset definition. You can also use other datasets, such as logs, traces, or resources, by selecting a single numeric column as the metric value.

For example, here are some threshold monitor use cases:

  • Alert when the CrashLoopBackOff metric in Kubernetes Pod Metrics is high

  • Alert when the bytesSent in AWS S3 Access Logs is higher or lower than expected
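The threshold idea (a value crossing a threshold over a period of time, with multiple rules determining severity) can be sketched in Python. This is an illustrative model, not Observe's evaluator: the function `evaluate_threshold`, averaging as the aggregation, and the ordered `(severity, op, threshold)` rule list are all assumptions. Because a rule can use `<` as well as `>`, the same shape covers "higher or lower than expected".

```python
import operator
from statistics import mean

# Comparison operators a hypothetical rule may use.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def evaluate_threshold(points, now, lookback, rules):
    """Average the values with timestamps in (now - lookback, now] and return the
    severity of the first matching rule, or None if no rule matches.

    points: list of (timestamp, value); rules: ordered list of (severity, op, threshold).
    """
    window = [value for ts, value in points if now - lookback < ts <= now]
    if not window:
        return None  # no data in the window; a negative monitor would treat this specially
    value = mean(window)
    for severity, op, threshold in rules:
        if OPS[op](value, threshold):
            return severity
    return None


points = [(10, 50), (20, 95), (30, 97)]
rules = [("Critical", ">", 95), ("Warning", ">", 80)]
print(evaluate_threshold(points, now=30, lookback=20, rules=rules))  # Critical
```

Rules are checked in order, so the most severe rule should come first.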

Count

The count monitor alerts when the number of rows in a monitored set crosses a threshold over a period of time. Counts are ideal for measuring volumes of data instead of contents of data and are good for negative monitors.

For instance, a count monitor use case would be to alert on the number of User Access Logs matching an error condition and a URL regular expression.
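The count example above (records matching an error condition and a URL regular expression) can be sketched in Python. The row fields `timestamp`, `status`, and `url`, and treating HTTP status 500 or higher as the "error condition", are hypothetical choices for illustration; Observe evaluates counts on your own dataset's columns.

```python
import re


def count_matches(rows, now, lookback, url_pattern):
    """Count rows with timestamps in (now - lookback, now] that are errors and
    whose URL matches `url_pattern`. Field names are hypothetical."""
    regex = re.compile(url_pattern)
    return sum(
        1
        for row in rows
        if now - lookback < row["timestamp"] <= now
        and row["status"] >= 500  # assumed error condition
        and regex.search(row["url"])
    )


rows = [
    {"timestamp": 5, "status": 500, "url": "/api/orders/1"},
    {"timestamp": 8, "status": 200, "url": "/api/orders/2"},
    {"timestamp": 9, "status": 503, "url": "/static/app.js"},
    {"timestamp": 10, "status": 502, "url": "/api/orders/3"},
]
print(count_matches(rows, now=10, lookback=10, url_pattern=r"^/api/orders/"))  # 2
```

Only the number of matching rows matters here, not their contents. Alerting when the count rises above a threshold catches error spikes; alerting when it falls to zero gives a negative monitor.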

Promote

The promote monitor sends the matching data in a monitored set to the destination. Promotes are ideal for sending actionable alerts to human operators or analysts, because you can include all relevant data directly in the message.

Some example promote monitor use cases include:

  • Crash reports that link in the affected customer, responsible engineer, and triggering condition from other datasets

  • Customer feedback alerts that include contextual data or links to investigative tools
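A promote monitor turns each newly arrived matching row into an alert that carries the row's own data, optionally enriched from other datasets. The Python sketch below models that flow under stated assumptions: the `promote` function, the timestamp watermark used to detect "new" rows, and the `service` lookup used to attach a responsible engineer are all hypothetical, not Observe's mechanism.

```python
def promote(rows, last_seen, matches, context_by_service):
    """Return (alerts, new_watermark).

    Each row newer than `last_seen` that satisfies `matches` becomes an alert
    payload, merged with context looked up by its (hypothetical) `service` field."""
    alerts = []
    for row in rows:
        if row["timestamp"] <= last_seen or not matches(row):
            continue
        alerts.append({**row, **context_by_service.get(row["service"], {})})
    # Advance the watermark past every row seen, matching or not.
    new_watermark = max([last_seen] + [row["timestamp"] for row in rows])
    return alerts, new_watermark


rows = [
    {"timestamp": 1, "service": "checkout", "event": "crash"},
    {"timestamp": 3, "service": "checkout", "event": "crash"},
    {"timestamp": 4, "service": "search", "event": "ok"},
]
owners = {"checkout": {"engineer": "alice"}}
alerts, watermark = promote(rows, 2, lambda r: r["event"] == "crash", owners)
print(alerts)     # [{'timestamp': 3, 'service': 'checkout', 'event': 'crash', 'engineer': 'alice'}]
print(watermark)  # 4
```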

Conversion from earlier Monitor types

Monitors from legacy monitoring are not automatically converted or migrated. To plan a migration of existing monitors, contact your Observe Data Engineer.

Each Monitors v1 type corresponds to a Monitors v2 type as follows:

Monitors v1            Monitors v2
Metrics Threshold      Threshold
Log Threshold          Threshold
Count                  Count
Text Value / Facet     Count
Promote                Promote

Muting Monitors

An active Observe Monitor always produces Alerts when its rules match, but you can suppress delivery of Alert notifications by muting the Monitor. Observe provides two ways to mute: ad hoc and scheduled.

  • Ad hoc mutes - Using the context menu of a single Monitor or multi-selecting several Monitors, you can start an ad hoc mute. The selected Monitor(s) will be muted starting now for the selected time period. An ad hoc mute is good for suppressing alerts from a known issue so that you can concentrate on solving the issue.

  • Scheduled mutes - Scheduled mutes apply globally to all monitors that match your conditions. Click View mute windows in the top right of the Monitors page, then New mute window. Set a time range, then add key=value conditions to determine which monitors will be muted. A scheduled mute is good for preparing for planned activity, such as a deployment to a customer cluster.
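The two muting styles can be modeled in Python to show how they differ: an ad hoc mute names specific Monitors, while a scheduled mute matches Monitors by key=value conditions during a time range. This is a hypothetical sketch, not Observe's API. The `MuteWindow` and `deliver` names are illustrative, and requiring every key=value condition to match (logical AND) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class MuteWindow:
    start: int                                      # window start (epoch seconds)
    end: int                                        # window end, exclusive
    monitor_ids: frozenset = frozenset()            # ad hoc mute: explicitly selected Monitors
    conditions: dict = field(default_factory=dict)  # scheduled mute: key=value conditions

    def applies_to(self, monitor, now):
        if not (self.start <= now < self.end):
            return False
        if self.monitor_ids:
            return monitor["id"] in self.monitor_ids
        # Assumption: every key=value condition must match (logical AND).
        # A window with no conditions therefore matches every Monitor.
        return all(monitor.get(key) == value for key, value in self.conditions.items())


def deliver(alert, monitor, windows, now, send):
    """Send the alert unless an active mute window covers this Monitor.
    The alert itself still exists; only the notification is suppressed."""
    if any(window.applies_to(monitor, now) for window in windows):
        return False
    send(alert)
    return True


prod = {"id": "m1", "env": "prod"}
staging = {"id": "m2", "env": "staging"}
windows = [
    MuteWindow(start=0, end=100, monitor_ids=frozenset({"m1"})),  # ad hoc
    MuteWindow(start=50, end=60, conditions={"env": "staging"}),  # scheduled
]
sent = []
deliver("cpu high", staging, windows, now=55, send=sent.append)  # muted by the scheduled window
```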

Unmuting Monitors

An ad hoc mute is visible on the Monitors list page and can be disabled there. Select one or more muted Monitors and use the context menu to select Unmute.

A scheduled mute must be managed from the View mute windows area at the top right of the Monitors list page. Click the button to see the list of active mutes. Delete mute windows that are no longer needed.

Note that ad hoc mutes also appear in the Mute windows list and can be deleted there as well.

Creating a New Monitor

To create a new Monitor in Observe, use the following steps:

  1. Log in to Observe and click the Monitors icon in the left-side navigation.

  2. On the Monitors page, click New Monitor.

  3. From the Select your monitor type panel, select the type of monitor you want to create in Observe:

    • Threshold - Send an alert when a value crosses a threshold over a period of time

    • Count - Send an alert when the count of matching records crosses a threshold over a period of time

    • Promote - Send the data as an alert whenever there is new data in a set

Monitors can also be created from data browsing, such as Explorers or Worksheets.

  • In Log Explorer, click the Action menu at top right and select Create a monitor.

    • If the explorer visualization is raw data, a new Count monitor creation form will open using this data.

    • If the explorer visualization is a chart, a new Threshold monitor creation form will open using this data.

  • In Metrics Explorer, click the Action menu at top right and select Create a monitor. A new Threshold monitor creation form will open using this data.

  • In Trace Explorer, click the Action menu at top right and select Create a monitor. A new Count monitor creation form will open using this data.

  • On a Worksheet, click the ellipsis (context) menu for a stage and select Create a monitor. Choose the type of monitor and proceed as above.
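The rules for which creation form opens from each data-browsing surface can be restated as a small lookup. The function `default_monitor_type` and its string arguments are hypothetical and exist only to summarize the steps above; they are not part of any Observe API.

```python
def default_monitor_type(source, visualization=None):
    """Monitor type whose creation form opens when you choose Create a monitor.

    Returns None for a Worksheet, where you choose the type yourself."""
    if source == "Log Explorer":
        if visualization == "raw":
            return "Count"
        if visualization == "chart":
            return "Threshold"
        raise ValueError("Log Explorer needs visualization='raw' or 'chart'")
    if source == "Metrics Explorer":
        return "Threshold"
    if source == "Trace Explorer":
        return "Count"
    if source == "Worksheet":
        return None
    raise ValueError(f"unknown source: {source}")


print(default_monitor_type("Log Explorer", "chart"))  # Threshold
```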

Note

Known Issue: Monitors v2 is currently restricted to a single stage when making a new monitor from a worksheet. Convert your worksheet’s logic to one stage to proceed.

What are Shared Actions?

A Shared Action is a destination template that specifies the type of alert, the receiving location, and an optionally customized message or payload. You do not have to configure a Shared Action to use Monitors, but it saves time when you frequently reuse the same notification patterns. For more detail, see Shared Actions.
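Conceptually, a Shared Action bundles the three parts named above (alert type, receiving location, and message) so that several Monitors can reuse one definition. The Python sketch below is an assumption-laden model, not Observe's configuration format: the `SharedAction` class, the `kind` values, and the `$monitor`/`$severity` placeholders (Python's `string.Template` syntax, not Observe's templating) are all hypothetical.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class SharedAction:
    kind: str          # type of alert, e.g. "email" (hypothetical value)
    destination: str   # receiving location, e.g. an address or webhook URL
    message: str = "Monitor $monitor is $severity"  # optional customized message

    def render(self, **fields):
        # safe_substitute leaves unknown placeholders in place instead of raising.
        return {
            "kind": self.kind,
            "to": self.destination,
            "body": Template(self.message).safe_substitute(fields),
        }


oncall = SharedAction("email", "oncall@example.com")
# The same template is reused by two different monitors.
print(oncall.render(monitor="CPU high", severity="Critical")["body"])  # Monitor CPU high is Critical
print(oncall.render(monitor="Disk full", severity="Warning")["body"])  # Monitor Disk full is Warning
```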

Reviewing a Monitor’s Data Lineage

You may need to review a Monitor’s data sources to understand the data it relies on, the latency of that data, and the size of the queried window over time. Balancing the speed and cost of a monitor requires controlling the freshness of the data that monitor relies on.

To review metrics such as latency and queried window size, open a monitor in read-only mode and click the Insights tab.

  • Upstream latency - time from ingest to monitor

  • Result latency - time from ingest to alarm

  • Evaluation time - time spent in monitor evaluation

  • Query window size - time frame evaluated by the monitor
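One way to read these four metrics is as differences between timestamps recorded for a single evaluation. The `EvaluationTrace` fields below are hypothetical names chosen to match the definitions above; Observe computes and displays the actual values on the Insights tab.

```python
from dataclasses import dataclass


@dataclass
class EvaluationTrace:
    ingested_at: float         # when the source data was ingested
    reached_monitor_at: float  # when that data became available to the monitor
    eval_started_at: float
    eval_finished_at: float
    alarmed_at: float          # when the resulting alarm was produced
    window_start: float        # start of the time frame the monitor evaluated
    window_end: float          # end of that time frame


def insights(t):
    """Derive the four Insights metrics (in seconds) from one hypothetical trace."""
    return {
        "upstream_latency": t.reached_monitor_at - t.ingested_at,   # ingest to monitor
        "result_latency": t.alarmed_at - t.ingested_at,             # ingest to alarm
        "evaluation_time": t.eval_finished_at - t.eval_started_at,  # time spent evaluating
        "query_window_size": t.window_end - t.window_start,         # time frame evaluated
    }


trace = EvaluationTrace(100, 130, 130, 135, 137, 0, 300)
print(insights(trace))
```

Because both latencies start at ingest, result latency is never smaller than upstream latency; the gap between them is time spent inside the monitor.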

To quickly assess the effective freshness of a monitor with a complex data lineage, use the Acceleration Manager. From Settings at the lower left, click Workspace Settings, then Acceleration Manager, then Monitors. Sort by Effective Freshness to find the monitors whose data is least fresh. Monitors whose Effective Freshness is worse than their Freshness goal show a warning icon and a colored display. Hover over one of these rows to open its context menu, and click the More icon to edit the Monitor definition. You can also open monitor definitions from the Acceleration Manager’s Monitors list page: hover over the monitor to open the context menu, and click the Pencil icon to edit the Monitor definition. Legacy Monitors are listed on a separate page and can also be edited there.
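The comparison the Acceleration Manager makes can be sketched in Python: sort monitors by effective freshness and flag those that miss their goal. The dictionary keys, the use of seconds, and the convention that a larger freshness value means staler data (and therefore "worse") are assumptions for illustration.

```python
def freshness_report(monitors):
    """Return (name, effective_freshness, misses_goal) tuples, stalest first.

    Freshness values are hypothetical seconds; larger means staler."""
    ranked = sorted(monitors, key=lambda m: m["effective_freshness"], reverse=True)
    return [
        (m["name"], m["effective_freshness"], m["effective_freshness"] > m["freshness_goal"])
        for m in ranked
    ]


monitors = [
    {"name": "api-errors", "effective_freshness": 60, "freshness_goal": 120},
    {"name": "batch-jobs", "effective_freshness": 900, "freshness_goal": 300},
]
for row in freshness_report(monitors):
    print(row)
# ('batch-jobs', 900, True)
# ('api-errors', 60, False)
```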

To edit the Monitor, click the pencil icon at top right or use the context menu, then navigate to the Monitor Query. Click Manage Inputs at the right side to list the input datasets, and click the Open or Edit buttons to review their data or definitions. Note that the Manage Inputs button is hidden when you use the expression builder; click the OPAL button at the right side to show it.