OPAL — Observe Processing and Analysis Language

The Observe temporal relational model treats time, and the tracking of system data over time, as an integral part of data modeling. Traditional attempts to model the time-varying nature of data on top of relational databases have ended up with non-standard SQL extensions, and these mechanisms are often fragile and hard to use.

The Observe platform solves this problem by providing a language, OPAL, for expressing the operations you want to perform, while the platform takes care of the time-dependent factors.

Because every UI action generates an OPAL equivalent, you don't have to choose between writing code by hand and using the UI. You may perform some operations in the UI, some in code, and some by starting in the UI and expanding in code.

This guide is designed to get you started with OPAL.

We recommend starting with Anatomy of an OPAL pipeline and Data types and operators to understand the basics. Then begin exploring your own data in a Worksheet. We continue to improve OPAL and appreciate your feedback. Let us know how we can help!

Anatomy of an OPAL pipeline

An OPAL pipeline is a sequence of statements where the output of one is the input for the next. This could be a single line with one statement, or many lines of complex shaping and filtering.
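The "output of one is the input for the next" idea can be modeled in a few lines of Python. This is only an illustrative sketch, not OPAL and not how Observe implements pipelines: `run_pipeline`, `keep_errors`, `add_severity`, and the sample `requests` data are all hypothetical names invented for the example.

```python
from functools import reduce

# Hypothetical stand-ins for pipeline statements. Each one takes a dataset
# (a list of event dicts) and returns a new dataset.
def keep_errors(events):
    """Keep only events whose status is a server error."""
    return [e for e in events if e["status"] >= 500]

def add_severity(events):
    """Add a derived 'severity' field to every event."""
    return [
        {**e, "severity": "critical" if e["status"] == 503 else "error"}
        for e in events
    ]

def run_pipeline(dataset, steps):
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda data, step: step(data), steps, dataset)

# Made-up sample data.
requests = [
    {"path": "/a", "status": 200},
    {"path": "/b", "status": 500},
    {"path": "/c", "status": 503},
]

one_step = run_pipeline(requests, [keep_errors])                 # single statement
two_steps = run_pipeline(requests, [keep_errors, add_severity])  # chained statements
```

A one-statement pipeline and a many-statement pipeline are the same construct here; the only difference is the length of the list of steps.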

A pipeline contains four types of elements:

  • Inputs, defining which datasets to look at

  • Verbs, defining what processing to do with those datasets

  • Functions, defining how to transform individual values in the data

  • Outputs, passing a dataset on to the next verb or the final result

A complete pipeline, also called an OPAL script, consists of a series of inputs, verbs, functions, and outputs that define the desired result. The diagram below illustrates combining multiple elements together: the first verb statement passes the results of its lookup operation to a second verb, which uses a function to remove null values.

Diagram of a lookup and filter example. Session and room data in, lookup room name from room dataset by id, filter to get only sessions with non-null room names.
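The lookup-and-filter example in the diagram can be approximated with the following Python sketch. It models the behavior only; it is not OPAL syntax, and the field names (`room_id`, `id`, `name`, `room_name`), the helper functions, and the sample data are assumptions made for illustration.

```python
# Hypothetical session and room datasets.
sessions = [
    {"session_id": "s1", "room_id": 101},
    {"session_id": "s2", "room_id": 102},
    {"session_id": "s3", "room_id": 999},  # no matching room
]
rooms = [
    {"id": 101, "name": "Atrium"},
    {"id": 102, "name": "Library"},
]

def lookup_room_name(events, room_table):
    """First verb: add room_name to each session by matching room_id to id."""
    names_by_id = {room["id"]: room["name"] for room in room_table}
    return [{**e, "room_name": names_by_id.get(e["room_id"])} for e in events]

def filter_events(events, predicate):
    """Second verb: keep only the events for which the predicate is true."""
    return [e for e in events if predicate(e)]

def is_not_null(value):
    """Function: acts on a single value, not on a whole dataset."""
    return value is not None

with_names = lookup_room_name(sessions, rooms)
result = filter_events(with_names, lambda e: is_not_null(e["room_name"]))
print([e["session_id"] for e in result])  # ['s1', 's2']
```

The two verbs operate on whole datasets, while `is_not_null` is applied to one value at a time, which mirrors the verb and function roles described above.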

Inputs

Pipeline inputs are datasets, such as an Event Stream. Pipelines may use as many datasets as required, although individual verbs vary in how many they accept. For example, lookup accepts a main input dataset containing the field of interest, and a lookup table dataset that maps those values to more useful ones.

Keep in mind that each pipeline is a single sequence of steps from input to output. If a verb accepts multiple datasets as input, those datasets can't be processed individually within that statement. You may, however, create an intermediate dataset in a different pipeline and use that as an input.

Verbs

Verbs are the main actors in a pipeline. Each takes a primary input, either the initial dataset or the output of the verb before it in the pipeline. Some verbs, such as those that perform a join, accept multiple dataset inputs. Every verb produces exactly one output dataset.

Tip

See the List of OPAL verbs for details on individual verbs.

The most important verb is filter, which takes the default input and returns data matching the condition defined in the filter expression. This is analogous to the WHERE clause in a SQL query.

Streamable vs unstreamable

An important consideration is whether the verb you are using is streamable. Most Observe datasets are really data streams. New data is always being added, but any particular operation is only interested in some of it.

Most OPAL verbs are therefore streaming operators: they transform one (or more) input data streams to an output data stream, and only then identify which results are within the desired query time window.

The simplest example is filter. When filter is applied to an input data stream, the filter condition is checked against each event, and all events that pass the check form an output data stream. The query then selects the events within the desired time window from that output stream. The data stream itself isn’t changed.

This works because filter is streamable: its behavior is the same for any size query time window. Streamable verbs create streamable output datasets, which can be accelerated for better performance.

A few verbs are unstreamable, meaning their output is different for different size query time windows. The resulting unstreamable dataset can’t be accelerated, so the unstreamable verb must be reapplied each time the dataset is queried.

Unstreamable verbs perform many useful functions, particularly for ad hoc analysis in a Worksheet. But you can’t create a new dataset from those Worksheet results, because such a dataset can’t be accelerated. To create a new dataset from a Worksheet, ensure that all its OPAL is streamable before you publish a new Event Stream.
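To make the distinction concrete, here is a small Python model of one streamable and one unstreamable operation. It is a sketch under stated assumptions, not OPAL and not a description of Observe's acceleration mechanism: `in_window`, `keep_errors`, and `busiest_host` are hypothetical helpers, and "busiest host in the window" is simply an invented example of a result that depends on the window size.

```python
from collections import Counter

# Made-up event stream; ts is an integer timestamp.
events = [
    {"ts": 1, "host": "a", "status": 500},
    {"ts": 2, "host": "a", "status": 200},
    {"ts": 3, "host": "b", "status": 503},
    {"ts": 5, "host": "b", "status": 200},
    {"ts": 6, "host": "b", "status": 500},
]

def in_window(evts, start, end):
    """Select events inside the query time window [start, end)."""
    return [e for e in evts if start <= e["ts"] < end]

def keep_errors(evts):
    """Streamable: each event is judged on its own."""
    return [e for e in evts if e["status"] >= 500]

def busiest_host(evts):
    """Unstreamable: the answer depends on every event in the window."""
    counts = Counter(e["host"] for e in evts)
    return counts.most_common(1)[0][0]

small, large = (0, 4), (0, 8)

# Streamable: the filtered stream can be computed once ahead of time and then
# sliced by any window, giving the same answer as filtering inside the window.
precomputed = keep_errors(events)
same_for_small = in_window(precomputed, *small) == keep_errors(in_window(events, *small))
same_for_large = in_window(precomputed, *large) == keep_errors(in_window(events, *large))

# Unstreamable: the result changes with the window, so there is no single
# precomputed output to slice.
busiest_small = busiest_host(in_window(events, *small))  # "a" (2 events vs 1)
busiest_large = busiest_host(in_window(events, *large))  # "b" (3 events vs 2)
```

In this model, the ability to compute once and slice later is what corresponds to being able to accelerate a dataset.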

Types of verbs

Verbs are organized into several categories, based on the action they perform. Some verbs have more than one category.

  • Aggregate

    Aggregate verbs work with aggregate functions to summarize data.

  • Filter

    Filter verbs select events matching an expression or condition, similar to the WHERE clause of a SQL SELECT statement. A filter statement might match a pattern (literal or regular expression) or return the top values for a group of values.

  • Join

    Join verbs combine data from multiple datasets to generate an output value. For example, a union operation adds new merged and appended fields from other event datasets to the primary input dataset. The flatten family of verbs is also included in the Join category, as a special case of joining a dataset with itself to create new output events.

  • Metadata

    Metadata verbs add information about the dataset itself, rather than act on the data it contains. These verbs add additional context about the dataset’s contents, or define relationships between datasets. Common metadata operations are configuring foreign keys, registering types of metrics, and creating resources from event streams.

  • Metrics

    Metrics verbs specify how metrics are defined and aggregated, such as specifying the units of reported values.

  • Projection

    Projection verbs create or remove fields based on existing fields or values. For example, pick_col selects only the desired fields, dropping all others.

Functions

Functions act on individual values rather than datasets. Where verbs are set operations, acting on input sets and returning output sets, a function is a scalar operation: it returns a single value.

Tip

See the List of OPAL functions for details on individual functions.

Types of functions

There are three types of functions:

  • Plain, or scalar functions

    Act on values from an input event field, such as converting a timestamp or comparing two values. Scalar functions always output a single value per input event.

    Example: replace_regex()

    make_col foo:"foo4-bar2" // input text
    make_col bar:replace_regex(foo, /^([A-Za-z]{3})([0-9]{1})-([A-Za-z]{3})([0-9]{1})$/,'\\3\\2-\\1\\4', 0)
    // result: new column bar containing "bar4-foo2"
    
  • Summarizing, or aggregate functions

    Within an aggregating verb statement (such as statsby), calculate a summary of multiple values across multiple input events. For example, avg() calculates the average of a field’s values across all input events that match the statsby group_by field. (This is similar to GROUP BY in a SQL query.) Aggregate functions typically output fewer events than are in the input.

    Example: count() with verb statsby

    statsby "reportsPerSensor":count(sensor), group_by(sensor)
    
  • Window functions

    Within a window() statement, a window function looks at the input events in the window and calculates an output value for each input event. For example, avg(), when applied to a window, calculates a moving average for a fixed window size over time. window() and window functions are used with make_col and similar verbs, where the window() statement is an argument defining the contents of the output column.

    Example: first()

    // get name of the earliest sensor to report in the current window
    make_col FirstToReportData:window(first(sensor))
    

Generally, functions take expressions as arguments, and can themselves be part of an expression. max(num_hosts+3) is just as valid as max(num_hosts)+3.

Scalar functions may be used anywhere an expression can be used. Aggregate and window functions are used with aggregating verbs to perform more complex operations. Some functions may be either aggregating or windowing, depending on the verb they are used with.
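The difference between the three kinds of functions comes down to how many output values they produce for a given set of input events. The Python sketch below models that difference; it is not OPAL. The sample `readings`, the helper names, and the use of a frame counted in events (rather than a time-based window) are simplifying assumptions.

```python
from collections import defaultdict

# Made-up sensor readings, already ordered by time.
readings = [
    {"ts": 1, "sensor": "s1", "temp": 20.0},
    {"ts": 2, "sensor": "s2", "temp": 30.0},
    {"ts": 3, "sensor": "s1", "temp": 22.0},
    {"ts": 4, "sensor": "s1", "temp": 27.0},
]

def to_fahrenheit(celsius):
    """Scalar function: one value in, one value out."""
    return celsius * 9 / 5 + 32

def average(values):
    """Usable as an aggregate or as a window function, depending on the caller."""
    return sum(values) / len(values)

# Scalar: exactly one output value per input event.
scalar_out = [{**r, "temp_f": to_fahrenheit(r["temp"])} for r in readings]

# Aggregate: one output value per group, so fewer rows than the input.
temps_by_sensor = defaultdict(list)
for r in readings:
    temps_by_sensor[r["sensor"]].append(r["temp"])
aggregate_out = {sensor: average(temps) for sensor, temps in temps_by_sensor.items()}
# {'s1': 23.0, 's2': 30.0}

# Window: one output value per input event, computed from a frame of
# neighboring events (here, the current event and the one before it).
def moving_average(values, size):
    return [average(values[max(0, i - size + 1): i + 1]) for i in range(len(values))]

window_out = moving_average([r["temp"] for r in readings], 2)
# [20.0, 25.0, 26.0, 24.5]
```

Note that `average` appears in both the aggregate and the window case, which parallels the point that some functions may be either aggregating or windowing depending on how they are used.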

Output

The results of a pipeline may be presented in a variety of ways. It could be statistics like top K values, histograms, or small line charts (sparklines) for each column in the output dataset. When you are querying or modeling in the UI, many of these details are handled for you. With OPAL pipelines, you control how to display your output.