Observe Datasets and time

Observe collects all data, including system and application logs, metrics, and tracing spans, into observations, which are then transformed into datasets. Datasets are structured with times or time intervals, as well as relations linking to or from other datasets. These relations between different parts of the system let Observe correlate data across datasets and surface connections that are hard to see in any single source.

Datasets

A dataset lives within a named project and also has its own name. Project names must be unique within your customer account, and dataset names must be unique within their project. When you log into Observe, the Explore page lets you browse the datasets that exist for your customer ID.

A dataset has a schema, which is a set of named columns and the data types stored in those columns. A dataset also has a kind, such as event or resource, as described in the sections below.

Event Datasets

If each row describes something that occurs at a single point in time and has a well-defined timestamp, the dataset is an Event dataset. Events typically link or relate to one or more other datasets in the system. For example, “user X logged into system Y at time Z” is an event that also links to the “user” dataset and the “system” dataset.

Observe identifies Event datasets with pink icons.

Creating Log Datasets from Event Datasets

For an Event dataset to appear in the list of available Log datasets, add the following OPAL code to the Event dataset:

interface "log"

Adding the interface code lets Observe interpret the dataset as logs and display it in the Log Explorer. For example, the fields you add can then be searched and expanded there.

To add a specific column of data to the Log Dataset, use the following OPAL code:

interface "log", "log":<nameOfLogField>

The column must be of type object or string. If it is not, use OPAL to convert the column to an object or a string before adding the interface code:

make_col data:object(data)

make_col data:string(data)

To remove the log interface from the dataset, use the drop_interface verb:

drop_interface "log"

For more information, see the documentation for the OPAL interface verb.

Interval Datasets

An Interval dataset describes events with start and stop times, such as a trace or span used in distributed tracing or Application Performance Monitoring use cases. In an Event dataset each row is a single point in time, but in an Interval dataset each row covers a period of time, so it carries two timestamps: one for the start and one for the end. When an Interval dataset is queried or accelerated, only rows that overlap the query window are included.
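The overlap rule can be illustrated with a small Python sketch. This is a conceptual model only, not Observe's implementation or API: the `IntervalRow` class, the `rows_in_window` function, and the half-open `[start, end)` boundary treatment are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class IntervalRow:
    """Hypothetical stand-in for one row of an Interval dataset."""
    name: str
    start: datetime
    end: datetime


def rows_in_window(rows, window_start, window_end):
    """Keep rows whose interval overlaps the query window.

    Assumption: intervals and the window are treated as half-open [start, end).
    """
    # Two intervals overlap when each one starts before the other ends.
    return [r for r in rows if r.start < window_end and r.end > window_start]


spans = [
    IntervalRow("checkout", datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 5)),
    IntervalRow("login", datetime(2024, 1, 1, 9, 58), datetime(2024, 1, 1, 10, 2)),
    IntervalRow("search", datetime(2024, 1, 1, 11, 0), datetime(2024, 1, 1, 11, 1)),
]
window = (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30))
selected = rows_in_window(spans, *window)
print([r.name for r in selected])
```

Here the "login" span starts before the window opens but ends inside it, so it is still selected. A row does not need to fall entirely within the window to be included.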

Resource Datasets

Objects that persist over time, and whose state changes over time, are stored in Resource datasets. Each field value for a resource has a valid time interval, with a start time and an end time. For a resource, you can ask questions like “what was the name at time T?” A resource is also identified by a primary key.
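A question such as “what was the name at time T?” amounts to an as-of lookup over the valid time intervals. The following Python sketch models that idea under stated assumptions: `ResourceState`, `name_at`, the sample host values, and the half-open validity interval are hypothetical and do not reflect how Observe stores or queries resources.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ResourceState:
    """Hypothetical row: one value of a resource field and when it was valid."""
    primary_key: str
    name: str
    valid_from: datetime
    valid_to: datetime


def name_at(states, primary_key, t) -> Optional[str]:
    """Return the name the resource had at time t, or None if unknown."""
    for s in states:
        # Assumption: validity is half-open, [valid_from, valid_to).
        if s.primary_key == primary_key and s.valid_from <= t < s.valid_to:
            return s.name
    return None


host_states = [
    ResourceState("i-0abc", "web-old", datetime(2024, 1, 1), datetime(2024, 3, 1)),
    ResourceState("i-0abc", "web-new", datetime(2024, 3, 1), datetime(2024, 6, 1)),
]
print(name_at(host_states, "i-0abc", datetime(2024, 2, 15)))
```

The primary key selects which resource is being asked about, and the time selects which of its recorded states applies. A time outside every recorded interval returns no value.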

Observe identifies Resource datasets with blue icons.

Foreign Keys

Resource Times

Resource Primary Keys

Reference Tables

Observe can accept many types of data, not only logs, metrics, or traces. You can send your customer list, product list, or a list of IP addresses to enrich your telemetry data. Many customers use this data to build Resource datasets, which works well for short-lived associations such as EC2 hosts or engineer-to-account mappings. However, because most of this business context has no specific timestamps, it must be refreshed periodically so that it does not expire in Resource datasets. That makes Resource datasets a poor fit for long-lived associations such as Product or Employee IDs. Reference tables offer an easy way to integrate long-lived business context with machine data. See the Reference Tables page to learn how to create and manage reference tables.
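Conceptually, enriching telemetry with a reference table is a key-based lookup that has no time dimension. The Python sketch below illustrates only that idea. The `enrich` function, the `products` table, and the event fields are hypothetical, and nothing here represents Observe's actual reference table mechanism.

```python
def enrich(events, reference_table, key):
    """Attach reference-table columns to each event by matching on `key`.

    Events whose key has no match in the table are passed through unchanged.
    """
    enriched = []
    for event in events:
        context = reference_table.get(event.get(key), {})
        enriched.append({**event, **context})
    return enriched


# Hypothetical long-lived business context, keyed by product ID.
products = {
    "p-1": {"product_name": "Widget"},
    "p-2": {"product_name": "Gadget"},
}

# Hypothetical telemetry events that carry a product ID.
events = [
    {"timestamp": "2024-01-01T10:00:00Z", "product_id": "p-1", "message": "order placed"},
    {"timestamp": "2024-01-01T10:05:00Z", "product_id": "p-9", "message": "order placed"},
]

enriched_events = enrich(events, products, "product_id")
print(enriched_events[0])
```

Because the lookup ignores time, the mapping stays usable for as long as the table exists, which is why this pattern suits long-lived associations.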