Key Observe Concepts

Observe collects all of your data, including system and application logs, metrics, tracing spans, and any other kind of event data, and ingests it into a data lake. Then, depending on your individual use case, Observe curates the relevant event data into datasets that provide more structure, faster queries, and easier understanding than the original raw event datastream.

Datasets can be linked to other datasets to make it easy for you to access and correlate relevant context during an event investigation. You can then package up datasets, along with dashboards and alerting conditions (monitors), to produce Observe applications.

To use Observe effectively, understanding its key concepts gives you the background needed to perform the following tasks:

  • Ingesting data into the Observe data lake using datastreams.

  • Exploring the data that is in Observe.

  • Live incident debugging using Live Mode.

  • Visualizing data and presenting on dashboards.

  • Using dashboards to view and filter datasets.

  • Navigating between related datasets using GraphLink.

  • Using worksheets and Observe Processing Analytics Language (OPAL) for manipulating event data.

  • Installing Observe applications.

  • Using the Dataset Graph feature to understand datasets and the relationships between them.


Datastreams

Observe ingests event data using datastreams. Once ingested, all event data associated with a specific datastream populates a dataset.

A datastream can ingest data from multiple sources, with each source identified by a unique token.

Individual tokens may be enabled, disabled, or even deleted without affecting data associated with other tokens on the same datastream. This makes it easy to rotate datastream tokens.
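As a rough illustration, sending events to a datastream amounts to an authenticated HTTP POST. The sketch below only builds the request rather than sending it; the endpoint URL, path, content type, and token format are placeholders assumed for the example, so check your Observe console for the real values for your account.

```python
import json
import urllib.request

# Placeholder values (assumptions): substitute your Observe customer
# endpoint and a datastream ingest token from the Observe console.
OBSERVE_URL = "https://example.collect.observeinc.com/v1/http/my-app"
INGEST_TOKEN = "ds-example-token"

def build_ingest_request(events):
    """Build (but do not send) an HTTP POST of newline-delimited JSON events."""
    body = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    return urllib.request.Request(
        OBSERVE_URL,
        data=body,
        headers={
            # The token identifies the source; rotating tokens means issuing
            # a new one and retiring this value without touching other sources.
            "Authorization": f"Bearer {INGEST_TOKEN}",
            "Content-Type": "application/x-ndjson",
        },
        method="POST",
    )

req = build_ingest_request([{"level": "info", "msg": "checkout started"}])
print(req.get_method())  # POST
```

Because each source holds its own token, disabling one source's credentials never interrupts ingestion from the others.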

More information about datastreams can be found here.


Datasets

Datasets are much like tables in a database. They consist of rows and columns of specific types. They can be linked, joined, and aggregated to derive insights. But unlike traditional tables, datasets automatically grow as Observe collects new data.

Datasets contain built-in support for time and history. This provides an easy way to track the state and relationships between objects you care about over time, whether the object is a server in a data center or a virtual object such as a shopping cart in an e-commerce application.

Datasets accelerate the querying of large amounts of data by continuously transforming raw data into more structured information. The information becomes easier to understand and faster to query.

A dataset has a name and lives within a project. Project names must be unique within a workspace, and dataset names must be unique within their project. In the Observe console, the Datasets page lets you browse the datasets that exist within your workspace, and the Dataset Graph page shows you how they are related.

Datasets may be installed by an Observe application, which provides pre-built collections of datasets, dashboards, and alerts for observing specific environments such as AWS, Kubernetes, or Linux hosts. Alternatively, you can create datasets for custom use cases, for example, a mobile payments application.

You can create links between pre-built datasets and custom datasets, combine information from any number of datasets in dashboards, and create monitors that combine pre-built and custom datasets. You can even examine the definitions of pre-built datasets, dashboards, and monitors to learn from them and extend them.

When creating custom datasets, you must specify a set of named columns, the type of data stored in each column, and the kind of custom dataset you want to create, either an Event Dataset or a Resource Dataset.

Event Datasets

All event data that Observe ingests on a datastream goes into a corresponding source dataset. This type of source dataset consists of nothing but timestamped events and is an event dataset.

This source dataset typically contains billions of events, so searching it directly is inefficient. Rather than just providing a search bar to hunt for breadcrumbs, Observe lets you further curate the source dataset into smaller, more useful chunks.

For example, the Kubernetes source dataset may break out Container Logs, which can be further broken out into Web Logs and Application Logs, thereby creating three discrete event datasets. The Web Logs dataset can feature an HTTP Status Code column derived from Container Logs using a regular expression. If you want to look for 404 (Not Found) errors for a website, you can now go directly to Web Logs and quickly find the data you need.
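In Observe this curation is expressed in OPAL, but the idea can be sketched in a few lines of Python: derive a status-code column from raw log lines with a regular expression, then filter on it. The common-log-style format below is an assumption for illustration only.

```python
import re

# Assumed web-log line format: '<ip> - - [<date>] "<method> <path> <proto>" <status> <bytes>'
STATUS_RE = re.compile(r'"\w+ \S+ HTTP/[\d.]+" (\d{3})')

def extract_status(line):
    """Derive an HTTP status code column from a raw container log line."""
    m = STATUS_RE.search(line)
    return int(m.group(1)) if m else None

container_logs = [
    '10.0.0.1 - - [01/Jan/2024] "GET /cart HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024] "GET /missing HTTP/1.1" 404 87',
]

# The curated "Web Logs" view: original line plus the derived column.
web_logs = [{"log": l, "status_code": extract_status(l)} for l in container_logs]

# Finding 404s is now a direct column filter instead of a text search.
not_found = [r for r in web_logs if r["status_code"] == 404]
print(len(not_found))  # 1
```

The payoff is the same as in Observe: once the column exists, queries filter structured values instead of re-scanning raw text.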

Observe identifies Event Datasets with purple icons.

Figure 1 - Kubernetes Container Logs dataset

Interval Datasets

An Interval dataset describes events with start and stop times, such as the traces and spans used in distributed tracing or Application Performance Monitoring. In an Event dataset, each row is a point in time; in an Interval dataset, each row covers a longer time period and has two timestamps, for its start and end points. When an Interval dataset is queried or accelerated, only rows that overlap the query window are included.

Resource Datasets

Resource datasets contain information about virtual or physical things such as servers, users, shopping carts, etc. Observe collectively calls these resources.

Each resource dataset contains a primary key, a combination of one or more columns that uniquely identifies a resource. For a container resource dataset, this can be as simple as a Container ID. The primary key enables the resource dataset to link to other, related datasets, much like foreign key relationships in traditional databases. Primary keys and links drive many of the advanced search and navigation capabilities in Observe.

Essentially, resource datasets behave like temporal tables and store the full history, state, and relationships of virtual or physical things such as servers, users, or shopping carts.

For example, Kubernetes regularly sends events describing the state of pods, containers, and more, at a particular point in time. Observe collects all of those events and derives an inventory of all pods and containers along with the associated state over time.

Observe derives the individual states and state transitions from event datasets. This means you can see not only the current state of all the pods, containers, and other resources in the Kubernetes cluster, but also their state, and the state of related resources, as it was an hour ago, a day ago, or last month. And you can drill down into the individual events that affected the state and state changes.

Resource datasets store objects that persist over time, along with their state changes. State may be composed of many attributes, each with a specific value for a specific time interval. Because a Resource Dataset keeps track of all attributes, you can ask questions such as “What was the state of Pod P at time T?”.

Each attribute forms a column in the respective resource dataset and the resource dataset represents the time intervals by a pair of designated valid-from and valid-to columns.

Like Intervals, the rows in a Resource dataset have two timestamps, for a start time and an end time. In a Resource, these timestamps describe the time period during which a state was true, such as a container instance being in “Active” or “Pending”. In an Interval, these timestamps describe the time period during which the Interval existed at all, such as a process executing in a container.
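The temporal-table behavior described above can be modeled in a few lines: each resource row carries valid-from and valid-to timestamps, so answering “What was the state of Pod P at time T?” is a lookup for the row whose validity interval contains T. The pod names, timestamps, and states below are illustrative.

```python
# Illustrative resource rows: (primary key, valid_from, valid_to, state).
# Each row records the interval during which that state was true.
rows = [
    ("pod-p", 0,   50,  "Pending"),
    ("pod-p", 50,  200, "Running"),
    ("pod-p", 200, 210, "Terminating"),
    ("pod-q", 10,  300, "Running"),
]

def state_at(key, t):
    """Return the state of resource `key` at time `t`, or None if unknown."""
    for k, valid_from, valid_to, state in rows:
        if k == key and valid_from <= t < valid_to:
            return state
    return None

print(state_at("pod-p", 120))  # Running
print(state_at("pod-p", 205))  # Terminating
```

Because every historical interval is retained rather than overwritten, the same table answers queries about the present and about any point in the past.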

Observe identifies Resource Datasets with blue icons.

Figure 2 - Kubernetes Resource dataset

For more information on working with Resources, see Resource Explorer.

Table Datasets

Each of the Datasets discussed above is a temporal time-series, which is key to Observe’s ability to link resources to events and states and provide Observability into your systems. Some OPAL verbs produce non-temporal data, such as a table of statistics about data: this is called a Table Dataset. Table datasets are transient artifacts that exist when using a Worksheet or Dashboard. They cannot be accelerated.
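The difference is easy to see in miniature: aggregating timestamped events into summary statistics produces a result with no time column at all, which is exactly the kind of non-temporal table described above. The event shapes below are illustrative.

```python
from collections import Counter

# Illustrative timestamped events (a temporal dataset in miniature).
events = [
    {"ts": 1, "service": "web", "status": 200},
    {"ts": 2, "service": "web", "status": 404},
    {"ts": 3, "service": "api", "status": 200},
]

# Counting events per service drops the time dimension entirely:
# the result is a plain statistics table, not a time series.
stats = Counter(e["service"] for e in events)
print(dict(stats))  # {'web': 2, 'api': 1}
```

Because such a result is not anchored to time, there is nothing for Observe to accelerate incrementally, which is why Table Datasets exist only transiently within a Worksheet or Dashboard.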

Dataset Graph

The Dataset Graph feature displays the interrelationships between your datasets. When you create worksheets, you use Dataset Graph to link that information to other datasets and create a relational database of all your data.

The Dataset Graph contains three different views of datasets:

  • Links - displays datasets and their links, and optionally displays the status of each dataset, such as whether the dataset is currently receiving data or not.

  • Lineage - displays how each dataset is derived, with source datasets on the far left of the graph, and data flow to the right, to successively derived datasets.

  • Focus - displays the currently selected dataset and links to and from it.

Take a tour of GraphLink


Explorers

Observe provides intuitive interfaces for popular data analysis use cases. Log Explorer, Metrics Explorer, Trace Explorer, and Resource Explorer are useful starting points for data analysis. To solve more advanced use cases, such as correlating different types of Observability data or efficiently performing a repetitive search, Dashboards and Worksheets can be useful tools as well.


Worksheets

Observe Worksheets provide a spreadsheet-type interface for directly manipulating resource or event datasets, enabling you to perform tasks such as extracting fields, aggregating, visualizing, and correlating data.


Dashboards

Observe Dashboards provide a logical way to group data visualizations and tables within Observe. Dashboards can be created from any dataset or worksheet.


Applications

Observe Applications are packages of related content that make it easier for you to ingest and use data to solve problems.


Monitors

Observe Monitors provide a flexible way to alert when patterns are present in your incoming data.