Datasets and time

Observe collects all of your data (system and application logs, metrics, and tracing spans) into observations, which are then transformed into datasets. Datasets are structured with times or time intervals, as well as relations linking to or from other datasets. These relations between different parts of the system provide Observe with superpowers for discovering the hidden meaning in your data.

Datasets

A dataset lives within a named project, and also has a name. Project names must be unique for your customer, and dataset names must be unique within their project. When you log into Observe, the Explore page lets you browse the different datasets that exist for your customer ID.

A Dataset has a schema, which is a set of named columns and the data types stored in those columns. A Dataset also has a kind, such as event or resource.

Event Datasets

If an incident occurs “at a time” and has a well-defined timestamp, then the Dataset is an Event Dataset. Events have a single point in time, and typically link or relate to one or more other tables in the system. For example, “user X logged into system Y at time Z” is an event, which also links to the “user” Dataset and the “system” Dataset.

Observe identifies Event Datasets with the following icon:


For an Event Dataset to appear in the list of available Log Datasets, add the following OPAL code to that Event Dataset:

interface "log"

Adding the interface code to the Dataset allows you to display it in the Log Explorer. The Observe interface can then interpret the Dataset as logs. For example, adding fields allows you to search and expand the fields.

To add a specific column of data to the Log Dataset, use the following OPAL code:

interface "log", "log":<nameOfLogField>

The column must have the object or string type. If it does not, convert the column to an object or a string using OPAL before adding the interface OPAL:

make_col data:object(data)

or

make_col data:string(data)

To remove the log interface from the Dataset, use the drop_interface verb:

drop_interface "log"

For more information about interface, see the OPAL verb, interface.

Resource Datasets

Objects that persist over time, and whose state changes over time, are stored in Resource Datasets. Every field value for a resource has a validity interval with a start time and an end time. For a resource, you can ask questions like “what was the name at time T?” Additionally, a primary key identifies a resource.
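To make the “what was the name at time T?” question concrete, here is a minimal Python sketch of a point-in-time lookup. It is a conceptual model only, not how Observe stores or queries resources. The FieldVersion class, the value_at function, the sample host names, and the choice of an exclusive end time are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class FieldVersion:
    """One value of a resource field and the interval during which it held."""
    value: str
    valid_from: datetime
    valid_to: datetime  # assumption: the end of the interval is exclusive


def value_at(versions: list[FieldVersion], t: datetime) -> Optional[str]:
    """Return the value whose interval contains t, or None if there is none."""
    for v in versions:
        if v.valid_from <= t < v.valid_to:
            return v.value
    return None


# Hypothetical history of the "name" field for a single host resource.
name_history = [
    FieldVersion("web-old", datetime(2024, 1, 1), datetime(2024, 3, 1)),
    FieldVersion("web-new", datetime(2024, 3, 1), datetime(2024, 6, 1)),
]

print(value_at(name_history, datetime(2024, 2, 15)))  # web-old
```

Each field keeps its own list of versions, so two fields of the same resource can change at different times without affecting each other.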

Observe identifies Resource Datasets with the following icon:


Foreign keys

If a Dataset contains one or more fields that together can be used to identify a resource in another Dataset, or even another instance of the same resource, those fields taken together make up a “foreign key.” Foreign keys consist of the following:

  • The field or fields in the source Dataset that make up the key
  • The target Dataset that the key links to
  • The fields in the target Dataset that match up to the fields in the source Dataset

When a foreign key exists in a Dataset, the Observe user interface displays a link to follow the key relationship to the target Dataset. You can also use foreign keys to look up values from the target Dataset by browsing the relations.
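The three parts of a foreign key, and the lookup it enables, can be sketched in Python as follows. This is an illustrative model under stated assumptions, not Observe's implementation. The ForeignKey class, the follow function, and the sample "user" rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    source_fields: tuple[str, ...]  # fields in the source dataset
    target_dataset: str             # name of the dataset the key links to
    target_fields: tuple[str, ...]  # matching fields in the target dataset


def follow(fk, row, datasets):
    """Return the target rows whose key fields match the source row."""
    wanted = tuple(row[f] for f in fk.source_fields)
    return [
        target
        for target in datasets[fk.target_dataset]
        if tuple(target[f] for f in fk.target_fields) == wanted
    ]


# Hypothetical data: a login event that links to the "user" dataset.
datasets = {"user": [{"id": "u1", "name": "Ada"}, {"id": "u2", "name": "Lin"}]}
login_event = {"user_id": "u2", "system": "billing"}
login_to_user = ForeignKey(("user_id",), "user", ("id",))

print(follow(login_to_user, login_event, datasets))
# [{'id': 'u2', 'name': 'Lin'}]
```

Storing the source and target fields as tuples lets the same structure describe single-field and multi-field keys.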

Related keys

When another Dataset points into a Dataset, the target Dataset is also related to that other Dataset. This reverse relationship is called a “related key” and has a “what links here” meaning. It is not generally a foreign key, because many remote resources may link to a single resource instance. For example, a single host may have multiple disks, so following this related key from one host may return multiple disk resources.
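A related key is a foreign key followed in the reverse direction, so one starting resource can yield many results. The Python sketch below illustrates this with the host and disk example. The what_links_here function and the sample rows are hypothetical and are not part of Observe.

```python
def what_links_here(target_row, target_fields, source_rows, source_fields):
    """Return every source row whose foreign key points at target_row."""
    key = tuple(target_row[f] for f in target_fields)
    return [
        row for row in source_rows
        if tuple(row[f] for f in source_fields) == key
    ]


# Hypothetical data: disks carry a foreign key (host_id) to their host.
host = {"host_id": "host-3"}
disks = [
    {"host_id": "host-3", "device": "/dev/sda"},
    {"host_id": "host-3", "device": "/dev/sdc"},
    {"host_id": "host-7", "device": "/dev/sda"},
]

linked = what_links_here(host, ("host_id",), disks, ("host_id",))
print([d["device"] for d in linked])  # ['/dev/sda', '/dev/sdc']
```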

Resource primary keys

A resource's primary key may be a GUID assigned to the resource, a user ID assigned in some database, or a MAC address of a network interface: whatever makes sense for that particular resource.

Primary keys may be composite, which means they consist of several fields combined. For example, the primary key for a particular disk device may be the “host ID” of the host that the disk is attached to, together with the “disk index” within that host, such as host-3, /dev/sdc.
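A composite primary key can be modeled as a tuple of field values. The Python sketch below is only an illustration. The field names host_id and disk_index, the size_gb column, and the sample rows are assumptions.

```python
# Hypothetical disk resources. Neither field alone identifies a disk,
# because "/dev/sdc" exists on more than one host.
disks = [
    {"host_id": "host-3", "disk_index": "/dev/sdc", "size_gb": 500},
    {"host_id": "host-4", "disk_index": "/dev/sdc", "size_gb": 250},
]

PRIMARY_KEY = ("host_id", "disk_index")


def primary_key(row):
    """Build the composite key as a tuple of the key field values."""
    return tuple(row[f] for f in PRIMARY_KEY)


# Index the resources by their composite key.
by_key = {primary_key(row): row for row in disks}

print(by_key[("host-3", "/dev/sdc")]["size_gb"])  # 500
```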

Resource times

For values that changed recently and remain valid until some later change, the end time is unknown and is assumed to be in the distant future. For values inherited at the start of time, the start time is unknown, and the value is assumed to have held since the dawn of time.
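One way to picture unknown start and end times is to substitute the earliest and latest representable timestamps. The Python sketch below does this with datetime.min and datetime.max. It is a conceptual illustration only. The function names, the use of None for an unknown time, and the exclusive end are assumptions, not Observe's internal representation.

```python
from datetime import datetime
from typing import Optional


def effective_interval(start: Optional[datetime], end: Optional[datetime]):
    """Replace an unknown (None) bound with the dawn of time or the distant future."""
    return (
        start if start is not None else datetime.min,
        end if end is not None else datetime.max,
    )


def is_valid_at(start: Optional[datetime], end: Optional[datetime], t: datetime) -> bool:
    """Check whether t falls inside the interval, treating None as unbounded."""
    lo, hi = effective_interval(start, end)
    return lo <= t < hi


# A value set on 2024-01-01 and not changed since: the end time is unknown.
print(is_valid_at(datetime(2024, 1, 1), None, datetime(2030, 1, 1)))  # True
```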