Ingesting and Exploring Data with Observe

You’ve logged into Observe and had a look around. Maybe someone on your team started collecting data. Now what?

This page describes the basics of ingesting data from a script and exploring it in Observe. It walks you through generating test data, viewing it in the Firehose, and shaping it in a worksheet.

To follow this tutorial, you will need:

  • Your customer ID

  • An ingest token (How to create an ingest token)

  • One or more macOS, Linux, or Windows 10 systems

  • Python 3.x for macOS and Linux, or PowerShell for Windows

A basic data-generating script: ps-top-cpu.py

You can send nearly any type of data to Observe, including output from shell commands and scripts. The ps-top-cpu script uses ps to find the process consuming the most CPU and sends it to the HTTP collection endpoint as a JSON object.

GitHub links:

macOS and Linux: ps-top-cpu.py

Windows PowerShell: top-cpu.ps1

To use it, save the appropriate file to your local system and update the following values:

# path and host are used to construct the collection URL
# Example:
# https://collect.observeinc.com/v1/http/my_path?host=my-laptop
path = "my-ps-top-cpu"
host = "my-laptop"

# customer_id and ingest_token are sent in an Authorization header
customer_id = "12345"
ingest_token = "my-token"

# The command to run: get the process using the most CPU
# Uncomment the appropriate one for your system
# macOS:
cmd = "ps -Ao pid,pcpu,comm -r -c | head -n 2 | sed 1d"
# Linux:
# cmd = "ps -eo pid,pcpu,comm --sort=-pcpu | head -n 2 | sed 1d"

Note: the PowerShell script does not require a value for cmd.

In the script, path is appended to the collection URL and host is added as a URL parameter. As observations from this source are ingested, these values appear in the EXTRA column, and you can later use them to query events from this source. (You can add more path segments and URL parameters if you like; separate path segments with a single slash /.)

If desired, change sleep_time to send observations more or less often. The default is every 10 seconds.
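The full script is linked above, but its overall shape can be sketched as follows. This is an illustrative outline, not the published script: helper names like build_url and parse_ps_line are my own, and the exact Authorization header format is an assumption to verify against your ingest token documentation.

```python
import json
import subprocess
import time
import urllib.request

path = "my-ps-top-cpu"
host = "my-laptop"
customer_id = "12345"
ingest_token = "my-token"
sleep_time = 10  # seconds between observations

# macOS variant; see the Linux alternative in the listing above
cmd = "ps -Ao pid,pcpu,comm -r -c | head -n 2 | sed 1d"

def build_url(path: str, host: str) -> str:
    """Append path to the collection URL and add host as a URL parameter."""
    return f"https://collect.observeinc.com/v1/http/{path}?host={host}"

def parse_ps_line(line: str) -> dict:
    """Turn one 'pid pcpu comm' line from ps into a JSON-ready dict."""
    pid, pcpu, comm = line.split(None, 2)
    return {"pid": int(pid), "pcpu": float(pcpu), "comm": comm.strip()}

def send_observation(obs: dict) -> None:
    """POST one observation to the collection endpoint."""
    req = urllib.request.Request(
        build_url(path, host),
        data=json.dumps(obs).encode(),
        headers={
            "Content-Type": "application/json",
            # Assumed header format; confirm against your ingest token docs
            "Authorization": f"Bearer {customer_id} {ingest_token}",
        },
    )
    urllib.request.urlopen(req)

def run_loop() -> None:
    """Send one observation every sleep_time seconds until Ctrl-C."""
    while True:
        out = subprocess.run(cmd, shell=True, capture_output=True,
                             text=True).stdout
        if out.strip():
            send_observation(parse_ps_line(out))
        time.sleep(sleep_time)

# Call run_loop() to start sending data.
```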

Make sure the file has execute permissions so you can run it. Since it contains your ingest token, you may want to restrict access to the script if you are on a shared system.
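On macOS or Linux, both steps can be done with a single chmod. This assumes you saved the script as ps-top-cpu.py in the current directory:

```shell
# Placeholder file so the command below runs as-is; in practice this
# is the script you saved from GitHub.
touch ps-top-cpu.py

# Owner-only read/write/execute: runnable by you, but unreadable to
# other users on a shared system (it contains your ingest token).
chmod 700 ps-top-cpu.py
```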

Run the script to send data to Observe. If you are sending from more than one machine, remember to update host for each local copy. This allows you to see which system a particular observation came from.

Leave the script running while you explore the data. When you are finished collecting, type Ctrl-C to stop.

About the Firehose, or the Observation table

When a new data source is ingested, before any shaping or filtering, it is visible in the Firehose. Also called the “Observation table,” this dataset shows everything you have coming into Observe. If there isn’t much yet, you can do some simple searching from here. But it could also be quite a lot. A better way is to create a worksheet.

Refine your results in a Worksheet

A worksheet is where you shape your data into a cohesive view. You can manipulate and transform the data, create visualizations, link additional datasets, and save and share the results.

If you are still looking at the Firehose, you can open a new worksheet from there by clicking the Open Worksheet button.

Alternatively, go to Worksheets from the left sidebar and click the New Worksheet button. A dialog displays different types of datasets you could choose for your new Worksheet. To get the same data you were looking at in the Firehose, search for “Observation” and select the Observation event stream.

Now you have a basic worksheet with data from the Observation table. (The tab name has an asterisk and is in a different font to indicate you have unsaved changes.)

To narrow the results to just your ps-top-cpu data, start by filtering on its path:

  • In the EXTRA column header, select Filter JSON from the menu. This opens a dialog with a list of fields in the data.

  • Select Value from the dropdown menu, since the path you want is a value rather than a field.

  • Search for your path, then check its box and click Apply to show only those rows.

In the FIELDS column, you should now see only the data of interest. But it’s still JSON. Use Extract From JSON to create new columns.

With these new columns, maybe you don’t need FIELDS anymore. You can temporarily hide it, or delete it if you won’t use it again in this worksheet.

To show a hidden column again, open the Table Controls dialog and toggle its visibility. Also, none of this changes the underlying data. If you delete a column in this worksheet, it is still available for other worksheets.

Table Controls menu, showing list of visible and hidden columns

As you explore this data, you may have noticed the console at the bottom of the page. As you update your worksheet, the console displays the equivalent OPAL statements. You can combine UI actions with OPAL, the Observe Processing and Analysis Language, to build more complex queries than the UI alone allows. For more, see OPAL — Observe Processing and Analysis Language.
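For the filter and extract steps above, the console shows statements roughly along these lines. This is an illustrative sketch using the OPAL filter and make_col verbs; your console displays the exact statements for your UI actions, and the field names here come from the ps-top-cpu data:

```
filter string(EXTRA.path) = "my-ps-top-cpu"
make_col pid:int64(FIELDS.pid), pcpu:float64(FIELDS.pcpu), comm:string(FIELDS.comm)
```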

Create a visualization

Now that you have some useful columns, try creating a visualization.

From the More menu, select Add Visualization.

This creates a new visualization card, ready to configure in the right rail.

Example: Maximum of CPU grouped by Command, displayed as a Stacked Area chart.

If you want to keep this worksheet, click Save. You can find it later under “Your Worksheets” and pick up where you left off, or share it with others. You can also give it a more meaningful name by clicking “Observation” at the top of the page.

In addition to referring back to this particular data, you might want to link the results of your shaping elsewhere in Observe. To do this, create a new dataset by publishing it.

Publishing an event stream

You have already seen an event stream, in the form of the Firehose. Event streams, along with Resource Sets, are types of datasets. And like any dataset, they can be linked to other stages in other worksheets as part of data shaping.

To create an event stream from this worksheet, click Publish New Event Stream in the right rail. Your current worksheet updates to reference this new dataset, so if its definition changes later, it gets those changes automatically. (And so will any other worksheets that reference it.)