Ingesting and Exploring Data with Observe

You’ve logged into Observe and had a look around. Maybe someone on your team started collecting data. Now what?

Learn the basics of ingesting data from a script and exploring it in Observe. This procedure walks you through generating test data, viewing it in a dataset, and shaping it into a Worksheet.

To follow this tutorial, you need these items:

  • Your customer ID

  • A data stream token (note the name of the token’s data stream)

  • One or more MacOS, Linux, or Windows 10 systems

  • Python 3.x for MacOS and Linux, or PowerShell for Windows


Figure 1 - Obtaining a data stream token (video)

Using a Python script to generate basic data

You can send nearly any type of data to Observe, including output from shell commands and scripts. The ps-top-cpu script uses the ps command to find the process with the highest CPU usage and sends it to the HTTP collection endpoint as a JSON object.
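The send step can be sketched with Python’s standard library. This is a minimal sketch, not the ps-top-cpu script itself: the function names are hypothetical, and the Bearer scheme in the Authorization header is an assumption, since this page only says the token is sent in an Authorization header. The request is built separately from sending it, so you can inspect it first.

```python
import json
import urllib.request


def build_request(url: str, ingest_token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries payload as JSON."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # ASSUMPTION: the token uses the Bearer scheme; check the header
        # format documented for your ingest token.
        "Authorization": f"Bearer {ingest_token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(req: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    req = build_request(
        "https://101.collect.observeinc.com/v1/http/my-ps-top-cpu?host=my-observe-laptop",
        "my-token",
        {"pid": 1234, "pcpu": 12.5, "command": "python3"},  # hypothetical observation
    )
    # Inspect the request without sending it; call send(req) to post it.
    print(req.get_method(), req.full_url)
```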

GitHub links to the script:

To use it, save the appropriate file to your local system and update the following values:

# customer_id, my_path, and my_host are used to construct the collection URL
# Example:
# https://<customer_id>.collect.observeinc.com/v1/http/<my_path>?host=<my_host>
customer_id = "101"
my_path = "my-ps-top-cpu"
my_host = "my-observe-laptop"

# ingest_token is sent in an Authorization header
ingest_token = "my-token"

# The command to run: get the process using the most CPU
# Uncomment the appropriate one for your system
# MacOS:
# cmd = "ps -Ao pid,pcpu,comm -r -c | head -n 2 | sed 1d"
# Linux:
# cmd = "ps -eo pid,pcpu,comm --sort=-pcpu | head -n 2 | sed 1d"

Note: the PowerShell script does not require a value for cmd.
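Each cmd pipeline prints a single line with the process ID, CPU percentage, and command name. A minimal sketch of turning that line into a JSON-ready dict follows; the function names and the pid, pcpu, and command keys are hypothetical, and the real script may name its fields differently.

```python
import subprocess


def parse_ps_line(line: str) -> dict:
    """Turn one line of 'pid pcpu comm' output into a dict.

    maxsplit=2 keeps command names that contain spaces in one field.
    """
    pid, pcpu, command = line.strip().split(maxsplit=2)
    return {"pid": int(pid), "pcpu": float(pcpu), "command": command}


def top_cpu_process(cmd: str) -> dict:
    """Run the shell pipeline in cmd and parse its single output line."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True, check=True)
    return parse_ps_line(result.stdout)
```

On Linux, for example, you would pass the Linux cmd string to top_cpu_process. Splitting at most twice matters on MacOS, where command names can contain spaces.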

The script appends my_path to the collection URL and adds my_host as the host URL parameter. When Observe ingests the data, both appear as values in the EXTRA column; for example, setting my_path to my-ps-top-cpu makes that path appear there. You can later use these values to query events from this source. You can add more path segments and URL parameters if you like; separate path segments with a single slash (/).
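The following sketch shows how path segments and URL parameters combine into a collection URL, following the URL format given in the script comments. The collection_url function is hypothetical; it joins segments with a single slash and URL-encodes the parameters.

```python
from urllib.parse import quote, urlencode


def collection_url(customer_id: str, path_segments: list, params: dict) -> str:
    """Build https://<customer_id>.collect.observeinc.com/v1/http/<path>?<params>."""
    # Strip stray slashes from each segment, escape unsafe characters,
    # and join the segments with a single slash.
    path = "/".join(quote(seg.strip("/"), safe="") for seg in path_segments)
    query = urlencode(params)
    url = f"https://{customer_id}.collect.observeinc.com/v1/http/{path}"
    return f"{url}?{query}" if query else url
```

For instance, collection_url("101", ["my-ps-top-cpu"], {"host": "my-observe-laptop"}) reproduces the example URL from the script comments with the default values filled in.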


Figure 2 - Viewing the path for the data source

If desired, change sleep_time_sec to send observations more or less often. The default is 10 seconds.

Make sure the file has execute permissions so you can run it. Since it contains your ingest token, you may want to restrict access to the script if you use a shared system.
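On MacOS and Linux, one way to restrict the script to its owner is mode 0o700 (read, write, and execute for the owner only), the same as chmod 700. A minimal Python sketch follows; restrict_to_owner is a hypothetical helper name. Windows controls access through ACLs, so os.chmod has little effect there.

```python
import os
import stat


def restrict_to_owner(path: str) -> int:
    """Set path to mode 0o700 (owner read/write/execute only) and return the new mode."""
    os.chmod(path, stat.S_IRWXU)  # stat.S_IRWXU == 0o700
    return stat.S_IMODE(os.stat(path).st_mode)
```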

Run the script to send data to Observe. If you send data from more than one host, remember to update my_host in each local copy so you can see which system a particular observation came from.
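If you copy the script to several systems, one hypothetical alternative to editing my_host by hand is to default it to each machine’s hostname. This is an assumption about how you might adapt the script, not part of the script itself.

```python
import socket


def default_host(override: str = "") -> str:
    """Return override when it is set; otherwise return this machine's hostname."""
    return override or socket.gethostname()


# Hypothetical: each copy of the script reports the machine it runs on.
my_host = default_host()
```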

Leave the script running while you explore the data. When you finish collecting, press Ctrl-C in the terminal window to stop data collection.
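The send interval and the run-until-Ctrl-C behavior can be sketched as a loop that catches KeyboardInterrupt, the exception Python raises when you press Ctrl-C. run_collection and its max_iterations parameter are hypothetical, and collect_and_send stands in for whatever gathers and sends one observation.

```python
import time
from typing import Callable, Optional


def run_collection(collect_and_send: Callable[[], None], sleep_time_sec: float = 10,
                   max_iterations: Optional[int] = None) -> int:
    """Call collect_and_send every sleep_time_sec seconds until Ctrl-C.

    max_iterations is a hypothetical testing aid; leave it as None to run
    until interrupted. Returns the number of observations sent.
    """
    sent = 0
    try:
        while max_iterations is None or sent < max_iterations:
            collect_and_send()
            sent += 1
            time.sleep(sleep_time_sec)
    except KeyboardInterrupt:
        # Ctrl-C raises KeyboardInterrupt; stop cleanly instead of showing a traceback.
        print(f"Stopped after {sent} observations.")
    return sent
```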

Where does the data go?

When Observe ingests a new data source, before performing any shaping or filtering, the data is visible in the data stream dataset corresponding to the token used to ingest it. This dataset displays all data received by that data stream. To locate the correct dataset, go to Explore in the left menu, select the Datasets tab, and search for the name of your data stream.

Note

If you used a legacy ingest token, the Observation dataset contains your data. This dataset contains ingested observations not associated with a data stream. Follow the same steps, except start from the Observation dataset instead of a data stream.

If you don’t have much data yet, you can run simple searches from the dataset’s Landing Page. For more comprehensive investigation, open a Worksheet.

Refining your results in a Worksheet

A worksheet allows you to shape your data into a cohesive view. You can manipulate and transform data, create visualizations, link additional datasets, and save and share the results.

Open a new Worksheet by clicking Open Worksheet.

Alternatively, go to Worksheets in the left menu and choose New Worksheet. A dialog lists the different types of datasets you can select for your new Worksheet. To see the data in your data stream, search for it by name in the dialog.

Figure 3 - Lists of available datasets

Now you have a basic worksheet with data from your data stream.


Figure 4 - Recent data without filtering applied

To narrow the results to just your ps-top-cpu data, start by filtering on the path.

  1. From the menu in the EXTRA column header, select Filter. This opens a dialog with a list of fields in the data.

  2. Select Value from the dropdown menu, since the path is a value rather than a field.

  3. Search for your path, then check the box and click Apply to display only those rows.
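Conceptually, the filter keeps only rows whose EXTRA value matches your path. The sketch below illustrates that logic on hypothetical in-memory rows; it is not how Observe implements filtering, and real rows have more columns and may format the path value differently.

```python
def filter_by_path(rows: list, path: str) -> list:
    """Keep only rows whose EXTRA column has the given path value."""
    return [row for row in rows if row.get("EXTRA", {}).get("path") == path]


# Hypothetical rows, for illustration only.
rows = [
    {"EXTRA": {"path": "my-ps-top-cpu", "host": "my-observe-laptop"},
     "FIELDS": {"pid": 1234, "pcpu": 12.5, "command": "python3"}},
    {"EXTRA": {"path": "some-other-source"}, "FIELDS": {"message": "hello"}},
]
print(len(filter_by_path(rows, "my-ps-top-cpu")))  # 1
```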

Figure 5 - Filter JSON with my-ps-top-cpu selected

The FIELDS column should now show only the data of interest, but it is still JSON. Use Extract From JSON to create new columns from it.
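Extract From JSON promotes keys inside the JSON into their own columns. The following sketch illustrates the idea on a single hypothetical row; the extract_from_json name and the pid, pcpu, and command keys are assumptions, not Observe’s implementation.

```python
import json


def extract_from_json(row: dict, column: str = "FIELDS") -> dict:
    """Return a copy of row with each top-level key of the JSON column as its own column."""
    value = row[column]
    obj = json.loads(value) if isinstance(value, str) else value
    new_row = dict(row)  # leave the original row unchanged
    new_row.update(obj)
    return new_row


row = {"FIELDS": '{"pid": 1234, "pcpu": 12.5, "command": "python3"}'}
columns = extract_from_json(row)
columns.pop("FIELDS")  # like deleting the FIELDS column afterward
print(columns)  # {'pid': 1234, 'pcpu': 12.5, 'command': 'python3'}
```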

Figure 6 - Extracting data from JSON

With these new columns, you don’t need FIELDS anymore. You can temporarily hide it, or delete it if you won’t use it again in this worksheet.

Figure 7 - Deleting the FIELDS column

To show a hidden column again, open Table Controls and toggle its visibility. Hiding a column does not change the underlying data. Deleting a column removes it only from this worksheet, so you can still use it in other worksheets.


Figure 8 - Displaying the Table Controls menu

As you explore this data, you may notice the console at the bottom of the page. As you update your worksheet, the console displays the equivalent OPAL statements. You can combine UI actions with OPAL, the Observe Processing and Analysis Language, to build more complex queries than the UI alone supports. For more, see OPAL — Observe Processing and Analysis Language.

Create a visualization

Now that you have some useful columns, try creating a visualization.

From the More menu, select Visualize, or click the Visualize icon.

Figure 9 - Adding visualization

This creates a new visualization card, ready to configure in the right menu.

Figure 10 - Displaying a new visualization

Figure 11 - Stacked Area chart Example

If you want to keep this worksheet, click Save. You can find it later in the Worksheets tab in Explore, pick up where you left off, or share it with others. To give it a more meaningful name, click “Untitled Worksheet” at the top of the page.

In addition to referring back to this particular data, you might want to link the results of your shaping elsewhere in Observe. To do this, create a new dataset by publishing it.

Publishing an event stream

You have already seen an event stream, in the form of the Observation table. Event streams, along with Resource Sets, are types of datasets. And like any dataset, they can be linked to other stages in other worksheets as part of data shaping.

To create an event stream from this worksheet, click Publish New Event Stream in the right menu. Your current worksheet updates to reference this new dataset, so if the definition changes later, the worksheet receives those changes automatically, as do any other worksheets that reference it.

Figure 12 - Publish event stream options