Explore the details of your data in a Worksheet: filter to rows of interest, extract new fields, create a visualization, and more. Since the underlying data isn’t changed, you can try multiple scenarios as you shape and transform.
A future version of this tutorial will include a sample dataset.
Introduction to data modeling¶
This example uses data from a set of IoT devices. They report measurements like temperature and humidity, when doors are opened or closed, and power usage of appliances. Like other data sources, the data is visible in the Firehose, also known as the Observation Event Stream. The Firehose has basic search capabilities, but a spreadsheet-like Worksheet gives you many more options, from searching and filtering to creating additional Event Streams and Resource Sets.
This basic Worksheet contains a single Stage, a table showing the results of the transformations applied to it. In this case, there are none yet. It is still an undifferentiated collection of raw observations. Since there is so much data here, the first step is to filter to what you want to look at: the IoT sensor data.
When this HTTP data source was configured, it included a unique path:
smartthings-bridge. This became a value for
path in the EXTRA field.
To filter on this value, select Filter JSON from the EXTRA column menu to open the Filter JSON dialog.
The Filter JSON dialog allows you to select which data you want to see, either by field name or individual value. Select Value from the dropdown and search for
smartthings-bridge to show only the matching rows.
Add a second Stage¶
With the data filtered,
EXTRA now contain redundant information. You could delete them right now, but perhaps they might be useful for a later investigation. Click Link New Stage to create a second Stage based on the first one.
Stages progressively build on one another, but they are not themselves independent datasets. Every Stage in a Worksheet maintains a history of the actions applied to it, inheriting the state of the parent. All the data remains in the base dataset. Worksheet Stages are transient views of that data.
Now you can use Delete Column from the column heading menu to remove
EXTRA, without affecting the previous Stage.
Create a dataset¶
The second Stage now shows only the data of interest: the ingest timestamp and the JSON payload. This is a good point to think about how you might want to use this data.
The underlying source for this Worksheet is the Observation Event Stream, which contains everything from all your data sources. Even with only a few sources, this is an enormous amount of data. Operating directly on all of it, every time, won’t give the best performance. A better option is to create a new Event Stream dataset, so subsequent operations are only applied to the relevant data.
With the second Stage selected (the right rail should say “Linked from Observation”) click Publish New Event Stream. Give this dataset a unique name, such as “Worksheet example IoT raw events”, and click Publish.
The two Stages in the Worksheet are consolidated to a single Stage, with data from the newly created Event Stream. The history is not lost, but is maintained in the Event Stream Definition.
Next, explore the contents of
Select Extract from JSON in the
FIELDS column menu to create new fields from the JSON payload. For this data,
deviceEvent.value are a good place to start. This gives you the ID of the sensor, what type of information it reports, and the value of that reading.
But some of these Observations aren’t actually valid sensor data. To show just the good readings, remove the
null value rows by selecting Remove Empty from the
deviceId column menu.
Create a new Resource Set¶
These extracted fields show a basic picture of the data, but the raw
deviceId isn’t particularly meaningful. There is a
deviceInfo.label in the payload, you could extract a field for that as well. But if you later want to shape this same data in different ways, each new Worksheet would need to extract
label every time. A more convenient option is to create a Device Resource each can link to.
To model each sensor as a Resource, start with a new Worksheet for the “Worksheet example IoT raw events” dataset you created earlier. But this time, extract fields related to the sensor itself:
deviceinfo.label. Use Remove Empty on
deviceId to show only valid values.
At the bottom of the right rail, select Create Resource in the Actions menu to open the Create Resource dialog. (If you don’t see it, make sure you don’t have a column selected.) Select the
deviceId fields, and also
deviceId as the Primary Key so there is a unique key for other datasets to link to. Click Create. This creates a second Stage for
Next, indicate which field contains the Resource names. This allows other Worksheets or Landing Pages to show a meaningful device name instead of
deviceId. From the
label column menu, select Set as Resource Set Label.
If the list of
label values look good, click Publish New Resource Set to create the Resource Set dataset. Give it a unique name, like “Worksheet example IoT Device” and publish.
Save the updated Worksheet (optional)¶
You now have two Worksheets, an Event Stream, and a Resource Set. The Worksheets use data from the Event Stream and Resource Set, but those datasets no longer depend on the Worksheets they were created from. You may save them for later use, or create a new Worksheet next time you need one.
Explore more complex data¶
You may have Event Streams and Resource Sets created by other members of your team. In this case, shaping and linking existing datasets may be a more common way to use a Worksheet than working with raw data from a new source.
Here is an example of the same IoT data, modeled as a collection of linked Event Streams, Resource Sets, and Resources. More detailed shaping and linking makes it easier to understand the relationship between different types. From here, you can perform many different types of investigations with the data.
Filter and visualize¶
Start with a question: When did someone last make coffee? Coffee at Flamingo House starts by turning on the electric kettle to boil water. Starting with a Worksheet for Device Events, filter
Device to just the
Water Kettle events.
The kettle has two sensors:
switch. The power sensor reports how much power the kettle is using, so when this value spikes that means someone has turned it on. That is a good choice for the next filter.
With only the power readings, the timechart now shows a spike in usage a few hours ago. To see this in more detail, create a visualization.
Select Add Visualization from the More menu in the upper right and select
value for the Y-Axis and Average for Function. Since there’s only one kettle, there is no need to select a GROUP BY option.
The large spike confirms this is when someone turned on the kettle for coffee, at approximately 8AM.
The OPAL console¶
So far, all this modeling has only used the UI. But every UI action also generates an OPAL statement, visible at the bottom of the page in the console.
OPAL, the Observe Processing and Analysis Language, is a full featured query and modeling language. You can perform the same UI actions with OPAL statements, or add additional statements to your OPAL script for operations not yet available in the UI. Choose whichever you are most comfortable with.
For example, the Filter JSON menu action to filter to
smartthings-bridge is equivalent to this OPAL
filter (string(EXTRA.path) = "/smartthings-bridge")
For more about OPAL, see Observe Processing and Analysis Language. For additional OPAL examples, see Helpful hints: OPAL. Please contact us if you have questions about data modeling or OPAL, and look for more more resources in the near future.