Tutorial: modeling weather data¶
This tutorial shapes weather data using a combination of UI actions and OPAL statements. Many people like to mix and match: performing some actions in the UI and some with OPAL. Also, not every OPAL operation has an equivalent in the UI. So it’s useful to be comfortable with both.
The data comes from OpenWeather, via a process that periodically requests current weather observations for a selection of cities. This poller is not currently user configurable. If you would like to follow along with live data, contact us about configuring this data input in your workspace.
The OpenWeather data¶
Each weather observation contains the current conditions for a single location, with common measurements such as temperature, humidity, wind speed, and more. You can find an example of a JSON observation in the OpenWeather documentation.
For this tutorial, we will create new datasets for the incoming data, extract fields and create metrics, and display data on a custom board.
Here is what we will cover:
Identify the data of interest¶
When you ingest data, it initially goes into the Observation event dataset. This dataset contains everything ingested into your workspace, so that could be quite a lot. (If you are using data streams, your data may be in a data stream dataset instead. See Data streams for details.)
The first step for a new input is to identify the data of interest and create a dataset for it. Do this with a new Worksheet:
Open a Worksheet with the Observation event dataset:
From the Explore tab, click Datasets and search for “Observation”.
To view just the weather data, filter by something unique to this source. In this case, the weather data has a
poller_type field in the
Open the menu for the EXTRA column, either with the down arrow or by right-clicking the column header.
Select Filter from the menu.
poller_typeand check the box to select it. (If you don’t find it, try selecting a longer time range using the time picker.)
There could be multiple sources using pollers, so also filter on the value
weather. These two things uniquely identify the weather data:
Select Value from the dropdown to show the available values.
Check weather to select and click Apply.
Now the event table shows only the weather data.
filter (not is_null(EXTRA.poller_type)) and (string(EXTRA.poller_type) = "weather")
For more about the OPAL query language, see OPAL — Observe Processing and Analysis Language or continue to the next sections of this tutorial.
Extract fields from the JSON payload¶
With the table filtered to just the weather data, extract some fields of interest from
Open the menu for the FIELDS column and choose Extract from JSON
In the right rail, select the following fields. Some are inside nested objects:
dt: time the observation was recorded
id: unique ID
name: location, in this case, the city name
main.temp: current temperature, in Celsius
sys.country: country the city is located in
weather.description: text description of the current conditions
Click Apply to save.
Add the following OPAL statement in the console:
// Extract from the JSON payload in FIELDS make_col dt:int64(FIELDS.dt), id:int64(FIELDS.id), temp:float64(FIELDS.main.temp), country:string(FIELDS.sys.country), description:string(FIELDS.weather.description)
Click Run to apply.
Tidy as we go: set the desired timestamp and remove unneeded fields¶
We have the right fields, but there are a few additional things to adjust before we are ready to continue. These actions are done with OPAL, so if the console isn’t already open in your worksheet, click on Console to open.
The time the observation was ingested is in
BUNDLE_TIMESTAMP, but that isn’t actually the time it was recorded by the sensors. That timestamp is in
dt, as epoch time in seconds. To use this as the timestamp, convert it from type
integer to type
from_seconds(). While we are at it, we can also give some fields more useful names.
To do this, add the following OPAL statement in the console:
// Create a new field of type timestamp, converting the epoch time in dt make_col timestamp:from_seconds(int64(dt))
Next, we need to tell Observe this is our new timestamp field. Do this with the
set_valid_from verb, which marks it as the “Valid From”, or the time this observation’s data first became valid:
// Use the field "timestamp" as our timestamp, by setting as the valid from time // For better performance, don't accept a new value too far off from the // the original ingest time we are replacing. See set_valid_from docs for details. set_valid_from options(max_time_diff:duration_min(5)), timestamp
While we are here, we can also call the field
name something more descriptive:
// rename the existing name field to city rename_col city:name
There are also several fields we don’t need anymore. For best performance, it’s good practice to drop fields you don’t need. Remove them with one of these methods:
// Remove these fields from the data drop_col BUNDLE_TIMESTAMP, OBSERVATION_KIND, FIELDS, EXTRA, BUNDLE_ID, OBSERVATION_INDEX, dt
// Keep these fields and drop all others pick_col timestamp, id, city, description, temp, country
Observations must still have a valid timestamp, so don’t drop your timestamp field. The UI won’t show the Delete column menu item for that field, and the OPAL console displays an error if you try to do it with
When you are done, your OPAL script should look something like this:
// Filter to just the weather data // Use not is_null() to ensure you only get rows where poller_type exists, // and string() so we can compare its value to the string "weather" filter (not is_null(EXTRA.poller_type)) and (string(EXTRA.poller_type) = "weather") // Extract desired fields from the JSON payload, contained in FIELDS // Also rename name to city make_col dt:int64(FIELDS.dt), id:int64(FIELDS.id), city:string(FIELDS.name), description:string(FIELDS.weather.description), temp:float64(FIELDS.main.temp), country:string(FIELDS.sys.country) // Use the field "timestamp" as our timestamp, by setting it as the valid from time // For better performance, don't accept a new value too far off from the // the original ingest time we are replacing. See set_valid_from docs for details. make_col timestamp:from_seconds(int64(dt)) set_valid_from options(max_time_diff:duration_min(5)), timestamp // Select only the fields we want, dropping others // Note: pick_col must include a valid timestamp pick_col timestamp, id, city, description, temp, country
This OPAL script combines the UI actions and OPAL statements described above. It narrows all the events in the Observation Event Stream to just weather observations, and then shapes them into useful fields.
A future tutorial will cover additional options for filtering and visualizing data in a Worksheet. Below is an example, a graph of temperatures for the cities in our data (click to enlarge.)
Save your shaping work as a new dataset¶
A table of individual weather details is useful, but at the moment it only exists in this Worksheet. To use this data in other ways, such as creating metrics, it needs to be its own dataset. To do that, publish it as a new event stream:
In the right rail, click Publish New Event Stream. (If you don’t see it, you may have a cell selected in the table. Click the X to return to the default right rail view.)
Name this dataset “My Weather Test/Weather Events”.
Including a package name like “My Weather Test” creates a section My Weather Test in the Explore tab. Use packages to group related datasets together, making it easier to find them later. (If someone else already has a My Weather Test package, use another unique name.)
If you would like to pause here, save this Worksheet to continue with it later. You may also prefer to change the name to something more meaningful:
Click on the Worksheet name, Observation, at the top of the page. (The second “Observation” is the name of the stage. More about stages later.)
Type a new name and click Save.
When you are ready to continue, look for this Worksheet in the Explore tab under Worksheets.
Create Resources for observation locations¶
The next step is to create a new resource set, containing a resource for each location. This gives us a Landing Page, and a more convenient way to view the data by city.
To create Resources and publish them as a Resource Set:
In your Worksheet, open the Actions menu at the bottom of the right rail.
Select Create New Resource Set. (If you don’t see this option, you may need to close the Cell Selected view first.)
Select the fields to include, which in this case is all of them.
idas the Primary Key. The Primary Key defines which field (or fields) uniquely identify each Resource. In this data, each location (city) has a unique
id. We will use this later to link datasets.
Accept the default Resource Lifetime of 1 hour. This defines how long to wait between updates before this Resource is considered inactive.
Creating a new Resource Set adds a second stage to the worksheet, with one row for each city (or
id) in the data. Drag the time scrubber to see the weather for the past few hours:
Video: using the time scrubber
Stages are not datasets themselves, but a temporary view of the data with whatever actions have been taken so far. They inherit the state of the parent stage, so our
id resource stage builds on the extracted fields and other shaping work we did to create the Weather Events dataset. Multiple stages in a Worksheet show useful data, but in order to link those results to other datasets, we need to finish creating this new Resource dataset.
Before we do, there are a few things that will make this dataset easier to use.
Change the Resource Set name
You should have two stages in this Worksheet: your original one, and a second called “id”. This name is automatically generated from the primary key, but that might not be very meaningful. In the right rail, hover to see the pencil icon, and click to change it to “My Weather Test/Locations”. Not only is “Locations” a more useful name than “id”, but this also adds it to the My Weather Test package. (Note that package and dataset names are case-sensitive.)
Change nanosecond time period to hours
make_resourcestatement in the console shows the expiry time in nanoseconds. To use hours instead, replace
expiry:duration_hr(1). While this doesn’t change the expiry time, it makes the OPAL easier to read.
Construct a more convenient label
Add the line
label:strcat(name, ", ", country),before
primary_key(id)to construct the label text. Then add
set_label labelto use this text for the observation’s label.
When you are done, your OPAL script should look like this:
make_resource options(expiry:duration_hr(1)), timestamp: timestamp, city: city, description: description, temp: temp, country: country, label:strcat(city, ", ", country), primary_key(id) set_label label
Click Run to make sure everything works. When it looks good, click Publish New Resource Set to save.
Now we can go back to the Explore tab and open the Landing Page for the My Weather Test/Locations Resource Set. The default board displays weather details, which may be filtered by city, country, or weather description.
This shows basic information and simple visualizations, but what we don’t have yet are temperature metrics for the individual cities.
Create weather metrics¶
A metric is a numeric measurement that changes over time, such as a temperature reading. They can be displayed in charts, used to trigger alerts, or to calculate other values.
For this example, we will create two temperature metrics: one for the original Celsius reading, and one for the calculated Fahrenheit equivalent. This builds on the Weather Events dataset we created previously, containing the raw values for each weather observation. Metrics operations use OPAL, so most of this section is done in the OPAL console.
To use the Weather Events data as metrics, we need to shape the observations into a more appropriate form. (Metrics, including how and why to shape data in a particular way, are covered in more detail in Introduction to Metrics.)
Here is a summary of the process:
Start with a Worksheet for the Weather Events dataset, which has one row for each set of measurements. These are “wide metrics:” easy to read, but not as easy to perform calculations with.
Using OPAL verbs and functions, shape these events into a series of new events: one for each value of our future metric. These are “narrow” metrics.
Add an interface to identify this dataset as containing metrics, which enables additional metric operations in Observe.
make_colto create a new field of type
object. It contains the two metric values from each observation: the original Celsius temperature reading, and the same value converted to Fahrenheit.
make_col metrics:make_object( "temperature_c":temp, "temperature_f":(temp*9/5 + 32) )
flatten_leavesto create two events, one for each temperature value.
pick_colto select the needed fields from the new events.
While it’s true we can delete the one unneeded field using
drop_col, by using
pick_col we can also rename the generated
_c_ fields to something more useful.
pick_col valid_from:timestamp, id, city, country, metric_name:string(_c_metrics_path), metric_value:float64(_c_metrics_value)
interfaceto identify this dataset as containing metrics. This also specifies which columns contain the names and the values.
interface "metric", metric:metric_name, value:metric_value
Click Run to confirm everything is working.
When you are done, your OPAL should look something like this:
make_col metrics:make_object( "temperature_c":temp, "temperature_f":(temp*9/5 + 32) ) flatten_leaves metrics pick_col valid_from:timestamp, id, city, country, metric_name:string(_c_metrics_path), metric_value:float64(_c_metrics_value) interface "metric", metric:metric_name, value:metric_value
The resulting table contains two rows for each temperature reading:
The last step is to link our primary key
id to the corresponding Resource dataset we created earlier.
Open the menu for the
idcolumn and select Link to Resource Set.
Select “Locations” from the sub-menu.
In the right rail, change the auto-generated Link Name to “Locations”.
This adds a
set_link line at the end of the existing OPAL script. Publish this Worksheet as a new Event Stream named “My Weather Test/Weather Metrics”.
View your new metrics¶
Now we have all the pieces in place to display our metrics. Go to the Locations Landing Page and there’s a new section showing Temp C and Temp F.
Add metrics to a custom board¶
The default board shows our temperature metrics, but also a few other things we might not be interested in. Create a personal custom board to show just the items of interest:
Click Locations Overview to open the Boards dialog.
Click Create Board.
Drag cards from the right rail to add items to your board. You can create collapsible sections, add images, and more. When you are happy with the results, click the More icon to give it a name, and then to save.