Tutorial: modeling weather data¶
This tutorial shapes weather data using a combination of UI actions and OPAL statements. Many people like to mix and match, performing some actions in the UI and some with OPAL. Also, not every OPAL operation has an equivalent in the UI. So it’s useful to be comfortable with both.
The data comes from OpenWeather, via a process that periodically requests current weather observations for a selection of cities. This poller is not currently user-configurable. If you would like to follow along with live data, contact us about configuring this data input in your workspace.
The OpenWeather data¶
Each weather observation contains the current conditions for a single location, with common measurements such as temperature, humidity, wind speed, and more. You can find an example of a JSON observation in the OpenWeather documentation.
For this tutorial, you create new datasets for the incoming data, extract fields and create metrics, and display data on a dashboard.
Here is what you’ll cover:
Identify the data of interest¶
When you ingest data, it initially goes into the Observation event dataset. This dataset contains everything ingested into your workspace, so that could be quite a lot. (If you are using data streams, your data may be in a data stream dataset instead. See Data streams for details.)
The first step for a new input is to identify the data of interest and create a dataset for it. Do this with a new Worksheet.
Open a Worksheet with the Observation event dataset:
From the Workplace Settings tab, click Data streams and search for OpenWeather.
Click on Raw events.
Click Open dataset.
To view just the weather data, filter by something unique to this source. In this case, the weather data has an
api.openweather.org field in the
5. Open the menu for the EXTRA column, either with the down arrow or by right-clicking the column header.
6. Select Filter from the menu.
7. Search for
host and check the box to select it. (If you don’t find it, try selecting a longer time range using the time picker.)
Also, you can filter on the value
api.openweathermap.org. These two things uniquely identify the weather data:
8. Select Value from the dropdown to show the available values.
9. Check api.openweathermap.org to select and click Apply.
Now the event table shows only the weather data.
filter (not is_null(EXTRA.host)) and (string(EXTRA.api.openweathermap.org) = "weather")
For more about the OPAL query language, see OPAL — Observe Processing and Analysis Language or continue to the next sections of this tutorial.
Extract fields from the JSON payload¶
With the table filtered to just the weather data, extract some fields of interest from
Open the menu for the data column and choose Extract from JSON
In the right menu, select the following fields. Some are inside nested objects:
dt: time the observation was recorded
id: unique ID
name: location, in this case, the city name
main.temp: current temperature, in Celsius
sys.country: country the city is located in
weather.description: text description of the current conditions
Click Apply to save.
Add the following OPAL statement in the console:
// Extract from the JSON payload in FIELDS make_col dt:int64(FIELDS.dt), id:int64(FIELDS.id), temp:float64(FIELDS.main.temp), country:string(FIELDS.sys.country), description:string(FIELDS.weather.description)
Click Run to apply.
Tidy as you go: set the desired timestamp and remove unneeded fields¶
You have the right fields, but there are a few additional things to adjust before you are ready to continue. These actions are done with OPAL, so if the console isn’t already open in your worksheet, click on Console to open.
The time the observation was ingested is in
BUNDLE_TIMESTAMP, but that isn’t actually the time it was recorded by the sensors. That timestamp is in
dt, as epoch time in seconds. To use this as the timestamp, convert it from type
integer to type
from_seconds(). While you are at it, you can also give some fields more useful names.
To do this, add the following OPAL statement in the console:
// Create a new field of type timestamp, converting the epoch time in dt make_col timestamp:from_seconds(int64(dt))
Next, you need to tell Observe this is our new timestamp field. Do this with the
set_valid_from verb, which marks it as the “Valid From”, or the time this observation’s data first became valid:
// Use the field "timestamp" as your timestamp, by setting as the valid from time // For better performance, don't accept a new value too far off from the // the original ingest time you are replacing. See set_valid_from docs for details. set_valid_from options(max_time_diff:duration_min(5)), timestamp
While you are here, you can also call the field
name something more descriptive:
// rename the existing name field to city rename_col city:name
There are also several fields you don’t need anymore. For best performance, it’s good practice to drop fields you don’t need. Remove them with one of these methods:
// Remove these fields from the data drop_col BUNDLE_TIMESTAMP, OBSERVATION_KIND, FIELDS, EXTRA, BUNDLE_ID, OBSERVATION_INDEX, dt
// Keep these fields and drop all others pick_col timestamp, id, city, description, temp, country
Observations must still have a valid timestamp, so don’t drop your timestamp field. The UI won’t show the Delete column menu item for that field, and the OPAL console displays an error if you try to do it with
When you are done, your OPAL script should look something like this:
// Filter to just the weather data // Use not is_null() to ensure you only get rows where poller_type exists, // and string() so you can compare its value to the string "weather" filter (not is_null(EXTRA.poller_type)) and (string(EXTRA.poller_type) = "weather") // Extract desired fields from the JSON payload, contained in FIELDS // Also rename name to city make_col dt:int64(FIELDS.dt), id:int64(FIELDS.id), city:string(FIELDS.name), description:string(FIELDS.weather.description), temp:float64(FIELDS.main.temp), country:string(FIELDS.sys.country) // Use the field "timestamp" as your timestamp, by setting it as the valid from time // For better performance, don't accept a new value too far off from the // the original ingest time you are replacing. See set_valid_from docs for details. make_col timestamp:from_seconds(int64(dt)) set_valid_from options(max_time_diff:duration_min(5)), timestamp // Select only the fields you want, dropping others // Note: pick_col must include a valid timestamp pick_col timestamp, id, city, description, temp, country
This OPAL script combines the UI actions and OPAL statements described above. It narrows all the events in the Observation Event Stream to just weather observations, and then shapes them into useful fields.
A future tutorial may cover additional options for filtering and visualizing data in a Worksheet. Below is an example, a graph of temperatures for the cities in your data (click to enlarge.)
Save your shaping work as a new dataset¶
A table of individual weather details is useful, but at the moment it only exists in this Worksheet. To use this data in other ways, such as creating metrics, it needs to be its own dataset. To do that, publish it as a new event stream:
In the right menu, click Publish New Event Stream. (If you don’t see it, you may have a cell selected in the table. Click the X to return to the default right rail view.)
Name this dataset “My Weather Test/Weather Events”.
Including a package name like “My Weather Test” creates a section My Weather Test in the Explore tab. Use packages to group related datasets together, making it easier to find them later. (If someone else already has a My Weather Test package, use another unique name.)
If you would like to pause here, save this Worksheet to continue with it later. You may also prefer to change the name to something more meaningful:
Click on the Worksheet name, Observation, at the top of the page. (The second “Observation” is the name of the stage. More about stages later.)
Type a new name and click Save.
When you’re ready to continue, look for this Worksheet in the Explore tab under Worksheets.
Create Resources for observation locations¶
The next step is to create a new resource set, containing a resource for each location. This gives us a Landing Page, and a more convenient way to view the data by city.
To create Resources and publish them as a Resource Set:
In your Worksheet, open the Actions menu at the bottom of the right menu.
Select Create New Resource Set. (If you don’t see this option, you may need to close the Cell Selected view first.)
Select the fields to include, which in this case is all of them.
idas the Primary Key.
The Primary Key defines which field (or fields) uniquely identify each Resource. In this data, each location (city) has a unique
id. You use this later to link datasets.
Accept the default Resource Lifetime of 1 hour. This defines how long to wait between updates before this Resource is considered inactive.
Creating a new Resource Set adds a second stage to the worksheet, with one row for each city (or
id) in the data. Drag the time scrubber to see the weather for the past few hours:
Video: using the time scrubber
Stages are not datasets themselves, but a temporary view of the data with whatever actions have been taken so far. They inherit the state of the parent stage, so your
id resource stage builds on the extracted fields and other shaping work you did to create the Weather Events dataset. Multiple stages in a Worksheet show useful data, but in order to link those results to other datasets, you need to finish creating this new Resource dataset.
Before you do, there are a few things that make this dataset easier to use.
Change the Resource Set name
You should have two stages in this Worksheet: your original one, and a second called “id”. This name is automatically generated from the primary key, but that might not be very meaningful. In the right menu, hover to see the pencil icon, and click to change it to “My Weather Test/Locations”. Not only is “Locations” a more useful name than “id”, but this also adds it to the My Weather Test package. (Note that package and dataset names are case-sensitive.)
Change nanosecond time period to hours
make_resource statement in the console shows the expiry time in nanoseconds. To use hours instead, replace
expiry:duration_hr(1). While this doesn’t change the expiry time, it makes the OPAL easier to read.
Construct a more convenient label
Add the line
label:strcat(name, ", ", country),before
primary_key(id)to construct the label text. Then add
set_label labelto use this text for the observation’s label.
After you finish, your OPAL script should look like this:
make_resource options(expiry:duration_hr(1)), timestamp: timestamp, city: city, description: description, temp: temp, country: country, label:strcat(city, ", ", country), primary_key(id) set_label label
Click Run to make sure everything works. When it looks good, click Publish New Resource Set to save.
Now you can go back to the Explore tab and open the Landing Page for the My Weather Test/Locations Resource Set. The default board displays weather details, which may be filtered by city, country, or weather description.
This shows basic information and simple visualizations, but what you don’t have yet are temperature metrics for the individual cities.
Create weather metrics¶
A metric is a numeric measurement that changes over time, such as a temperature reading. They can be displayed in charts, used to trigger alerts, or calculate other values.
For this example, you create two temperature metrics: one for the original Celsius reading, and one for the calculated Fahrenheit equivalent. This builds on the Weather Events dataset you created previously, containing the raw values for each weather observation. Metrics operations use OPAL, so most of this section is done in the OPAL console.
To use the Weather Events data as metrics, you need to shape the observations into a more appropriate form. (Metrics, including how and why to shape data in a particular way, are covered in more detail in Introduction to Metrics.)
Here is a summary of the process:
Start with a Worksheet for the Weather Events dataset, which has one row for each set of measurements. These are “wide metrics:” easy to read, but not as easy to perform calculations with.
Using OPAL verbs and functions, shape these events into a series of new events: one for each value of your future metric. These are “narrow” metrics.
Add an interface to identify this dataset as containing metrics, which enables additional metric operations in Observe.
make_colto create a new field of type
object. It contains the two metric values from each observation: the original Celsius temperature reading, and the same value converted to Fahrenheit.
make_col metrics:make_object( "temperature_c":temp, "temperature_f":(temp*9/5 + 32) )
flatten_leavesto create two events, one for each temperature value.
pick_colto select the needed fields from the new events.
While it’s true you can delete the one unneeded field using
drop_col, by using
pick_colyou can also rename the generated
_c_fields to something more useful.
pick_col valid_from:timestamp, id, city, country, metric:string(_c_metrics_path), value:float64(_c_metrics_value)
interfaceto identify this dataset as containing metrics. This also specifies which columns contain the names and the values.
interface "metric", metric:metric, value:value
Observe recommends naming metric name columns
metricand metric value columns
value. If your dataset contains these columns, you may omit them in your
// Rename existing columns to "metric" and "value" rename_col metric:string(_c_metrics_path), value:(_c_metrics_value) // Automatically discover metrics that follow naming convention: // "metric" contains names of individual metrics // "value" contains the values interface "metric"
Click Run to confirm everything is working. When you are done, your OPAL should look something like this:
make_col metrics:make_object( "temperature_c":temp, "temperature_f":(temp*9/5 + 32) ) flatten_leaves metrics pick_col valid_from:timestamp, id, city, country, metric:string(_c_metrics_path), value:float64(_c_metrics_value) interface "metric", metric:metric, value:value
The resulting table contains two rows for each temperature reading:
The last step is to link your primary key
id to the corresponding Resource dataset you created earlier.
Open the menu for the
idcolumn and select Link to Resource Set.
Select “Locations” from the sub-menu.
In the right menu, change the auto-generated Link Name to “Locations”.
This adds a
set_link line at the end of the existing OPAL script. Publish this Worksheet as a new Event Stream named “My Weather Test/Weather Metrics”.
View your new metrics¶
Now you have all the pieces in place to display your metrics. Go to the Locations Landing Page and there’s a new section showing Temp C and Temp F.
Add metrics to a dashboard¶
The default dashboard shows your temperature metrics, but also a few unnecessary parameters. Create a custom dashboard to show just the items of interest:
Drag cards from the right menu to add items to your board. You can create collapsible sections, add images, and more. When you are happy with the results, click the More icon to give it a name, and then to save.