Tutorial: Modeling Weather Data¶
This tutorial shapes weather data using a combination of UI actions and OPAL statements. You use a combination of the UI and OPAL, performing some actions in the UI and some with OPAL. Also, not every OPAL operation has an equivalent in the UI.
To become familiar with Observe terminology and concepts, you may want to review the Observe Concepts page.
For this tutorial, you create new datasets for the incoming data, and use Worksheets to extract fields and create metrics, and display data on a dashboard.
You will cover the following topics in this tutorial:
OpenWeather Data¶
Each weather observation contains the current conditions for a single location, with common measurements such as temperature, humidity, wind speed, and more. You can find an example of a JSON observation in the OpenWeather documentation.
To begin modeling weather data, install the OpenWeather app. You can find the latest version on your Observe instance’s Applications page.
Figure 1 - The OpenWeather App
To install the app, follow these steps:
From the left navigation bar, click Applications.
Locate the OpenWeather app card, and either click on the install button or click on the card.
Click Install and select the “Recommended (Easy)” option. This should take a few seconds to complete.
After the OpenWeather app successfully installs, you will see the app’s content, and other information.
Figure 2 - The OpenWeather App details
The OpenWeather app bundles four pre-defined datasets:
openweather/City
openweather/Country
openweather/Metrics
openweather/Raw Events
These datasets are derived from a Datastream. When you ingest OpenWeather data, the data initially goes into the Default Datastream. This Datastream contains all the raw, unshaped data ingested into your workspace by the OpenWeather app. The app also includes a Poller that pulls new observations (events) to the datastream regularly by polling the OpenWeather API. We will take the raw data from the Datastream and shape it into distinct Datasets.
Figure 3 - The Default Datastream with OpenWeather data
Identifying the Data of Interest¶
The OpenWeather app presents collected weather information in the Raw Events Dataset. We will use this dataset to get acquainted with Worksheets.
Navigate to the Datasets page
Search for “openweather”
In the row with the “Raw Events” Dataset, select the “Open In New Worksheet” option. If you’ve already opened the Raw Events dataset, you can click “Create Worksheet” at the top right.
Figure 4 - Searching For Datasets
Once your new Worksheet is open, you should see a timestamp
and data
column.
Figure 5 - A new Worksheet
From the data column header, click the dropdown arrow.
Select Filter from the dropdown to display the available values.
Figure 6 - Using Worksheet Column Actions
3. Select weather from the Filter Rail to filter observations with only weather data and not empty fields.
Figure 7 - Apply the weather
filter to the data
column
4. Click Apply. In the Filter bar, OPAL console, you should now see:
data.weather != null
The UI generates this for you. If you click the OPAL button to open the Console, you will see that OPAL has also been generated for this filtering task.
filter (not (is_null(data.weather) or data.weather = parse_json("null")))
Click Builder to return to the Builder view.
Figure 8 - UI Generated OPAL
Extracting Fields from the JSON payload¶
Note
You must perform the JSON extraction before running the rest of the OPAL script. The rest of the OPAL script performs actions on the extracted columns.
With the table filtered to just observations with the JSON key weather
in it, you want to extract fields of interest from data
.
Open the menu for the data column and choose Extract from JSON.
In the right menu, select the following fields. Notice that the
sys
andweather
objects must be expanded to reach the nested fields that you want to select:
dt
: recorded time of the observationid
: unique IDmain.temp
: current temperature, in Celsiusname
: location, in this case, the city namesys.country
: country location of the cityweather[0].description
: text description of the current conditions
3. Keep the Automatically convert column type checkbox selected.
4. Click Apply to extract the data. You should now see new columns with the extracted data in your Worksheet.
If you don’t have the console open on your worksheet, click OPAL at the bottom of your worksheet.
Append the following OPAL statement in the console to the existing filter
statement:
// Extract from the JSON payload in data
make_col dt:int64(data.dt),
id:int64(data.id),
name:string(data.name),
temp:float64(data.main.temp),
country:string(data.sys.country),
description:string(data.weather[0].description)
Click Run to apply.
Refining the worksheet¶
With your fields freshly extracted, you now need to adjust a few additional things before you continue. These actions must be performed directly via OPAL. Click on the OPAL button to switch from Builder mode to OPAL Console.
First , you will create a new column called EventTime from the extracted
dt
column, by using Observe’sfrom_seconds
function. This converts the event’s Epoch time (seconds since the UNIX epoch) into an Observe event time (nanoseconds since the UNIX epoch). Add the following OPAL to the existing script on the Console and click Run:
You should now see a new column called EventTime
that has been parsed from its epoch representation in the dt
column. Because Observe recognizes this field as a time, it is presented in the browser’s locale format and timezone.
2. Designate our EventTime
column as our new timestamp, using the following OPAL:
// Set the valid_from time
set_valid_from options(max_time_diff:duration_min(5)), EventTime
This now sets the newly created field EventTime as your event timestamp, by using the set_valid_from
OPAL verb. Note that the OpenWeather dataset’s dt
time values are close to the poller ingestion time seen in the raw dataset. When dealing with your own data, if you need to specify a time value other than when Observe ingested it, don’t use a value that deviates too far from the original ingest time, as it can impact performance. See set_valid_from
for details.
3. Rename the field name
to city
as a more descriptive header. Add the following OPAL to the existing script on the OPAL console and click Run:
// Rename the existing name field to city
rename_col city:name
Figure 11 - Using the rename_col OPAL verb
You don’t need the data
or dt
columns any longer, so you can delete the columns from the Worksheet. For the best performance, consider it a good practice to remove columns you don’t need. Remove the columns with one of these methods.
Note
The dataset must have a valid timestamp, so don’t drop the timestamp
field. The UI won’t show the Delete column menu item for that field, and the OPAL console displays an error if you try to do it with col_drop
or col_pick
.
Select Delete Column from the column menu of the field to delete.
Figure 12 - Using the Delete column option from the dropdown menu
There are two ways to remove fields in OPAL. Drop the ones you don’t want with drop_col
, or choose the ones you do want with pick_col
.
Using drop_col
to remove the column:
// Remove this field from the data
drop_col data, dt
Using pick_col
to select the relevant columns:
// Keep these fields and drop all others
pick_col EventTime, timestamp, id, city, description, temp, country
Congratulations! You have now narrowed down all events in the Raw Event dataset to just the weather observations and shaped them into useful fields!
The resulting Worksheet should look like this. Note that in this case drop_col
was used instead of pick_col
.
Figure 13 - Finished OPAL with drop_col
The resulting OPAL script looks like this. Note that there is a commented section if you would like to try pick_col
.
// Filter our events down to just those containing the weather key
filter (not is_null(data.weather))
// Extract desired fields from the JSON payload, contained in the data column
make_col dt:int64(data.dt),
id:int64(data.id),
temp:float64(data.main.temp),
name:string(data.name),
country:string(data.sys.country),
description:string(data.weather[0].description)
// Create a new field of type EventTime, converting the epoch time in dt
make_col EventTime:from_seconds(dt)
// Set the valid_from time
set_valid_from options(max_time_diff:duration_min(5)), EventTime
// Rename the existing name field to city
rename_col city:name
// drop columns we dont want
// Remove this field from the data
drop_col data, dt
// alternatively, we can use pick_col to select only the columns we want
// pick_col EventTime, timestamp, id, city, description, temp, country
Next, let’s prepare to visualize the data. Add
// Calculate the average temperature over time per city
timechart avg_temp:avg(temp), group_by(city)
to the OPAL console and click Run.
You can now display your data as a graph of temperatures for the cities in your data by changing the visualization type at the top center. It is currently set to Table; change it to Line Chart. Use the Chart Settings option on the right to customize choices such as unit size, legends, and colors. See Visualization Types for more detail. For now:
Click Chart Settings
Scroll to Axes and set Unit to
celsius (C, ℃, degC)
, X Axis Label to “Time”, and Y Axis Label to “Degrees”Scroll to Thresholds and turn Show Thresholds on. Set Draw as to “Filled regions & dashed lines”. Put “20” in the first value, add another and set it to 22. Use the color pills on the right to set the top band to red, the bottom band to blue, and the middle band to green.
Figure 14 - Visualize the temperature data by city
Saving Your Shaping Work as a New Dataset¶
While the table of individual weather details may be useful, the data only exists in this Worksheet at the moment. To use this data in other ways, such as creating metrics, you need to publish it as a new Event Dataset.
Remove the visualization from your dataset by changing the visualization type at the top center from Line Chart to Table.
Comment the
timechart
line:
// Calculate the average temperature over time per city
//timechart avg_temp:avg(temp), group_by(city)
Name this Query
My Weather Tutorial/Weather Events
Click Publish New Dataset in the left Rail Query context menu.
Click Publish.
Figure 15 - Dataset My Weather Tutorial/Weather Events
By prefixing My Weather Tutorial/
to the name of the dataset, it allows a natural namespacing in Observe lister pages. These are colloquially referred to as package
names, and allows you to group related datasets, making it easier to find them later.
Figure 16 - Dataset My Weather Tutorial/Weather Events
You can also save the Worksheet as well, but keep the tab with your Worksheet open, as we will use it later on.
Click on the Worksheet name, Untitled Worksheet, at the top of the page.
Type a new name and click Save worksheet.
Figure 17 - Saving a Worksheet
Note
Similar to Datasets, there is a Worksheets lister page on the left navigation, where you can find your newly created Dataset My Weather Tutorial/Weather Events
Worksheet.
Creating Resources for Observation Locations¶
Now that you’ve seen how Observe can shape your raw data, and even turn it into custom Datasets, Worksheets, and visualizations; the real power of Observe is in the concept of Resources
. Resources
, or Resource Datasets
specifically, are best thought of as objects in your environment whose state changes over time, but the object itself is persistent.
To create our Resource and publish them as a Resource Dataset, use the following steps:
Open your
Dataset My Weather Tutorial/Weather Events
Worksheet and clone the first stage
Figure 18 - Cloning A Worksheet
Paste the following OPAL into the OPAL editor
// Use the make_resource verb to create a Resource Dataset
// and set its expiry to 1 hour
make_resource options(expiry:duration_hr(1)),
timestamp: timestamp,
city: city,
description: description,
temp: temp,
country: country,
// Create an easier to read label
label:concat_strings(city, ", ", country),
// primary key is how we link to other datasets
primary_key(id)
Note
The Primary Key defines which field or fields uniquely identify each Resource. In this data, each city has a unique id
. You use this later to link datasets.
2. Click Run to run the OPAL. You should see a table similar to the below:
Figure 19 - Creating a Resource with OPAL
3. Click the ellipses (...
) next to the worksheet preview pane on the left-hand side and select Publish dataset
>Publish new dataset
. When the Publish Dataset
modal pops up, give your new Resource Dataset the name My Weather Tutorial/Locations
.
Note
Package and dataset names are case-sensitive.
Figure 20 - Publish A Resource Dataset
Now you can go back to the Datasets area, and open the My Weather Tutorial/Locations Resource Set.
Figure 21 - Dataset lister page highlighting the new Location Resource Dataset
The default Resource dashboard displays weather details, which may be filtered by city, country, or weather description. This shows basic information and simple visualizations. The following section describes creating temperature metrics for individual cities.
Figure 22 - Locations Resource Dashboard
Creating Weather Metrics¶
A metric is a numeric measurement that changes over time. Our OpenWeather data’s temperature readings are ideal examples of metrics. They can be displayed in charts, used to trigger alerts, or calculate other values.
For this example, you will create two temperature metrics:
Original Celsius reading
Calculated Fahrenheit equivalent
This builds on the Weather Events dataset you created previously, containing the raw values for each weather observation. Metrics operations use OPAL, so most of this section is done in the OPAL console.
To use the Weather Events data to create metrics, you need to shape the observations into a more appropriate form. See Introduction to Metrics for more details.
Summary of the Metrics process¶
Review the Weather Events dataset. Recall that it has one row for each set of measurements.
Using OPAL verbs and functions, shape these events into a series of new events, one for each value of your metric.
Add an interface to identify this dataset as containing metrics, which enables additional metric operations in Observe.
To start, open the Weather Events dataset in a new worksheet.
Click the Console icon at the bottom of the page to open the OPAL console.
Use
make_col
to create a new field of typeobject
. It contains the two metric values from each observation:the original Celsius temperature reading
the value converted to Fahrenheit
make_col metrics:make_object(
"temperature_c":temp,
"temperature_f":(temp*9/5 + 32)
)
4. Use flatten_leaves
to create two events, one for each temperature value.
flatten_leaves metrics
5. Use pick_col
to select the columns, and rename the generated _c_
fields to something more useful.
pick_col EventTime,
id,
city,
country,
name:string(_c_metrics_path),
value:float64(_c_metrics_value)
6. Define an interface
to identify this dataset as containing metrics. This also specifies which columns contain the names and the values.
interface "metric",
metric:name,
value:value
8. Click Run to confirm everything works. When you finish, your OPAL should look like this:
make_col metrics:make_object(
"temperature_c":temp,
"temperature_f":(temp*9/5 + 32))
flatten_leaves metrics
pick_col EventTime,
id,
city,
country,
name:string(_c_metrics_path),
value:float64(_c_metrics_value)
interface "metric",
metric:name,
value:value
The resulting table contains two rows for each temperature reading:
Figure 23 - Metrics with two rows for temperature
Linking Datasets Together¶
Observe provides the ability to link different datasets together, enabling you to create rich correlations. In this section you will link the id
column of our metrics data, to the primary key id
in the corresponding Locations Resource Dataset you created earlier. You will do this again using OPAL. However, first you will add a new Data Input
by selecting Manage Inputs
in the top right. Locate your Location
Resource dataset we just crated and make the name Locations
.
Note
Adding the input via the UI allows you to address the dataset by its human readable name, rather than it’s dataset ID value.
Figure 24 - Add a dataset input
The OPAL for creating a link is very simple, you provide a dataset and associated column that you wish to join together. Values that match between the specified keys will be joined or “linked” together.
// set_link uses a column in the current dataset
// and the @ operatator to address the dataset name and column you with
// to join on
set_link id:@Locations.id
Figure 25 - Linking id
to Locations Resource set
Publish this Worksheet as a new Event Dataset. Give this dataset the name My Weather Tutorial/Weather Metrics.
Dashboarding Your Weather Metrics¶
Starting with our new Weather Metrics
Dataset, let’s open it as a Worksheet. You can do this by selecting the “Open In New Worksheet” in the Datasets lister page.
Figure 26 - Open Dataset in new Worksheet
Click the Visualize icon.
Select
temperature_f
from the Metrics list.
In the Expression Builder, add the following parameters:
Plot =
Average
By =
city
You can use the Where
option to also filter down to specific Metric values or names as well. Observe even provides summary values from your data to select from.
Figure 27 - Metrics Expression Builder Where Options
4. To create a different type of visualization, click the Visualize tab and select a different type from the Type list. Experiment with different types to see how the display changes with each type of graph.
5. Click the Presentation tab to change the presentation layout. You can add legends, flip your x- and y-axes, change colors, and customize the appearance.
6. To create a dashboard from the worksheet, click Export as Dashboard near Save worksheet.
For more information on using Dashboards, see Creating and using dashboards.
Figure 28 - Metrics dashboard with Design and Definition options
This tutorial taught you how Observe ingests data from an external source, in this case an OpenWeather API. You also learned to filter and shape the ingested data into datasets and how datasets link together for business. You produced metrics related to the observations in the datasets and added them to a dashboard bringing together metrics and events into useful interactive pages.