Tutorial: Modeling Weather Data

This tutorial shapes weather data using a combination of UI actions and OPAL statements. You use a combination of the UI and OPAL, performing some actions in the UI and some with OPAL. Also, not every OPAL operation has an equivalent in the UI.

To become familiar with Observe terminology and concepts, you may want to review the Observe Concepts page.

For this tutorial, you create new datasets for the incoming data, and use Worksheets to extract fields and create metrics, and display data on a dashboard.

You will cover the following topics in this tutorial:

OpenWeather Data

Each weather observation contains the current conditions for a single location, with common measurements such as temperature, humidity, wind speed, and more. You can find an example of a JSON observation in the OpenWeather documentation.

To begin modeling weather data, install the OpenWeather app. You can find the latest version on your Observe instance’s Applications page.

The OpenWeather app

Figure 1 - The OpenWeather App

To install the app, follow these steps:

  1. From the left navigation bar, click Applications.

  2. Locate the OpenWeather app card, and either click on the install button or click on the card.

  3. Click Install and select the “Recommended (Easy)” option. This should take a few seconds to complete.

After the OpenWeather app successfully installs, you will see the app’s content, and other information.

The OpenWeather app details

Figure 2 - The OpenWeather App details

The OpenWeather app bundles four pre-defined datasets:

  • openweather/City

  • openweather/Country

  • openweather/Metrics

  • openweather/Raw Events

These datasets are derived from a Datastream. When you ingest OpenWeather data, the data initially goes into the Default Datastream. This Datastream contains all the raw, unshaped data ingested into your workspace by the OpenWeather app. The app also includes a Poller that pulls new observations (events) to the datastream regularly by polling the OpenWeather API. We will take the raw data from the Datastream and shape it into distinct Datasets.

Default Datastream

Figure 3 - The Default Datastream with OpenWeather data

Identifying the Data of Interest

The OpenWeather app presents collected weather information in the Raw Events Dataset. We will use this dataset to get acquainted with Worksheets.

  1. Navigate to the Datasets page

  2. Search for “openweather”

  3. In the row with the “Raw Events” Dataset, select the “Open In New Worksheet” option. If you’ve already opened the Raw Events dataset, you can click “Create Worksheet” at the top right.

Open Raw Data In Worksheet

Figure 4 - Searching For Datasets

Once your new Worksheet is open, you should see a timestamp and data column.

New Raw Data Worksheet

Figure 5 - A new Worksheet

  1. From the data column header, click the dropdown arrow.

  2. Select Filter from the dropdown to display the available values.

Worksheet Column Action Menu

Figure 6 - Using Worksheet Column Actions

3. Select weather from the Filter Rail to filter observations with only weather data and not empty fields.

Apply filter to data

Figure 7 - Apply the weather filter to the data column

4. Click Apply. In the Filter bar, OPAL console, you should now see:

data.weather != null

The UI generates this for you. If you click the OPAL button to open the Console, you will see that OPAL has also been generated for this filtering task.

filter (not (is_null(data.weather) or data.weather = parse_json("null")))

Click Builder to return to the Builder view.

Result after applying filter

Figure 8 - UI Generated OPAL

Extracting Fields from the JSON payload

Note

You must perform the JSON extraction before running the rest of the OPAL script. The rest of the OPAL script performs actions on the extracted columns.

With the table filtered to just observations with the JSON key weather in it, you want to extract fields of interest from data.

  1. Open the menu for the data column and choose Extract from JSON.

  2. In the right menu, select the following fields. Notice that the sys and weather objects must be expanded to reach the nested fields that you want to select:

  • dt: recorded time of the observation

  • id: unique ID

  • main.temp: current temperature, in Celsius

  • name: location, in this case, the city name

  • sys.country: country location of the city

  • weather[0].description: text description of the current conditions

3. Keep the Automatically convert column type checkbox selected.

4. Click Apply to extract the data. You should now see new columns with the extracted data in your Worksheet.

Extract from JSON using data column

Figure 9 - Extract data using JSON

Applied extract from JSON

Figure 10 - Applied Extraction with new columns

If you don’t have the console open on your worksheet, click OPAL at the bottom of your worksheet.

Append the following OPAL statement in the console to the existing filter statement:

// Extract from the JSON payload in data
make_col dt:int64(data.dt),
  id:int64(data.id),
  name:string(data.name),
  temp:float64(data.main.temp),
  country:string(data.sys.country),
  description:string(data.weather[0].description)

Click Run to apply.

Refining the worksheet

With your fields freshly extracted, you now need to adjust a few additional things before you continue. These actions must be performed directly via OPAL. Click on the OPAL button to switch from Builder mode to OPAL Console.

  1. First , you will create a new column called EventTime from the extracted dt column, by using Observe’s from_seconds function. This converts the event’s Epoch time (seconds since the UNIX epoch) into an Observe event time (nanoseconds since the UNIX epoch). Add the following OPAL to the existing script on the Console and click Run:

// Create a new field named EventTime, converting the epoch time in dt
make_col EventTime:from_seconds(dt)

You should now see a new column called EventTime that has been parsed from its epoch representation in the dt column. Because Observe recognizes this field as a time, it is presented in the browser’s locale format and timezone.

2. Designate our EventTime column as our new timestamp, using the following OPAL:

// Set the valid_from time

set_valid_from options(max_time_diff:duration_min(5)), EventTime

This now sets the newly created field EventTime as your event timestamp, by using the set_valid_from OPAL verb. Note that the OpenWeather dataset’s dt time values are close to the poller ingestion time seen in the raw dataset. When dealing with your own data, if you need to specify a time value other than when Observe ingested it, don’t use a value that deviates too far from the original ingest time, as it can impact performance. See set_valid_from for details.

3. Rename the field name to city as a more descriptive header. Add the following OPAL to the existing script on the OPAL console and click Run:

// Rename the existing name field to city
rename_col city:name
rename_col example

Figure 11 - Using the rename_col OPAL verb

You don’t need the data or dt columns any longer, so you can delete the columns from the Worksheet. For the best performance, consider it a good practice to remove columns you don’t need. Remove the columns with one of these methods.

Note

The dataset must have a valid timestamp, so don’t drop the timestamp field. The UI won’t show the Delete column menu item for that field, and the OPAL console displays an error if you try to do it with col_drop or col_pick.

Select Delete Column from the column menu of the field to delete.

data column header menu open, showing the Delete columns menu item

Figure 12 - Using the Delete column option from the dropdown menu

There are two ways to remove fields in OPAL. Drop the ones you don’t want with drop_col, or choose the ones you do want with pick_col.

Using drop_colto remove the column:

// Remove this field from the data
drop_col data, dt

Using pick_col to select the relevant columns:

// Keep these fields and drop all others
pick_col EventTime, timestamp, id, city, description, temp, country

Congratulations! You have now narrowed down all events in the Raw Event dataset to just the weather observations and shaped them into useful fields!

The resulting Worksheet should look like this. Note that in this case drop_col was used instead of pick_col.

data column header menu open, showing the Delete columns menu item

Figure 13 - Finished OPAL with drop_col

The resulting OPAL script looks like this. Note that there is a commented section if you would like to try pick_col.

// Filter our events down to just those containing the weather key
filter (not is_null(data.weather))
// Extract desired fields from the JSON payload, contained in the data column
make_col dt:int64(data.dt),
    id:int64(data.id),
    temp:float64(data.main.temp),
    name:string(data.name),
    country:string(data.sys.country),
    description:string(data.weather[0].description)
// Create a new field of type EventTime, converting the epoch time in dt
make_col EventTime:from_seconds(dt)
// Set the valid_from time
set_valid_from options(max_time_diff:duration_min(5)), EventTime
// Rename the existing name field to city
rename_col city:name
// drop columns we dont want
// Remove this field from the data
drop_col data, dt
// alternatively, we can use pick_col to select only the columns we want
// pick_col EventTime, timestamp, id, city, description, temp, country

Next, let’s prepare to visualize the data. Add

// Calculate the average temperature over time per city
timechart avg_temp:avg(temp), group_by(city)

to the OPAL console and click Run.

You can now display your data as a graph of temperatures for the cities in your data by changing the visualization type at the top center. It is currently set to Table; change it to Line Chart. Use the Chart Settings option on the right to customize choices such as unit size, legends, and colors. See Visualization Types for more detail. For now:

  1. Click Chart Settings

  2. Scroll to Axes and set Unit to celsius (C, ℃, degC), X Axis Label to “Time”, and Y Axis Label to “Degrees”

  3. Scroll to Overlays and turn Show Thresholds on. Set Draw as to “Filled regions & dashed lines”. Put “20” in the first value, add another and set it to 22. Use the color pills on the right to set the top band to red, the bottom band to blue, and the middle band to green.

Worksheet Visualization

Figure 14 - Visualize the temperature data by city

Saving Your Shaping Work as a New Dataset

While the table of individual weather details may be useful, the data only exists in this Worksheet at the moment. To use this data in other ways, such as creating metrics, you need to publish it as a new Event Dataset.

  1. Remove the visualization from your dataset by changing the visualization type at the top center from Line Chart to Table.

  2. Comment the timechart line:

// Calculate the average temperature over time per city
//timechart avg_temp:avg(temp), group_by(city)
  1. Name this Query My Weather Tutorial/Weather Events

  2. Click Publish New Dataset in the left Rail Query context menu.

  3. Click Publish.

Dataset Publish My Weather Tutorial/Weather Events

Figure 15 - Dataset My Weather Tutorial/Weather Events

By prefixing My Weather Tutorial/ to the name of the dataset, it allows a natural namespacing in Observe lister pages. These are colloquially referred to as package names, and allows you to group related datasets, making it easier to find them later.

Dataset My Weather Tutorial/Weather Events  lister

Figure 16 - Dataset My Weather Tutorial/Weather Events

You can also save the Worksheet as well, but keep the tab with your Worksheet open, as we will use it later on.

  1. Click on the Worksheet name, Untitled Worksheet, at the top of the page.

  2. Type a new name and click Save worksheet.

Dataset My Weather Tutorial/Weather Events  lister

Figure 17 - Saving a Worksheet

Note

Similar to Datasets, there is a Worksheets lister page on the left navigation, where you can find your newly created Dataset My Weather Tutorial/Weather Events Worksheet.

Creating Resources for Observation Locations

Now that you’ve seen how Observe can shape your raw data, and even turn it into custom Datasets, Worksheets, and visualizations; the real power of Observe is in the concept of Resources. Resources, or Resource Datasets specifically, are best thought of as objects in your environment whose state changes over time, but the object itself is persistent.

To create our Resource and publish them as a Resource Dataset, use the following steps:

  1. Select Datasets on the left navigation

  2. Search for your Weather Events Dataset you just created

  3. Click on the Weather Events Dataset

  4. Choose Create Worksheet at the top of the screen You can now see that the worksheet has a starting point reading from your Weather Events Dataset

Figure 18 - Cloning A Stage

  1. Paste the following OPAL into the OPAL editor

// Use the make_resource verb to create a Resource Dataset
// and set its expiry to 1 hour
make_resource options(expiry:duration_hr(1)),
  timestamp: timestamp,
  city: city,
  description: description,
  temp: temp,
  country: country,
  // Create an easier to read label 
  label:concat_strings(city, ", ", country),
  // primary key is how we link to other datasets
  primary_key(id)

Note

The Primary Key defines which field or fields uniquely identify each Resource. In this data, each city has a unique id. You use this later to link datasets.

2. Click Run to run the OPAL. You should see a table similar to the below:

After running OPAL to create a new label column.

Figure 19 - Creating a Resource with OPAL

3. Click the ellipses (...) next to the worksheet preview pane on the left-hand side and select Publish new dataset. When the Publish Dataset modal pops up, give your new Resource Dataset the name My Weather Tutorial/Locations.

Note

Package and dataset names are case-sensitive.

After running OPAL to create a new label column.

Figure 20 - Publish A Resource Dataset

Now you can go back to the Datasets area, and open the My Weather Tutorial/Locations Resource Set.

Datasets Lister Page With New Location Resource Dataset.

Figure 21 - Dataset lister page highlighting the new Location Resource Dataset

The default Resource dashboard displays weather details, which may be filtered by city, country, or weather description. This shows basic information and simple visualizations. The following section describes creating temperature metrics for individual cities.

Locations Landing Page, showing the default Locations Overview dashboard with cards for various types of weather information.

Figure 22 - Locations Resource Dashboard

Creating Weather Metrics

A metric is a numeric measurement that changes over time. Our OpenWeather data’s temperature readings are ideal examples of metrics. They can be displayed in charts, used to trigger alerts, or calculate other values.

For this example, you will create two temperature metrics:

  • Original Celsius reading

  • Calculated Fahrenheit equivalent

This builds on the Weather Events dataset you created previously, containing the raw values for each weather observation. Metrics operations use OPAL, so most of this section is done in the OPAL console.

To use the Weather Events data to create metrics, you need to shape the observations into a more appropriate form. See Introduction to Metrics for more details.

Summary of the Metrics process

  • Review the Weather Events dataset. Recall that it has one row for each set of measurements.

  • Using OPAL verbs and functions, shape these events into a series of new events, one for each value of your metric.

  • Add an interface to identify this dataset as containing metrics, which enables additional metric operations in Observe.

  1. To start, open the Weather Events dataset in a new worksheet.

  2. Click the OPAL icon to toggle at the top right of the page to open the OPAL console.

  3. Use make_col to create a new field of type object. It contains the two metric values from each observation:

    • the original Celsius temperature reading

    • the value converted to Fahrenheit

make_col metrics:make_object(
  "temperature_c":temp,
  "temperature_f":(temp*9/5 + 32)
)

4. Use flatten_leaves to create two events, one for each temperature value.

flatten_leaves metrics

5. Use pick_col to select the columns, and rename the generated _c_ fields to something more useful.

pick_col EventTime,
 id,
 city,
 country,
 name:string(_c_metrics_path),
 value:float64(_c_metrics_value)

6. Define an interface to identify this dataset as containing metrics. This also specifies which columns contain the names and the values.

interface "metric",
  metric:name,
  value:value

8. Click Run to confirm everything works. When you finish, your OPAL should look like this:

make_col metrics:make_object(
  "temperature_c":temp,
  "temperature_f":(temp*9/5 + 32))

flatten_leaves metrics

pick_col EventTime,
  id,
  city,
  country,
  name:string(_c_metrics_path),
  value:float64(_c_metrics_value)

interface "metric",
  metric:name,
  value:value

The resulting table contains two rows for each temperature reading:

Table containing metrics, one row for each value of the two temperature metrics

Figure 23 - Metrics with two rows for temperature

Linking Datasets Together

Observe provides the ability to link different datasets together, enabling you to create rich correlations. In this section you will link the id column of our metrics data, to the primary key id in the corresponding Locations Resource Dataset you created earlier. You will do this again using OPAL. However, first you will add a new Data Input by selecting Manage Inputs in the top right. Locate your Location Resource dataset we just crated and make the name Locations.

Note

Adding the input via the UI allows you to address the dataset by its human readable name, rather than it’s dataset ID value.

Adding a new dataset input

Figure 24 - Add a dataset input

The OPAL for creating a link is very simple, you provide a dataset and associated column that you wish to join together. Values that match between the specified keys will be joined or “linked” together.

// set_link uses a column in the current dataset
// and the @ operatator to address the dataset name and column you with
// to join on
set_link id:@Locations.id
id column menu with link to Location Resource Set - Locations selected

Figure 25 - Linking id to Locations Resource set

Publish this Worksheet as a new Event Dataset. Give this dataset the name My Weather Tutorial/Weather Metrics.

Dashboarding Your Weather Metrics

Starting with our new Weather Metrics Dataset, let’s open it as a Worksheet. You can do this by selecting the “Open In New Worksheet” in the Datasets lister page.

Weather Metrics Dataset Open In New Worksheet

Figure 26 - Open Dataset in new Worksheet

  1. Click the Add to worksheet button in the top left.

  2. Choose Metric under Visualizations section

  3. In the center of the screen you will see a Metric dropdown and Select a metric to the right. Select the metric and choose Browse all

  4. Select temperature_f from the Metrics list.

In the section beneath the Filter, add the following parameters:

  • Plot = Average

  • By = city

Metrics Expression configuration

Figure 27 - Metrics Expression configuration

You can use the Where option to also filter down to specific Metric values or names as well. Observe even provides summary values from your data to select from.

Metrics Expression Builder filter options

Figure 28 - Metrics Expression Builder Where Options

4. To create a different type of visualization, click the Visualize tab and select a different type from the Type list. Experiment with different types to see how the display changes with each type of graph.

5. Click the Presentation tab to change the presentation layout. You can add legends, flip your x- and y-axes, change colors, and customize the appearance.

6. To create a dashboard from the worksheet, click Save as Dashboard in the green dropdown with Save worksheet.

For more information on using Dashboards, see Creating and using dashboards.

Metrics dashboard with design and definition options

Figure 29 - Metrics dashboard with Design and Definition options

This tutorial taught you how Observe ingests data from an external source, in this case an OpenWeather API. You also learned to filter and shape the ingested data into datasets and how datasets link together for business. You produced metrics related to the observations in the datasets and added them to a dashboard bringing together metrics and events into useful interactive pages.