Tutorial: Modeling Weather Data

This tutorial shapes weather data using a combination of UI actions and OPAL statements. You perform some actions in the UI and others in OPAL, because not every OPAL operation has an equivalent in the UI.

To become familiar with Observe terminology and concepts, you may want to review the Observe Concepts page.

For this tutorial, you create new datasets for the incoming data, extract fields and create metrics, and display data on a dashboard.

OpenWeather data

Each weather observation contains the current conditions for a single location, with common measurements such as temperature, humidity, wind speed, and more. You can find an example of a JSON observation in the OpenWeather documentation.

Before you can begin modeling weather data, install the OpenWeather app. You can find the latest version on your Observe instance’s Apps page.

Figure 1 - The OpenWeather App

To install the app, follow these steps:

  1. From the left navigation bar, click Apps.

  2. Locate the OpenWeather app card, and click on it.

  3. Click Install and follow the installation instructions, including creating a token for the datastream.

After the OpenWeather app successfully installs, Observe displays the app details information.

Figure 2 - The OpenWeather App details

The OpenWeather app contains predefined datasets:

  • City

  • Country

  • Metrics

  • Raw Events

Identifying the data of interest

When you ingest OpenWeather data, the data initially goes into the Default dataset. This dataset contains all the data ingested into your workspace by the OpenWeather app. OpenWeather creates the Raw Events dataset to include only weather information. OpenWeather updates the datastream regularly.

As your first step for creating new input, identify the data of interest and create a dataset for it. Do this with a new Worksheet.

Open a Worksheet with the OpenWeather Raw Events dataset:

Figure 3 - The OpenWeather Raw Events dataset

  1. From the Datasets list, click on Openweather/Raw Events.

  2. At the top of the Raw Events dataset, click Worksheet.

  3. In the data column, click the dropdown arrow.

  4. Select Filter from the dropdown to display the available values.

  5. Select weather to keep only the observations whose weather field is not empty. This generates the following OPAL:

filter (not is_null(data.weather))

  6. Click Apply.

Apply filter to data

Figure 4 - Apply the weather filter to the data column
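To reason about what this filter keeps, the same logic can be sketched outside Observe in plain Python. This is only an illustration, not OPAL and not part of the tutorial workflow; the sample rows and the helper name has_weather are hypothetical.

```python
# Hypothetical sample rows: each has a "data" object, as in the Raw Events dataset.
rows = [
    {"data": {"name": "Paris", "weather": [{"description": "clear sky"}]}},
    {"data": {"name": "heartbeat"}},                # no weather field at all
    {"data": {"name": "Oslo", "weather": None}},    # weather present but null
]

def has_weather(row):
    """Mirror of `not is_null(data.weather)`: true when weather has a non-null value."""
    return row["data"].get("weather") is not None

# Keep only the rows that pass the filter.
weather_rows = [row for row in rows if has_weather(row)]
print([row["data"]["name"] for row in weather_rows])
```

Rows with a missing or null weather value are both dropped, which is the behavior the filter is meant to have.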

Extracting fields from the JSON payload

With the table filtered to just the weather data, you want to extract fields of interest from data.

  1. Open the menu for the data column and choose Extract from JSON.

  2. In the right menu, select the following fields. Some fields may be in nested objects:

  • dt: recorded time of the observation

  • id: unique ID

  • main.temp: current temperature, in Celsius

  • name: location, in this case, the city name

  • sys.country: country location of the city

  • weather[0].description: text description of the current conditions

  3. Keep the Automatically convert column type checkbox selected.

  4. Click Apply to extract the data.

Extract from JSON using data column

Figure 5 - Extract data using JSON

If you don’t have the console open on your worksheet, click Console at the bottom of your worksheet.

Add the following OPAL statement in the console:

// Extract from the JSON payload in data
make_col dt:int64(data.dt),
  id:int64(data.id),
  name:string(data.name),
  temp:float64(data.main.temp),
  country:string(data.sys.country),
  description:string(data.weather[0].description)

Click Run to apply.
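As a rough picture of what the extraction does, the following Python sketch pulls the same six fields out of a nested object and casts them. It is an illustration only: the payload values are made up, and extract_fields is a hypothetical helper, not an Observe API.

```python
import json

# Hypothetical observation; field names follow the list above, values are invented.
payload = json.loads("""
{
  "dt": 1700000000,
  "id": 2988507,
  "name": "Paris",
  "main": {"temp": 11.5},
  "sys": {"country": "FR"},
  "weather": [{"description": "light rain"}]
}
""")

def extract_fields(data):
    """Pull the six tutorial fields out of the nested object and cast their types."""
    return {
        "dt": int(data["dt"]),
        "id": int(data["id"]),
        "name": str(data["name"]),
        "temp": float(data["main"]["temp"]),
        "country": str(data["sys"]["country"]),
        # weather is an array; the tutorial uses its first element
        "description": str(data["weather"][0]["description"]),
    }

row = extract_fields(payload)
print(row)
```

Each key in the returned dictionary corresponds to one column created by the make_col statement.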

Refining the worksheet

You have the right fields, but you should adjust a few additional things before you continue. Perform these actions using OPAL. If you don’t have the console open, click Console to open it.

  1. Create a new column called EventTime from dt, converting the epoch time into an event time. Add the following OPAL to the existing script in the Console and click Run:

// Create a new field of type EventTime, converting the epoch time in dt

make_col EventTime:from_seconds(dt)

  2. Designate EventTime as the new event time with the set_valid_from OPAL verb. For better performance, don’t use a new value that deviates too much from the original ingest time it replaces. See set_valid_from for details.

Add the following OPAL to the existing script on the Console and click Run:

// Set the valid_from time

set_valid_from options(max_time_diff:duration_min(5)), EventTime

  3. Rename the field name to city as a more descriptive header. Add the following OPAL to the existing script on the Console and click Run:

// Rename the existing name field to city
rename_col city:name
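The epoch conversion and the rename can be pictured with a short Python sketch. It assumes dt holds seconds since the Unix epoch in UTC, which is what from_seconds implies; the sample row is hypothetical and the code is not OPAL.

```python
from datetime import datetime, timezone

# Hypothetical extracted row
row = {"dt": 1700000000, "name": "Paris"}

# Like from_seconds(dt): interpret dt as seconds since the Unix epoch (UTC).
row["EventTime"] = datetime.fromtimestamp(row["dt"], tz=timezone.utc)

# Like rename_col city:name: same value, new column name.
row["city"] = row.pop("name")

print(row["EventTime"].isoformat(), row["city"])
```

After these two steps the row carries a real timestamp in EventTime and the city under a clearer name, while dt is still present until it is dropped.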

You don’t need the data or dt columns any longer, so you can delete the columns from the Worksheet. For the best performance, consider it a good practice to remove columns you don’t need. Remove the columns with one of these methods.

Select Delete Column from the column menu of the field to delete.

data column header menu open, showing the Delete columns menu item

Figure 6 - Using the Delete column option from the dropdown menu

There are two ways to remove fields in OPAL. Drop the ones you don’t want with drop_col, or choose the ones you do want with pick_col.

Using drop_col to remove the columns:

// Remove these fields from the data
drop_col data, dt

Using pick_col to select the relevant columns:

// Keep these fields and drop all others
pick_col EventTime, timestamp, id, city, description, temp, country
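Both approaches leave the same set of columns in this case. The Python sketch below illustrates that equivalence on a hypothetical row; drop_columns and pick_columns are invented helper names that only imitate the two verbs.

```python
# Hypothetical row after the earlier steps (values are placeholders).
row = {
    "timestamp": "t0", "data": {"raw": "..."}, "dt": 1700000000,
    "EventTime": "t1", "id": 2988507, "city": "Paris",
    "description": "light rain", "temp": 11.5, "country": "FR",
}

def drop_columns(r, names):
    """Like drop_col: remove the named columns and keep everything else."""
    return {k: v for k, v in r.items() if k not in names}

def pick_columns(r, names):
    """Like pick_col: keep only the named columns, in the order given."""
    return {k: r[k] for k in names}

dropped = drop_columns(row, {"data", "dt"})
picked = pick_columns(
    row, ["EventTime", "timestamp", "id", "city", "description", "temp", "country"]
)

# Same columns and values either way; only the column order can differ.
print(dropped == picked)
```

Dropping is shorter when only a few columns go away; picking is more explicit and keeps the result stable if new columns appear upstream.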

Note

The dataset must have a valid timestamp, so don’t drop the timestamp field. The UI won’t show the Delete column menu item for that field, and the OPAL console displays an error if you try to do it with drop_col or pick_col.

When you finish, your OPAL script looks like this:

// Extract desired fields from the JSON payload, contained in data
make_col dt:int64(data.dt),
  id:int64(data.id),
  name:string(data.name),
  description:string(data.weather[0].description),
  temp:float64(data.main.temp),
  country:string(data.sys.country)
  
// Create a new field of type EventTime, converting the epoch time in dt
make_col EventTime:from_seconds(dt)


// Set the valid_from time
set_valid_from options(max_time_diff:duration_min(5)), EventTime

// Rename the existing name column to city
rename_col city:name

// Select only the fields you want, dropping others
// Note: pick_col must include a valid timestamp
pick_col timestamp, EventTime, id, city, description, temp, country

Note

You must perform the JSON extraction before running the rest of the OPAL script. The rest of the OPAL script performs actions on the extracted columns.

This OPAL script combines the UI actions and OPAL statements described above. It narrows down all events in the Raw Events dataset to just the weather observations and then shapes them into useful fields.

Worksheet with columns timestamp, id, city, description, temp, and country

Figure 7 - Worksheet with selected columns

You can display a graph of temperatures for the cities in your data by selecting the Visualize icon and then selecting temp from the list of available fields.

Worksheet with View As Visualization button highlighted. The right menu shows the temp field selected, with the resulting line graph of temperatures visible.

Figure 8 - Visualize the temperature data by city

Saving your shaping work as a new dataset

While the table of individual weather details may be useful, the data only exists in this Worksheet at the moment. To use this data in other ways, such as creating metrics, you need to move it to a dataset and publish it as a new event dataset.

  1. Return to your dataset by clicking the View as table icon.

  2. Click Publish New Event Stream in the right menu. If you don’t see it, you may have a cell selected in the table. Click the X to return to the default right menu view.

  3. Name this dataset My Weather Tutorial/Weather Events.

  4. Click Publish.

Including a package name such as My Weather Tutorial creates a section My Weather Tutorial in the Explore tab. Use packages to group related datasets, making it easier to find them later. If My Weather Tutorial already exists, use another unique name.

Figure 9 - Dataset My Weather Tutorial/Weather Events

  5. Save this Worksheet to continue with it later.

  6. Click on the Worksheet name, Untitled Worksheet, at the top of the page.

  7. Type a new name and click Save.

Hover to see the Pencil icon, indicating it may be edited.

Figure 10 - Change the Worksheet name

When you’re ready to continue, look for this Worksheet on the Explore tab under Worksheets.

Action menu displays Create New Resource

Figure 11 - Worksheet location on Worksheets page

Creating Resources for observation locations

Next, create a new resource set that contains a resource for each location. This also creates a Landing Page and a more convenient way to view the data by city.

To create Resources and publish them as a Resource Set, use the following steps:

  1. With your My Weather Tutorial/Weather Events worksheet open, locate the Actions menu at the bottom of the right menu.

  2. Select Create New Resource Set. If you don’t see this option, you may need to close the Cell Selected view first.

Action menu in the right menu, opened to show the Create New Resource Set menu item

Figure 12 - Create a new Resource dataset

  3. Select all of the fields to include them in the Resource dataset.

  4. Specify id as the Primary Key.

The Primary Key defines which field or fields uniquely identify each Resource. In this data, each city has a unique id. You use this later to link datasets.

  5. Accept the default Resource Lifetime of 1 hour. This defines how long to wait between updates before this Resource becomes inactive.

Create New Resource Set open in the right menu with selected fields

Figure 13 - Select fields to add to Resource set

  6. Click Create.

Create New Resource Set open in the right menu

Figure 14 - New Resource dataset

Creating a new Resource Set adds a second stage to the worksheet, with one row for each city (or id) in the data. Drag the time scrubber to see the weather for the past few hours:

Video: using the time scrubber

Stages do not contain datasets but do contain a temporary view of the data with whatever actions you have taken so far. They inherit the state of the parent stage, so your id resource stage builds on the extracted fields and other shaping work you did to create the Weather Events dataset. Multiple stages in a Worksheet show valuable data, but to link those results to other datasets, you need to finish creating this new Resource dataset.

You should have two stages in this Worksheet, your original one and a second called id. The Resource Set name automatically generates from the primary key, but you can change the Resource Set name to My Weather Tutorial/Locations.

  1. Hover over id to see the Pencil icon, and click to change it to “My Weather Tutorial/Locations”. Locations provides a more useful name than id, and adds it to the My Weather Tutorial package.

Note

Package and dataset names are case-sensitive.

  2. Click Script to display the OPAL script used to create the resource set.

  3. The make_resource statement in the console displays the expiry time in nanoseconds. To use hours instead, replace expiry:duration(3600000000000) with the following in the OPAL script:

// Change expiry to 1 hour
expiry:duration_hr(1)

  4. Construct a more convenient label by adding this line to the OPAL script, before primary_key(id):

// Create an easier to read label 
label:string_concat(city, ", ", country), 

  5. Add the following line to the OPAL script to use this text for the observation label:

// Use this text for the label
set_label label

After you finish, your OPAL script should look like this:

make_resource options(expiry:duration_hr(1)),
  timestamp: timestamp,
  city: city,
  description: description,
  temp: temp,
  country: country,
  // Create an easier to read label 
  label:string_concat(city, ", ", country),
  primary_key(id)

// Use this text for the label 
set_label label

  6. Click Run to run the OPAL script.

After running OPAL script to create a new column.

Figure 15 - Adding new column to resource set

  7. Click Publish New Resource Set to save it.
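The idea behind the Resource Set (one entry per primary key, a readable label, and an expiry window) can be approximated with the Python sketch below. It is a deliberately simplified model under stated assumptions: it keeps only the latest state per id, whereas Observe also tracks how a Resource changes over time, and every name and value in it is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EXPIRY = timedelta(hours=1)  # same idea as expiry:duration_hr(1)

# Hypothetical events: (time, id, city, country, temp)
t0 = datetime(2023, 11, 14, 12, 0, tzinfo=timezone.utc)
events = [
    (t0, 1, "Paris", "FR", 11.5),
    (t0 + timedelta(minutes=30), 1, "Paris", "FR", 12.0),
    (t0 - timedelta(hours=3), 2, "Oslo", "NO", -2.0),
]

def latest_by_id(evts):
    """Keep the most recent event per primary key (id) and attach a label."""
    latest = {}
    for when, key, city, country, temp in sorted(evts, key=lambda e: e[0]):
        latest[key] = {
            "last_seen": when,
            "temp": temp,
            "label": city + ", " + country,  # like string_concat(city, ", ", country)
        }
    return latest

def is_active(resource, now):
    """Treat a resource as active if it was updated within the expiry window."""
    return now - resource["last_seen"] <= EXPIRY

resources = latest_by_id(events)
now = t0 + timedelta(minutes=45)
for key, res in resources.items():
    print(key, res["label"], res["temp"], is_active(res, now))
```

In this sketch the city that stopped reporting three hours ago is no longer active, which is the effect the Resource Lifetime setting describes.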

Now you can go back to the Explore tab and open the My Weather Tutorial/Locations Resource Set. The default dashboard displays weather details, which may be filtered by city, country, or weather description.

Locations Landing Page, showing the default Locations Overview dashboard with cards for various types of weather information.

Figure 16 - Weather Dashboard

This shows basic information and simple visualizations. The following section describes creating temperature metrics for individual cities.

Creating weather metrics

A metric is a numeric measurement that changes over time, such as a temperature reading. Metrics can be displayed in charts, used to trigger alerts, or used to calculate other values.

For this example, you create two temperature metrics:

  • Original Celsius reading

  • Calculated Fahrenheit equivalent

This builds on the Weather Events dataset you created previously, containing the raw values for each weather observation. Metrics operations use OPAL, so most of this section is done in the OPAL console.

To use the Weather Events data to create metrics, you need to shape the observations into a more appropriate form. See Introduction to Metrics for more details.

Summary of the Metrics process

  • Start with a Worksheet for the Weather Events dataset, which has one row for each set of measurements.

  • Using OPAL verbs and functions, shape these events into a series of new events, one for each value of your future metric. Add an interface to identify this dataset as containing metrics, which enables additional metric operations in Observe.

  1. To start, open the Weather Events dataset in a new worksheet.

  2. Click Console at the bottom of the page to open the OPAL console.

  3. Use make_col to create a new field of type object. It contains the two metric values from each observation:

    • the original Celsius temperature reading

    • the value converted to Fahrenheit

   make_col metrics:make_object(
     "temperature_c":temp,
     "temperature_f":(temp*9/5 + 32)
     )

  4. Use flatten_leaves to create two events, one for each temperature value.

flatten_leaves metrics

  5. Use pick_col to select the required fields from the new events, and rename the generated _c_ fields to something more useful.

pick_col valid_from:EventTime,
  id,
  city,
  country,
  metric:string(_c_metrics_path),
  value:float64(_c_metrics_value)

  6. Define an interface to identify this dataset as containing metrics. This also specifies which columns contain the names and the values.

   interface "metric",
     metric:metric,
     value:value

  7. Click Run to confirm everything works. When you finish, your OPAL should look like this:

make_col metrics:make_object(
  "temperature_c":temp,
  "temperature_f":(temp*9/5 + 32))

flatten_leaves metrics

pick_col valid_from:EventTime,
  id,
  city,
  country,
  metric:string(_c_metrics_path),
  value:float64(_c_metrics_value)

interface "metric",
  metric:metric,
  value:value

The resulting table contains two rows for each temperature reading:

Table containing metrics, one row for each value of the two temperature metrics

Figure 17 - Metrics with two rows for temperature
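The reshaping from one observation to two metric rows, including the Celsius to Fahrenheit conversion, can be sketched in Python as follows. This only imitates the effect of make_object, flatten_leaves, and pick_col; to_metric_rows and the sample event are hypothetical.

```python
def to_metric_rows(event):
    """Turn one observation into one row per metric value."""
    temp_c = event["temp"]
    # Like make_object: collect both metric values under their names.
    metrics = {
        "temperature_c": temp_c,
        "temperature_f": temp_c * 9 / 5 + 32,
    }
    # Like flatten_leaves + pick_col: one output row per (name, value) pair.
    return [
        {
            "id": event["id"],
            "city": event["city"],
            "country": event["country"],
            "metric": name,
            "value": float(value),
        }
        for name, value in metrics.items()
    ]

# Hypothetical observation
event = {"id": 1, "city": "Paris", "country": "FR", "temp": 20.0}
rows = to_metric_rows(event)
for r in rows:
    print(r["metric"], r["value"])
```

Each input event yields exactly two rows that share id, city, and country and differ only in the metric name and value.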

The last step is to link your primary key id to the corresponding Resource dataset you created earlier.

id column menu with Link To Resource Set - Locations selected

Figure 18 - Linking id to Locations Resource set

  1. Open the menu for the id column and select Link to Resource Set.

  2. Select Locations from the menu.

  3. From the Linked Resource Key Field Mapping list, select id.

  4. For the Link Name, enter My Weather Tutorial/Weather Metrics.

  5. Click Apply.

This adds a set_link line at the end of the existing OPAL script.

set_link "My Weather Tutorial/Weather Metrics", id: @ab12345678.id
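Conceptually, the link lets each metric row find its Location through the shared id. The Python sketch below models that lookup with a dictionary; it is an analogy under that assumption, not a description of how Observe implements links, and all data and names in it are hypothetical.

```python
# Hypothetical Locations resource set, keyed by its primary key (id).
locations = {
    1: {"label": "Paris, FR"},
    2: {"label": "Oslo, NO"},
}

# Hypothetical metric rows; here id plays the role of a foreign key.
metric_rows = [
    {"id": 1, "metric": "temperature_c", "value": 20.0},
    {"id": 2, "metric": "temperature_c", "value": -2.0},
    {"id": 3, "metric": "temperature_c", "value": 5.0},  # no matching location
]

def resolve_link(row, resources):
    """Follow the id link to the matching resource, or None if there is none."""
    return resources.get(row["id"])

for row in metric_rows:
    target = resolve_link(row, locations)
    print(row["id"], target["label"] if target else "unlinked")
```

A row whose id has no matching Location simply has nothing to link to, which is why the primary key chosen for the Resource Set matters.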

Publish this Worksheet as a new Event Dataset using these steps:

  1. Verify that you have not selected any cells in the worksheet. If you have selected a cell, click the X in the right menu next to the green indicator. Publish New Event Stream appears on the menu.

  2. Click Publish New Event Stream.

  3. Enter My Weather Tutorial/Weather Metrics.

  4. Click Publish.

Figure 19 - Weather Metrics dataset

Adding metrics to a dashboard

The default dashboard for Weather Metrics displays your temperature metrics but also a few unnecessary parameters. Create a custom dashboard to display just the items of interest.

  1. Click Worksheet to open a new Worksheet for Weather Metrics.

  2. Click the Visualize icon.

  3. From the menu on the right, under Input, click value to create a series over time graph.

  4. To create a different type of visualization, click the Visualize tab and select a different type from the Type list. Experiment with different types to see how the display changes with each type of graph.

  5. Click the Presentation tab to change the presentation layout. You can add legends, flip your x- and y-axes, change colors, and customize the appearance.

  6. To create a dashboard from the worksheet, click More.

  7. Click Export as Dashboard. This creates a dashboard you can manipulate using the Dashboard feature.

  8. Click the Pencil icon on the upper right to display the Dashboard panel.

For more information on using Dashboards, see Creating and using dashboards.

Figure 20 - Metrics dashboard with Design and Definition options

This tutorial taught you how Observe ingests data from the OpenWeather app, an external data source. You also learned how to filter and shape the ingested data into datasets and how to link datasets together. You produced metrics from the observations in the datasets and added them to a dashboard that brings metrics and events together in useful interactive pages.