Tutorial: modeling weather data

This tutorial shapes weather data using a combination of UI actions and OPAL statements. Many people like to mix and match, performing some actions in the UI and some with OPAL. Also, not every OPAL operation has an equivalent in the UI. So it’s useful to be comfortable with both.

The data comes from OpenWeather, via a process that periodically requests current weather observations for a selection of cities. This poller is not currently user-configurable. If you would like to follow along with live data, contact us about configuring this data input in your workspace.

The OpenWeather data

Each weather observation contains the current conditions for a single location, with common measurements such as temperature, humidity, wind speed, and more. You can find an example of a JSON observation in the OpenWeather documentation.

For this tutorial, you create new datasets for the incoming data, extract fields and create metrics, and display data on a dashboard.

Here is what you’ll cover:

Identify the data of interest

When you ingest data, it initially goes into the Observation event dataset. This dataset contains everything ingested into your workspace, so that could be quite a lot. (If you are using data streams, your data may be in a data stream dataset instead. See Data streams for details.)

The first step for a new input is to identify the data of interest and create a dataset for it. Do this with a new Worksheet.

Open a Worksheet with the Observation event dataset:

Worksheet button on the Observation dataset card
  1. From the Workplace Settings tab, click Data streams and search for OpenWeather.

  2. Click on Raw events.

  3. Click Open dataset.

  4. Click Worksheet.

To view just the weather data, filter by something unique to this source. In this case, the weather data has an api.openweather.org field in the EXTRA column:

5. Open the menu for the EXTRA column, either with the down arrow or by right-clicking the column header.

6. Select Filter from the menu.

7. Search for host and check the box to select it. (If you don’t find it, try selecting a longer time range using the time picker.)

Also, you can filter on the value api.openweathermap.org. These two things uniquely identify the weather data:

8. Select Value from the dropdown to show the available values.

9. Check api.openweathermap.org to select and click Apply.

Now the event table shows only the weather data.

Video instructions

Note

You performed these actions using the UI, but doing so also generated an equivalent OPAL statement in the Console button Console at the bottom of the page:

filter (not is_null(EXTRA.host)) and (string(EXTRA.api.openweathermap.org) = "weather")

For more about the OPAL query language, see OPAL — Observe Processing and Analysis Language or continue to the next sections of this tutorial.

Extract fields from the JSON payload

With the table filtered to just the weather data, extract some fields of interest from FIELDS:

Extract from JSON in the right menu, showing nested fields selected with checkboxes
  • Open the menu for the data column and choose Extract from JSON

  • In the right menu, select the following fields. Some are inside nested objects:

    • dt: time the observation was recorded

    • id: unique ID

    • name: location, in this case, the city name

    • main.temp: current temperature, in Celsius

    • sys.country: country the city is located in

    • weather[0].description: text description of the current conditions

Click Apply to save.

Add the following OPAL statement in the console:

// Extract from the JSON payload in FIELDS
make_col dt:int64(FIELDS.dt),
  id:int64(FIELDS.id),
  temp:float64(FIELDS.main.temp),
  country:string(FIELDS.sys.country),
  description:string(FIELDS.weather[0].description)

Click Run to apply.

Tidy as you go: set the desired timestamp and remove unneeded fields

You have the right fields, but there are a few additional things to adjust before you are ready to continue. These actions are done with OPAL, so if the console isn’t already open in your worksheet, click on Console button Console to open.

The time the observation was ingested is in BUNDLE_TIMESTAMP, but that isn’t actually the time it was recorded by the sensors. That timestamp is in dt, as epoch time in seconds. To use this as the timestamp, convert it from type integer to type timestamp with from_seconds(). While you are at it, you can also give some fields more useful names.

To do this, add the following OPAL statement in the console:

// Create a new field of type timestamp, converting the epoch time in dt
make_col timestamp:from_seconds(int64(dt))

Next, you need to tell Observe this is our new timestamp field. Do this with the set_valid_from verb, which marks it as the “Valid From”, or the time this observation’s data first became valid:

// Use the field "timestamp" as your timestamp, by setting as the valid from time
// For better performance, don't accept a new value too far off from the
// the original ingest time you are replacing. See set_valid_from docs for details.
set_valid_from options(max_time_diff:duration_min(5)), timestamp

While you are here, you can also call the field name something more descriptive:

// rename the existing name field to city
rename_col city:name

There are also several fields you don’t need anymore. For best performance, it’s good practice to drop fields you don’t need. Remove them with one of these methods:

Select Delete Column from the column menu of the field to delete.

EXTRA Column header menu open, showing the Delete columns menu item

There are two ways to remove fields in OPAL: drop the ones you don’t want with drop_col, or choose the ones you do want with pick_col.

With drop_col:

// Remove these fields from the data
drop_col BUNDLE_TIMESTAMP, OBSERVATION_KIND, FIELDS, EXTRA, BUNDLE_ID, OBSERVATION_INDEX, dt

With pick_col:

// Keep these fields and drop all others
pick_col timestamp, id, city, description, temp, country

Note

Observations must still have a valid timestamp, so don’t drop your timestamp field. The UI won’t show the Delete column menu item for that field, and the OPAL console displays an error if you try to do it with col_drop or col_pick.

When you are done, your OPAL script should look something like this:

// Filter to just the weather data
// Use not is_null() to ensure you only get rows where poller_type exists,
// and string() so you can compare its value to the string "weather"
filter (not is_null(EXTRA.poller_type)) and (string(EXTRA.poller_type) = "weather")

// Extract desired fields from the JSON payload, contained in FIELDS
// Also rename name to city
make_col dt:int64(FIELDS.dt),
  id:int64(FIELDS.id),
  city:string(FIELDS.name),
  description:string(FIELDS.weather[0].description),
  temp:float64(FIELDS.main.temp),
  country:string(FIELDS.sys.country)

// Use the field "timestamp" as your timestamp, by setting it as the valid from time
// For better performance, don't accept a new value too far off from the
// the original ingest time you are replacing. See set_valid_from docs for details.
make_col timestamp:from_seconds(int64(dt))
set_valid_from options(max_time_diff:duration_min(5)), timestamp


// Select only the fields you want, dropping others

// Note: pick_col must include a valid timestamp
pick_col timestamp, id, city, description, temp, country

This OPAL script combines the UI actions and OPAL statements described above. It narrows all the events in the Observation Event Stream to just weather observations, and then shapes them into useful fields.

Worksheet with columns timestamp, id, city, description, temp, and country

A future tutorial may cover additional options for filtering and visualizing data in a Worksheet. Below is an example, a graph of temperatures for the cities in your data (click to enlarge.)

Worksheet with View As Visualization button highlighted. The right menu shows the "temp" field selected, with the resulting line graph of temperatures visible.

Save your shaping work as a new dataset

A table of individual weather details is useful, but at the moment it only exists in this Worksheet. To use this data in other ways, such as creating metrics, it needs to be its own dataset. To do that, publish it as a new event stream:

  1. In the right menu, click Publish New Event Stream. (If you don’t see it, you may have a cell selected in the table. Click the X to return to the default right rail view.)

  2. Name this dataset “My Weather Test/Weather Events”.

  3. Click Publish.

Worksheet right rail, with Publish New Event Stream dialog open

Including a package name like “My Weather Test” creates a section My Weather Test in the Explore tab. Use packages to group related datasets together, making it easier to find them later. (If someone else already has a My Weather Test package, use another unique name.)

If you would like to pause here, save this Worksheet to continue with it later. You may also prefer to change the name to something more meaningful:

  • Click on the Worksheet name, Observation, at the top of the page. (The second “Observation” is the name of the stage. More about stages later.)

  • Type a new name and click Save.

Worksheet name. Hover to see the pencil icon, indicating it may be edited.

When you’re ready to continue, look for this Worksheet in the Explore tab under Worksheets.

Create Resources for observation locations

The next step is to create a new resource set, containing a resource for each location. This gives us a Landing Page, and a more convenient way to view the data by city.

To create Resources and publish them as a Resource Set:

  • In your Worksheet, open the Actions menu at the bottom of the right menu.

  • Select Create New Resource Set. (If you don’t see this option, you may need to close the Cell Selected view first.)

    Action menu in the right menu, opened to show the Create New Resource Set menu item
  • Select the fields to include, which in this case is all of them.

  • Specify id as the Primary Key.

    The Primary Key defines which field (or fields) uniquely identify each Resource. In this data, each location (city) has a unique id. You use this later to link datasets.

  • Accept the default Resource Lifetime of 1 hour. This defines how long to wait between updates before this Resource is considered inactive.

  • Click Create.

    Create New Resource Set open in the right menu

Creating a new Resource Set adds a second stage to the worksheet, with one row for each city (or id) in the data. Drag the time scrubber to see the weather for the past few hours:

Video: using the time scrubber

Stages are not datasets themselves, but a temporary view of the data with whatever actions have been taken so far. They inherit the state of the parent stage, so your id resource stage builds on the extracted fields and other shaping work you did to create the Weather Events dataset. Multiple stages in a Worksheet show useful data, but in order to link those results to other datasets, you need to finish creating this new Resource dataset.

Before you do, there are a few things that make this dataset easier to use.

  • Change the Resource Set name

    You should have two stages in this Worksheet: your original one, and a second called “id”. This name is automatically generated from the primary key, but that might not be very meaningful. In the right menu, hover to see the Pencil icon pencil icon, and click to change it to “My Weather Test/Locations”. Not only is “Locations” a more useful name than “id”, but this also adds it to the My Weather Test package. (Note that package and dataset names are case-sensitive.)

  • Change nanosecond time period to hours

Your make_resource statement in the console shows the expiry time in nanoseconds. To use hours instead, replace expiry:duration(3600000000000) with expiry:duration_hr(1). While this doesn’t change the expiry time, it makes the OPAL easier to read.

  • Construct a more convenient label

    Add the line label:strcat(name, ", ", country), before primary_key(id) to construct the label text. Then add set_label label to use this text for the observation’s label.

After you finish, your OPAL script should look like this:

make_resource options(expiry:duration_hr(1)),
  timestamp: timestamp,
  city: city,
  description: description,
  temp: temp,
  country: country, 
  label:strcat(city, ", ", country),
  primary_key(id)

set_label label

Click Run to make sure everything works. When it looks good, click Publish New Resource Set to save.

Now you can go back to the Explore tab and open the Landing Page for the My Weather Test/Locations Resource Set. The default board displays weather details, which may be filtered by city, country, or weather description.

Locations Landing Page, showing the default Locations Overview dasboard with cards for various types of weather information.

This shows basic information and simple visualizations, but what you don’t have yet are temperature metrics for the individual cities.

Create weather metrics

A metric is a numeric measurement that changes over time, such as a temperature reading. They can be displayed in charts, used to trigger alerts, or calculate other values.

For this example, you create two temperature metrics: one for the original Celsius reading, and one for the calculated Fahrenheit equivalent. This builds on the Weather Events dataset you created previously, containing the raw values for each weather observation. Metrics operations use OPAL, so most of this section is done in the OPAL console.

To use the Weather Events data as metrics, you need to shape the observations into a more appropriate form. (Metrics, including how and why to shape data in a particular way, are covered in more detail in Introduction to Metrics.)

Here is a summary of the process:

  • Start with a Worksheet for the Weather Events dataset, which has one row for each set of measurements. These are “wide metrics:” easy to read, but not as easy to perform calculations with.

  • Using OPAL verbs and functions, shape these events into a series of new events: one for each value of your future metric. These are “narrow” metrics.

  • Add an interface to identify this dataset as containing metrics, which enables additional metric operations in Observe.

To get started, open the Weather Events dataset in a new worksheet. If needed, click Console button Console at the bottom of the page to open the OPAL console. Then start shaping the data:

  1. Use make_col to create a new field of type object. It contains the two metric values from each observation: the original Celsius temperature reading, and the same value converted to Fahrenheit.

    make_col metrics:make_object(
      "temperature_c":temp,
      "temperature_f":(temp*9/5 + 32)
      )
    
  2. Use flatten_leaves to create two events, one for each temperature value.

    flatten_leaves metrics
    
  3. Use pick_col to select the needed fields from the new events.

    While it’s true you can delete the one unneeded field using drop_col, by using pick_col you can also rename the generated _c_ fields to something more useful.

    pick_col valid_from:timestamp,
      id,
      city,
      country,
      metric:string(_c_metrics_path),
      value:float64(_c_metrics_value)
    
  4. Define an interface to identify this dataset as containing metrics. This also specifies which columns contain the names and the values.

    interface "metric",
      metric:metric,
      value:value
    

    Note

    Observe recommends naming metric name columns metric and metric value columns value. If your dataset contains these columns, you may omit them in your interface "metric" statement.

    // Rename existing columns to "metric" and "value"
    rename_col metric:string(_c_metrics_path), value:(_c_metrics_value)
    
    // Automatically discover metrics that follow naming convention:
    //   "metric" contains names of individual metrics
    //   "value" contains the values
    interface "metric"
    

Click Run to confirm everything is working. When you are done, your OPAL should look something like this:

make_col metrics:make_object(
  "temperature_c":temp,
  "temperature_f":(temp*9/5 + 32)
  )
flatten_leaves metrics
pick_col valid_from:timestamp,
  id,
  city,
  country,
  metric:string(_c_metrics_path),
  value:float64(_c_metrics_value)
interface "metric",
  metric:metric,
  value:value

The resulting table contains two rows for each temperature reading:

Table containing metrics, one row for each individual value of the two temperature metrics

The last step is to link your primary key id to the corresponding Resource dataset you created earlier.

id column menu with Link To Resource Set - Locations selected
  • Open the menu for the id column and select Link to Resource Set.

  • Select “Locations” from the sub-menu.

  • In the right menu, change the auto-generated Link Name to “Locations”.

  • Click Apply.

This adds a set_link line at the end of the existing OPAL script. Publish this Worksheet as a new Event Stream named “My Weather Test/Weather Metrics”.

View your new metrics

Now you have all the pieces in place to display your metrics. Go to the Locations Landing Page and there’s a new section showing Temp C and Temp F.

Locations landing page, showing data for Seattle and Chicago.

Add metrics to a dashboard

The default dashboard shows your temperature metrics, but also a few unnecessary parameters. Create a custom dashboard to show just the items of interest:

Boards dialog open, with the Board button
  • Click three dots button More icon to open the Export as Dashboard dialog.

  • Click Create Board.

Drag cards from the right menu to add items to your board. You can create collapsible sections, add images, and more. When you are happy with the results, click the three dots button More icon to give it a name, and then save button to save.

Locations landing page with the Rename Board dialog open