Observe Performance Cookbook: Making Resources from Multiple Datasets

Problem

Using update_resource to bring data from additional source datasets into a Resource dataset is more expensive to run than necessary.

Solution

Use the union verb to combine the source datasets in a single worksheet, then call make_resource once to create the Resource dataset.

Explanation

It is common to build a Resource from events that arrive on several event datastreams. It is much more efficient to union the event streams together and call make_resource once. This lets Observe align the time windows of all the inputs efficiently and create an optimal resource definition.

If you instead call make_resource on one event datastream and then use update_resource to merge in each of the other event streams, every update_resource performs an expensive temporal left outer join.

Better

// Build a resource that combines information from three
// event streams. Here the input is dataset evt1.
union @evt2
union @evt3
make_resource options(expiry:1h),
  col1:col1, // from evt1
  col2:col2, // from evt2
  col3:col3, // from evt3
  primary_key(key)

Less Good

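// Here the input is dataset evt1.
make_resource options(expiry:1h),
  col1:col1,
  primary_key(key)
update_resource options(expiry:1h),
  key=@evt2.key, // match evt2 rows on the resource's primary key
  col2:@evt2.col2
update_resource options(expiry:1h),
  key=@evt3.key, // match evt3 rows on the resource's primary key
  col3:@evt3.col3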