Observe Performance Cookbook: Making Resources from Multiple Datasets
Problem
Using update_resource to bring more source data into a Resource dataset uses more compute than desired.
Solution
Use the union verb to combine the source datasets in a single worksheet, then use make_resource once to create the Resource dataset.
Explanation
It is common to build a Resource from events that come from multiple event datastreams. It is much more efficient to union the event streams together and run make_resource once. This lets Observe align the time windows of all the inputs efficiently and produce an optimal resource definition.
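To make the single-pass idea concrete, here is a minimal Python sketch. It is a conceptual model only and does not show how Observe executes OPAL. It assumes each event carries a timestamp, the resource's primary key, the resource column it updates, and a value. The names Event, union_streams, and build_resource are hypothetical. Because the streams are merged first, the resource state for every key is built in one time-ordered pass over all events.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    time: int    # event timestamp (hypothetical unit: seconds)
    key: str     # primary key of the resource
    column: str  # resource column this event updates
    value: str


def union_streams(*streams: List[Event]) -> List[Event]:
    """Concatenate the event streams and order them by time (one merged stream)."""
    merged = [e for stream in streams for e in stream]
    return sorted(merged, key=lambda e: e.time)


def build_resource(events: List[Event]) -> Dict[str, List[Tuple[int, Dict[str, str]]]]:
    """Fold one time-ordered stream into per-key state snapshots.

    Each key maps to a list of (valid_from, state) pairs. A new snapshot starts
    whenever an event for that key arrives. Expiry is ignored in this sketch.
    """
    current: Dict[str, Dict[str, str]] = {}
    history: Dict[str, List[Tuple[int, Dict[str, str]]]] = {}
    for e in events:  # single pass over the merged stream
        state = dict(current.get(e.key, {}))  # copy so older snapshots stay unchanged
        state[e.column] = e.value
        current[e.key] = state
        history.setdefault(e.key, []).append((e.time, state))
    return history


# Usage: three hypothetical streams, each contributing one column for "host-a".
evt1 = [Event(1, "host-a", "col1", "up")]
evt2 = [Event(2, "host-a", "col2", "v2")]
evt3 = [Event(3, "host-a", "col3", "eu")]
resource = build_resource(union_streams(evt1, evt2, evt3))
print(resource["host-a"][-1])
# (3, {'col1': 'up', 'col2': 'v2', 'col3': 'eu'})
```

The merge and the fold each touch every event once, whatever the number of input streams.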
If you instead start by using make_resource on one event datastream and then use update_resource to merge in the other event streams, each update_resource performs an expensive temporal left outer join against the resource built so far.
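The cost of this approach can be illustrated with a minimal Python sketch. It is a simplified, hypothetical model of a temporal left outer join (an "as-of" join), not Observe's implementation, and the name temporal_left_join is made up for illustration. Every left-hand (resource) row is kept. A new row starts whenever either side changes, and each row picks up the latest right-hand value for the same key at or before that time, or None if there is none. Chaining one call per update_resource means every call rescans the whole timeline produced by the previous step.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Resource timeline (assumed shape): key -> time-ordered (valid_from, row) snapshots.
Timeline = Dict[str, List[Tuple[int, Dict[str, Optional[str]]]]]
# Right-hand events (assumed shape): key -> time-ordered (time, value) pairs.
Events = Dict[str, List[Tuple[int, str]]]


def temporal_left_join(left: Timeline, right: Events, column: str) -> Timeline:
    """As-of left outer join of one event stream onto a resource timeline."""
    joined: Timeline = {}
    for key, snapshots in left.items():  # walks the entire existing timeline
        if not snapshots:
            joined[key] = []
            continue
        right_events = right.get(key, [])
        left_times = [t for t, _ in snapshots]
        right_times = [t for t, _ in right_events]
        # Every point where either side changes, from the first left snapshot on;
        # right events before that have no left row and are dropped (left outer).
        boundaries = sorted({*left_times, *(t for t in right_times if t >= left_times[0])})
        out = []
        for t in boundaries:
            row = snapshots[bisect_right(left_times, t) - 1][1]  # latest left row <= t
            i = bisect_right(right_times, t)                     # right events <= t
            value = right_events[i - 1][1] if i > 0 else None
            out.append((t, {**row, column: value}))
        joined[key] = out
    return joined


# Usage: a resource built from evt1 alone, then one join per extra stream.
base: Timeline = {"host-a": [(1, {"col1": "up"})]}
evt2: Events = {"host-a": [(2, "v2")]}
evt3: Events = {"host-a": [(3, "eu")]}
step1 = temporal_left_join(base, evt2, "col2")   # first update_resource
step2 = temporal_left_join(step1, evt3, "col3")  # second update_resource
print(step2["host-a"][-1])
# (3, {'col1': 'up', 'col2': 'v2', 'col3': 'eu'})
```

The final row is what a single merged pass would produce. The extra cost comes from rebuilding and rescanning the resource timeline once for each additional stream.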
Better
// I want to build a resource that contains information from three
// event streams. Here the input is dataset evt1.
union @evt2
union @evt3
make_resource options(expiry:1h),
    col1:col1, // from evt1
    col2:col2, // from evt2
    col3:col3, // from evt3
    primary_key(key)
Less Good
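// Builds the resource from evt1 alone, then merges the other streams.
make_resource options(expiry:1h),
    col1:col1,
    primary_key(key)
// Each update_resource below costs an extra temporal left outer join.
update_resource options(expiry:1h),
    key:@evt2.key,
    col2:@evt2.col2
update_resource options(expiry:1h),
    key:@evt3.key,
    col3:@evt3.col3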