Observe Performance Cookbook: Prefer Join to Lookup¶
Problem¶
Using lookup
in explorers or worksheets that use Resources or Intervals performs more slowly than expected.
Solution¶
Using the OPAL editor, change lookup
usage to join
and re-verify that the use cases are still working as desired.
Explanation¶
A join
is a relational inner join, while a lookup
is a relational left outer join. The difference is most prominent when dealing with Resource datasets as inputs. A lookup
will always return all the resources and their full time intervals of the left input (first Input dataset in OPAL), even for those resources and time intervals where no match is found in the right input. A join
instead performs an intersection of all the interval rows that make up the resources, and will only return the time intervals where both resources were live.
Be aware that intersected intervals can become very small, so queries may produce incomplete results, especially when multiple join
verbs are stacked up without resetting the valid-from and valid-to columns in between.
OPAL Join Verb Efficiency¶
Join verbs link multiple datasets together. They may generate more, fewer, or the same number of output rows as input rows, depending on the join condition and verb.
Joins are a powerful tool that can increase the amount of data collected and credits consumed. In order to keep your queries efficient, it is useful to check that the join being used is the most efficient one that will work for your use case.
From most expensive to least expensive:
lookup
orleftjoin
between two Resource or Interval datasetsThis uses an interval overlapping band-join and an additional temporal left outer join. Prefer join to lookup in this case.
join
between two Resource or Interval datasetsThis only uses an interval overlapping band-join
lookup
,leftjoin
, orjoin
between one Event and one Resource or Interval dataset (with or without window functions)join
orleftjoin
between two Event datasets with window functionsThese all use a point-in-interval band-join
lookup
,leftjoin
, orjoin
between two Event datasets (without window functions), or between one Table and another dataset.These use simple relational joins