Observe Performance Cookbook: Prefer Join to Lookup¶
Problem¶
Using lookup
in explorers or worksheets that use Resources or Intervals performs more slowly than expected.
Solution¶
Using the OPAL editor, change lookup
usage to join
and re-verify that the use cases are still working as desired.
Explanation¶
A join
is a relational inner join, while a lookup
is a relational left outer join. The difference is most prominent when dealing with Resource datasets as inputs. A lookup
will always return all the resources and their full time intervals of the left input (first Input dataset in OPAL), even for those resources and time intervals where no match is found in the right input. A join
instead performs an intersection of all the interval rows that make up the resources, and will only return the time intervals where both resources were live.
Be aware that intersected intervals can become very small, so queries may produce incomplete results, especially when multiple join
verbs are stacked up without resetting the valid-from and valid-to columns in between.