Observe Performance Cookbook: Prefer Lead and Lag to First and Last

Problem

I want to efficiently propagate non-null values forward or backward in time to ensure that they are usable throughout a transaction.

Solution

Using the OPAL editor, find first_not_null and last_not_null usage. Replace with lead_not_null and lag_not_null window functions and re-verify that the use cases are still working as desired.

Explanation

The two methods have similar semantics, but lead_not_null and lag_not_null are much more efficient than first_not_null and last_not_null. This is because Observe can share all the lead/lag calculation for all frames, while have to compute the first/last for each frame individually.

Better

make_col container_id:window(lag_not_null(container_id), frame(back:10m), group_by(key))

Less Good

make_col container_id:window(last_not_null(container_id), frame(back:10m), group_by(key))