Observe Performance Cookbook: Limit Valid Event Time Windows
Problem
Queries against a dataset with an offset time take a long time or use a lot of query credits.
Solution
Edit the dataset, find the set_valid_from line, and change the max_time_diff option to a shorter value (or use a shorter frame). Also, move this operation as early as possible in the OPAL script. Verify that the use case still works, then save the dataset definition.
Explanation
The set_valid_from verb corrects an event’s timestamp from the time when Observe received the event to a time described by a field in the event. For instance, a vulnerability audit record might have a timestamp of when the vulnerability scan was performed, which is more relevant to the user than when this row of data was sent to Observe.
The set_valid_from verb needs to review a larger amount of data if the expiry or frame is larger, so using the shortest workable time is best. Continuing with the vulnerability audit record example, if the expected delay between scanning the object and delivering the report to Observe is 1 hour, then a max_time_diff of 75 minutes is more than sufficient for most use cases.
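One way to read the 75-minute figure is as the expected one-hour delay plus a 25% safety margin. The Python sketch below only works through that arithmetic. The helper name suggest_max_time_diff and the 25% default margin are assumptions made for illustration; they are not part of OPAL or Observe, and the right margin depends on how variable the delivery delay actually is.

```python
from datetime import timedelta


def suggest_max_time_diff(expected_delay: timedelta, margin: float = 0.25) -> timedelta:
    """Pad the expected delivery delay by a fractional safety margin.

    Hypothetical helper: the 25% default is an assumption, not an Observe rule.
    """
    if expected_delay < timedelta(0):
        raise ValueError("expected_delay must be non-negative")
    if margin < 0:
        raise ValueError("margin must be non-negative")
    # timedelta supports multiplication by a float in Python 3.
    return expected_delay * (1 + margin)


# Expected delay between the scan and delivery to Observe: 1 hour.
window = suggest_max_time_diff(timedelta(hours=1))
print(window)  # 1:15:00
```

The resulting value is a starting point for max_time_diff; verifying that the use case still works after shortening it remains the deciding check.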
Applying set_valid_from later in a pipeline is also more expensive than applying it earlier, especially if the pipeline uses a join verb or an aggregation verb or function. As a result, it is better to change the timestamp as early in the pipeline as possible.