Observe Performance Cookbook: Limit Valid Event Time Windows

Problem

Queries against a dataset that uses an offset time are slow or consume a lot of query credits.

Solution

Edit the dataset, find the set_valid_from line, and reduce the max_time_diff option (or use a shorter frame). Also move this operation as early as possible in the OPAL script. Verify that the use case still works, then save the dataset definition.

Explanation

The set_valid_from verb is used to correct an event’s timestamp from the point when Observe received the event to a time described by a field in the event. For instance, a vulnerability audit record might have a timestamp of when the vulnerability scan was performed, which is more relevant to the user than when this row of data was sent to Observe.

The set_valid_from verb has to review more data when the max_time_diff or frame is larger, so use the smallest value that still covers the expected delay. Continuing with the vulnerability audit record example, if the expected delay between scanning the object and delivering the report to Observe is 1 hour, then a max_time_diff of 75 minutes is more than sufficient for most use cases.
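
The sizing arithmetic in this example can be sketched in Python. This is only an illustration: the helper name suggest_max_time_diff and the 25% safety margin are assumptions chosen to reproduce the 1 hour to 75 minutes figure above, and are not part of OPAL or Observe.

```python
from datetime import timedelta


def suggest_max_time_diff(expected_delay: timedelta, margin: float = 0.25) -> timedelta:
    """Pad the expected delivery delay by a safety margin.

    Hypothetical helper for illustration only. The 25% default margin is an
    assumption that matches the 1 hour -> 75 minutes example in the text.
    """
    if expected_delay < timedelta(0):
        raise ValueError("expected_delay must be non-negative")
    if margin < 0:
        raise ValueError("margin must be non-negative")
    # timedelta supports multiplication by a float in Python 3.
    return expected_delay * (1 + margin)


# Expected delay between the scan and delivery of the report: 1 hour.
suggested = suggest_max_time_diff(timedelta(hours=1))
print(suggested)  # 1:15:00
```

The resulting duration is the value to try for max_time_diff. If delays vary widely, measure them on real data before shortening the window, and verify that the use case still works.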

Applying set_valid_from later in a pipeline is also more expensive than applying it earlier, especially if the pipeline uses a join verb or an aggregation verb or function. As a result, it is best to change the timestamp as early in the pipeline as possible.