Observe Performance Cookbook: Reduce Columns Earlier in OPAL Scripts

Problem

A query that uses filtering in OPAL takes a long time or consumes a lot of query credits.

Solution

Use pick_col or drop_col to drop unnecessary columns before publishing a dataset. Move these verbs as early as possible in your OPAL script.
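
As a rough illustration, the sketch below places pick_col at the top of a script so that later verbs work on fewer columns. The column names (timestamp, message, severity, container) and the filter condition are hypothetical and stand in for whatever your input dataset actually contains.

```
// Hypothetical input: a log dataset with timestamp, message, severity,
// container, and several other columns that nothing downstream uses.

// Keep only the columns needed downstream, as early as possible.
pick_col timestamp, message, severity, container

// Later verbs now operate on the reduced column set.
filter severity = "error"
```

Whether the early pick_col helps in a given script depends on how many columns it removes and how large they are, so treat this as a starting point rather than a guaranteed saving.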

Explanation

These verbs reduce data volume and make downstream operations faster. Unnecessary columns are still transformed and inserted into accelerated datasets, where they incur transform and storage costs. They also make queries slower, because the data in those columns is still fetched even when the columns are hidden.

In most cases, prefer pick_col over drop_col: pick_col reduces the chance of overlooking a hidden column, and it avoids re-materializing the dataset if a column is later added to an upstream or input dataset.

Conversely, use drop_col if you want new input columns to be added to the dataset automatically and automatic re-materialization is acceptable.
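
For contrast, a minimal drop_col sketch might look like the following. The column names debug_payload and raw_headers are hypothetical examples of columns you have decided you do not need.

```
// Remove two hypothetical unused columns by name.
// Every other column passes through, including any column
// that is later added to the upstream or input dataset.
drop_col debug_payload, raw_headers
```

With this form, a new upstream column appears in the output without a script change, which is the behavior described above along with the re-materialization it can trigger.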