Observe Performance Cookbook: Reduce Columns Earlier in OPAL Scripts
Problem
A query using filtering in OPAL is taking a long time or using a lot of query credits.
Solution
Use pick_col or drop_col to drop unnecessary columns before publishing a dataset. Move these verbs as early as possible in your OPAL script.
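As a minimal sketch of moving column reduction to the top of a script, assume a hypothetical event dataset with the columns timestamp, log, severity, host, raw_payload, and debug_info. None of these names come from a real dataset; substitute your own.

```opal
// Hypothetical input columns: timestamp, log, severity, host, raw_payload, debug_info.
// Keep only the columns the published dataset needs, before any other work.
pick_col timestamp, log, severity, host

// Later verbs now operate on four columns instead of six.
filter severity = "error"
make_col host_lower:lower(host)
```

Placing pick_col first means the filter and make_col steps never carry raw_payload or debug_info.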
Explanation
These verbs reduce data volume and make downstream operations faster. Unnecessary columns are still transformed and inserted into accelerated datasets, where they incur transform and storage costs. They also slow down queries, because the data in those columns is still fetched even when the columns are hidden.
In most cases, prefer pick_col over drop_col: pick_col reduces the chance of overlooking a hidden column, and it avoids re-materializing the dataset if a column is later added to an upstream or input dataset.
Conversely, use drop_col if you want new input columns to be added to the dataset automatically and automatic re-materialization is acceptable.
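The drop_col alternative might look like the sketch below. It assumes a hypothetical input dataset whose columns include severity, raw_payload, and debug_info; these names are illustrative only.

```opal
// Hypothetical input: remove only the columns known to be unneeded.
// Every other column, including any added upstream later, passes through.
drop_col raw_payload, debug_info

filter severity = "error"
```

Here the output schema follows the input: a column added upstream appears in this dataset automatically, at the cost of the re-materialization described above.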