Observe Performance Cookbook: Cast Data Columns Extracted from JSON¶

Problem¶

You cannot use top K filtering on data with a column extracted from a JSON object.

Solution¶

Use the Column header to convert variant columns to specific types such as string or int64. In OPAL, examples of applicable functions are string, int64, and parse_timestamp.

Explanation¶

A field extracted from a JSON column has the type of any (corresponding to Snowflake’s VARIANT type), because a JSON object or array can contain values of any type.

An any value is significantly more expensive to process in query execution and transform execution. (Compare the performance of a statically typed language such as C++ to the performance of a dynamically typed language such as Python.)

Explicitly data-typing columns instead of leaving them as any improves performance and enables features that are disabled for columns with a type of any. If all of the data in an any column does not conform to the type, non-conforming values will be set to null.

Cast operations should be done as early as possible to avoid carrying any values through later operations.

Note that the OPAL compiler uses type inference to automatically insert these casts in some cases, but it is still good practice to do it manually.

Better

make_col cluster_uid:string(FIELDS.cluster_uid)

Less Good

make_col cluster_uid:FIELDS.cluster_uid