Observe Performance Cookbook: Avoid Large JSON Blobs

Problem

A dataset with one or more large JSON objects takes a long time to use and consumes more credits than desired.

Solution

Extract specific fields from the JSON, or flatten it. Once you have modeled the data that you want and verified your use case, you can safely drop the JSON column.

Explanation

Large JSON arrays or objects can be used in Observe, but their size means they typically include irrelevant data. Additionally, searching or filtering for important fields is more efficient if those fields are separated from the larger context of the entire array. Extracting desired fields using make_col improves performance.

Full-text search on JSON columns (using contains or regex functions) is very expensive, because it requires casting the JSON column to string. If casting from JSON to string is an important part of queries, then it’s better to do it at transform time where a more efficient process can be used. The best way is to extract string columns which contain the fields to search, and then searching with those columns.