What is the best practice for field naming in OPAL?

Keep naming as close as possible to source

When extracting fields, it is often tempting to make column names friendlier by normalizing their format (snake_case, camelCase) or by making the names shorter or clearer. This is almost always a mistake.

Normalization is a futile effort. By the nature of our product, you will inevitably work with datasets that are outside of your control, and whoever built them may not follow the same guidelines for normalizing column names that you do.

Renaming a column (e.g. from firstSeen to createdOn) also tends to be a bad idea, because when users try to look up what the field means, their search will come up empty. Keeping column names close to the source ensures that people can find the appropriate documentation (e.g. the AWS reference docs).

Bad OPAL:

make_col
  sha:string(data.CodeSha256),
  customer_id:string(data.customerID)

Good OPAL:

make_col
  codeSha256:string(data.CodeSha256),
  customerID:string(data.customerID)