pick_col
pick_col [columnbinding: expression]+
Projects the pipeline to an explicit list of columns in argument order, binding each output name to an expression and dropping every input column that is not selected.
Datasets intended as parents of child datasets often use pick_col to pin a stable public schema, preserve explicit output column order, and prevent new upstream columns from automatically appearing in children.
Time columns and resource keys
If the input defines valid-from and/or valid-to columns, the output must still designate those roles, either by picking or renaming the existing time columns or by supplying a compatible row_timestamp binding when that applies. On resource-shaped inputs, every primary-key column must appear (as a direct column reference or an equivalent binding).
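As a rough sketch of the rule above, assume a resource-shaped input with the hypothetical columns valid_from, valid_to, host_id (the primary key), hostname, and region. None of these names come from this page; they are placeholders for illustration only.

```
// Hypothetical resource input: valid_from, valid_to, host_id (primary key), hostname, region
// Both time columns and the primary key are picked; hostname is renamed; region is dropped.
pick_col valid_from, valid_to, host_id, name:hostname
```

Under these assumptions, leaving host_id or either time column out of the list would be expected to fail the requirements described above.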
Names and metadata
Duplicate output column names in a single pick_col are rejected. A binding that only renames or reorders an existing column preserves that column’s field metadata when the declared type matches the input column type; richer expressions rebuild metadata from the expression. If required fields for a declared interface are omitted, that interface is dropped from the output metadata.
Use make_col to add columns, drop_col to remove them without full projection, and rename_col when only names change.
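To make the contrast concrete, here is a minimal sketch of the three alternatives next to a full projection. It assumes a hypothetical input with the columns month_number and month_name; the new names month_upper and name are also illustrative. Each line is meant to be read on its own, not as one pipeline.

```
// Add a column and keep everything else
make_col month_upper:upper(month_name)

// Remove one column and keep everything else
drop_col month_number

// Change a name only; values, order, and other columns stay as they are
rename_col name:month_name

// Full projection: only the listed columns survive, in this order
pick_col name:month_name
```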
Categories
Accelerable
pick_col is always accelerable if the input is accelerable. A dataset that only uses accelerable verbs can be accelerated, making queries on the dataset respond faster.
Examples
Keeps only the two listed columns in argument order, dropping every other field from the shared months table.
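A statement matching this description might look like the sketch below. The column name month_number is an assumption, chosen to pair with month_name for illustration.

```
// Assumes the months table has (at least) month_number and month_name
pick_col month_number, month_name
```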
Renames and casts columns while projecting away the rest of the schema so downstream steps see short, explicit names.
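One possible form of such a projection is sketched below. The input names month_number and month_name and the output names num and name are assumptions for illustration, not necessarily the names used in the original example.

```
// Each binding is output_name:expression; unlisted columns are dropped
pick_col num:int64(month_number), name:string(month_name)
```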
Reorders the same two columns by listing month_name first, showing that output column order follows the argument list.
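A sketch of the reordered projection, again assuming the second column is called month_number (a hypothetical name):

```
// Output order follows the argument list: month_name first, then month_number
pick_col month_name, month_number
```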