Query Command#

The query command allows you to run OPAL queries on data in your Observe workspace. You can specify one or more input datasets, and perform joins, unions, aggregation, and all the rest of the powerful OPAL operations. You do not need your query to be accelerable, because the query command directly returns the data much the same when using a worksheet or viewing a dashboard, and does not publish a new dataset.

observe query -q 'limit 4' -i 'Default.System' -x

query Command Options#

Short Command

Full Command

Description

-c

--csv

Output in CSV format

-e

--end-time string

End time of query window

-f

--file string

File containing OPAL query text

-f

--format string

Specify output format as table, extended, csv, ndjson

-i

--input strings

Input datasets using an ID or workspace.name

-j

--json

Express output in nd-JSON format

-l

--literal-strings

Print embedded control characters literally

-q

--query string

OPAL query text

-r

--relative duration

Duration of query window, anchored at either end

-s

--start-time string

The start time of query window

-w

--col-width int

Maximum column width for table format; 0 for unlimited (default 64)

-x

--extended

Print one column value per row rather than table format

Output File Format#

The output of the query command can be in one of four formats:

  1. Default table - This appears as typical SQL table output, with vertical bars and space padding between columns, and a header row. The maximum width of each column, by default, limited to allow the output to be more readable, and excess data is cut off with an ellipsis (…). You can turn this behavior off with the --col-width=0 option, or you can set the limit higher or lower than the default.

  2. CSV file - The comma-separated values file format resembles a classic output in data processing. The output consists of a number of columns, optionally quoted in double quotes, each separated with a comma character, and the row terminated with a newline. Double quotes inside each quoted value are quoted by repeating the double quote character, and newlines inside a quoted string do not terminate the row. The CSV data format has a single header row containing the names of the columns, followed by data rows. You specify the CSV output format with the --csv command line option.

  3. ND-JSON file - Each row outputs as a JSON object and terminates with a newline. Each column contains a named key in the object, so column names repeat for each row. There is no header row. You specify this format with the --json command line option.

  4. Extended table - This prints each field of each record on a separate line with a row of dashes between each record. This format can be helpful if the data contains long column values such as JSON objects. Specify this format with the --extended option.

The default table format quotes special characters such as newlines to avoid breaking the table formatting. If you want to print such characters literally, you can use the --literal-strings option. CSV and JSON data use well-defined specific formats and are not affected by this option, nor by column max length.

Extended format, specified by --extended-format resembles the table format, except each column has a row and label in the output.

Example Output

row 0             +-------------------------------------------------------------------
BUNDLE_TIMESTAMP  | 1683663807838826301
OBSERVATION_KIND  | system
FIELDS            | {"customer_id":123456789012,"dataset_id":00000000,"hour":"2023-…
EXTRA             | {"schema":"billing_storage_dataset","schema_version":"20221004"}
BUNDLE_ID         | 6989279
OBSERVATION_INDEX | 0
row 1             +-------------------------------------------------------------------
BUNDLE_TIMESTAMP  | 1683663807838826301
OBSERVATION_KIND  | system
FIELDS            | {"customer_id":123456789012,"dataset_id":11111111,"hour":"2023-…
EXTRA             | {"schema":"billing_storage_dataset","schema_version":"20221004"}
BUNDLE_ID         | 6989279
OBSERVATION_INDEX | 1
row 2             +-------------------------------------------------------------------
BUNDLE_TIMESTAMP  | 1683663807838826301
OBSERVATION_KIND  | system
FIELDS            | {"customer_id":123456789012,"dataset_id":22222222,"hour":"2023-…
EXTRA             | {"schema":"billing_storage_dataset","schema_version":"20221004"}
BUNDLE_ID         | 6989279
OBSERVATION_INDEX | 2

Query Text#

The query OPAL text can be copied from the command line using single quotes to avoid problems with shell interpolation, or from a file with the --file=filename command line option. The query has the same format as the OPAL console you see for a stage in a worksheet.

Query Inputs#

The Inputs section of the stage is provided with the --input=inputlist option, a comma-separated list of dataset IDs or paths, and input names. The first input prefaces the name with _ if not specified. Each additional input requires a name. For example, query input use the following format:

observe query -q 'leftjoin [email protected]_ip, source:@right.src_ip' 
-i '41021818,right=41012929'

You can also use the name of a workspace and dataset as a dataset path, as long as you do not use commas in the names. You can specify these with the following format:

observe query -q `leftjoin [email protected]_ip, source:@right.src_ip' -i 'Default.aws/Instance Events,right=network/Flow Logs'

Dataset names use the text form Workspace.Folder/Name. If you do not specify a Workspace in the input, the Dataset uses the default Workspace name in the configuration. You can set the default Workspace name per-profile to avoid typing it out, or specify it on the command line to easily run the same query in multiple different workspaces, assuming you enabled multiple workspaces for the instance you query.

Query Time Window#

Observe evaluates each query in a particular time window. By default, this time window, “the last hour”, truncates to full minutes. You can specify time in three different quantities, but you can specify at most two of these quantities for any one query:

  1. --start-time=date - specifies the beginning of the time window. This can be a string in the ISO/RFC3339 date format such as “YYYY-MM-DDTHH:MM:SSZ” or can be another format.

  2. --end-time=date - specifies the end of the time window. This can be a string in the ISO/RFC3339 date format such as “YYYY-MM-DDTHH:MM:SSZ” or it can be another format.

  3. --relative - specifies an interval in seconds, minutes, hours, or days. If you specify duration together with --start-time=date then the time window extends forward by this amount from that date. If you specify duration together with –end-time=date` then the time window extends backward by this amount from that date.

If you only specify one value, the default duration is one hour, and the default time anchor is “end-time is now, truncated to last minute”.

Now times read from the local machine clock. Times can be specified as absolute using ISO UTC time, or relative using ‘-1h’ or ‘now-1h’ format.

Time can also be snapped to a grid using ‘@5m’ format. Units supported for snapping include s(seconds), m (minutes), h(hours), d(days). Note that snapping always happens before applying the offset. now+7h@1d is the same as now@1d+7h. Because of ambiguities, the CLI does not allow specifying an absolute date, followed by an offset, followed by a snap is not allowed. if you specify an absolute date, specify the snap before the offset. A snap simply of a unit indicates a snap to one of those units, d means 1d.

Even if times are specified in local time, the date math (snapping to days) is performed in UTC. Times without explicit time zone will be interpreted in UTC. The time zone offset is positive because EST is behind UTC, and thus 10:00 in EST is 15:00 UTC, and times are calculated in UTC.

Example Times

2023-04-20T16:20:00Z (RFC 3339)
2023-04-20T16:20:00.123-08:00
-1d@1h (1 day ago from local time, snapped to start of hour)
now@10m-3h (three hours ago, snapped to start of 10-minute period)
1682007600@1d (epoch seconds: UNIX, snapped to start of UTC day)
1682007600000 (epoch milliseconds: Java and Javascript)
1682007600000000000 (epoch nanoseconds: Go, C++, OPAL)