extract_regex

Type of operation: Projection, Semistructured

Description

Add one or more columns by matching the named capture groups of a regular expression against a given source expression. Each capture name becomes a column name.

By default, regex extractions create string columns. Named capture groups, written (?P<name>...), are an extension to POSIX extended regular expressions.

If the column already exists, and it is of type string, the original value is replaced with the matched text. If the regular expression does not match anything, the original value is preserved.

If the column already exists, but it is not a string column, extract_regex returns an error.
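The column rules above can be sketched for a single row in Python, whose re module accepts the same (?P<name>...) syntax. This is only an illustration, not how extract_regex is implemented: the function name, the dict-as-row model, and the choice of None for a new column when nothing matches are all assumptions.

```python
import re

PATTERN = r"status=(?P<statuscode>\d+)"

def extract_regex(row, source, pattern):
    """Emulate the documented column rules for one row, modelled as a dict."""
    names = list(re.compile(pattern).groupindex)
    # An existing target column must hold a string, otherwise it is an error.
    for name in names:
        if row.get(name) is not None and not isinstance(row[name], str):
            raise TypeError(f"column {name!r} exists but is not a string column")
    out = dict(row)
    match = re.search(pattern, str(row[source]))
    for name in names:
        if match:
            # A match replaces (or creates) the column with the captured text.
            out[name] = match.group(name)
        else:
            # No match: an existing value is preserved.
            # Assumption: a column that did not exist yet is left as None.
            out.setdefault(name, None)
    return out

replaced = extract_regex({"message": "status=404", "statuscode": "old"}, "message", PATTERN)
preserved = extract_regex({"message": "no code here", "statuscode": "old"}, "message", PATTERN)
print(replaced["statuscode"], preserved["statuscode"])
```

Here the first row ends up with the matched text and the second keeps its original value, mirroring the replace-or-preserve behavior described above.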

Usage

extract_regex path, regex, [ flags ]

Argument	Type	Optional	Repeatable	Restrictions
path	expression	no	no	none
regex	regex	no	no	none
flags	string	yes	no	none

Accelerable

extract_regex is always accelerable if the input is accelerable. A dataset that only uses accelerable verbs can be accelerated, making queries on the dataset respond faster.

Examples

extract_regex message, /status=(?P<statuscode>\d+)/

Create the column ‘statuscode’ by matching the digits that follow “status=” in the field ‘message’.

extract_regex inputcol, /(?P<sensor>[^|]*)\|count:(?P<counts>[^|]*)\|env:(?P<env>[^|]*)/

Given an input column value like “studio-aqi|count:654 201 28 0 0 0|env:3 4 4a”, generate three output columns: “sensor” with the value “studio-aqi”, “counts” with the value “654 201 28 0 0 0”, and “env” with the value “3 4 4a”.
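To check what each capture group picks up from that sample value, the same pattern can be run through Python's re module, which also supports (?P<name>...) groups. This is a rough equivalent for experimentation only; the variable names are illustrative, and OPAL's regex engine may differ from Python's in other respects.

```python
import re

# Same pattern as in the example, written as a Python raw string.
pattern = re.compile(
    r"(?P<sensor>[^|]*)\|count:(?P<counts>[^|]*)\|env:(?P<env>[^|]*)"
)

value = "studio-aqi|count:654 201 28 0 0 0|env:3 4 4a"

match = pattern.search(value)
# groupdict() maps each capture name to its matched text, one entry per output column.
columns = match.groupdict()

for name, text in columns.items():
    print(f"{name}: {text}")
```

Each [^|]* group consumes everything up to the next pipe character, which is why the spaces inside the count and env fields are kept intact.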

extract_regex message, /(?P<date::parse_isotime>[0-9:TZ.]+) (?P<name>[a-z]+)=(?P<value::float64>[0-9.]+)/

Given an input column called message, generate three output columns called date, name, and value from the corresponding capture groups. The “date” column is cast to timestamp using parse_isotime, the “name” column keeps the default type string, and the “value” column is cast to float64 using float64.
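The name::type convention can be emulated in Python to see how one pattern yields differently typed columns. This is a sketch under assumptions, not OPAL's implementation: Python does not accept “::” inside a group name, so the hypothetical helper compile_typed strips the suffix first, and the CASTS table uses Python stand-ins for parse_isotime and float64. The sample message is invented and uses a timestamp without hyphens, because the character class [0-9:TZ.] does not include “-”.

```python
import re
from datetime import datetime, timezone

# Hypothetical Python stand-ins for the casts named in the pattern.
CASTS = {
    "parse_isotime": lambda s: datetime.strptime(s, "%Y%m%dT%H%M%SZ").replace(
        tzinfo=timezone.utc
    ),
    "float64": float,
}

# Finds group openers of the form (?P<name::type>
TYPED_GROUP = re.compile(r"\(\?P<(\w+)::(\w+)>")

def compile_typed(pattern):
    """Strip ::type suffixes and remember which cast each name asked for."""
    casts = {}
    def strip_suffix(m):
        casts[m.group(1)] = CASTS[m.group(2)]
        return f"(?P<{m.group(1)}>"
    return re.compile(TYPED_GROUP.sub(strip_suffix, pattern)), casts

def extract_typed(pattern, text):
    regex, casts = compile_typed(pattern)
    match = regex.search(text)
    if match is None:
        return None
    # Groups without a suffix stay strings; the others are converted.
    return {
        name: value if value is None else casts.get(name, str)(value)
        for name, value in match.groupdict().items()
    }

pattern = r"(?P<date::parse_isotime>[0-9:TZ.]+) (?P<name>[a-z]+)=(?P<value::float64>[0-9.]+)"
row = extract_typed(pattern, "20230317T221005Z cpu=0.75")
print(row)
```

The untyped “name” group falls through to str, matching the default string type described above.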

Aliases

colregex (deprecated)