extract_regex

Type of operation: Projection, Semistructured

Description

Add one or more columns by matching the named capture groups of a regular expression against a given source expression. Each named capture group produces a column of the same name, and extracted columns are always of type string. Named capture groups are an extension to POSIX extended regular expressions. If a column already exists and the regular expression does not match, the previous value is preserved. See also: make_col.
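
The behavior described above can be modeled per row with a short Python sketch. This is an illustration only, not OPAL and not how the verb is implemented: the helper name extract_regex_row is hypothetical, and setting a brand-new column to None when nothing matches is an assumption, since the text above only covers the case where the column already exists.

```python
import re

def extract_regex_row(row, source, pattern, flags=0):
    """Hypothetical per-row model of the described behavior (not OPAL)."""
    out = dict(row)
    compiled = re.compile(pattern, flags)
    match = compiled.search(str(row[source]))
    # groupindex maps each named capture group to its group number.
    for name in compiled.groupindex:
        if match is not None:
            # Extracted values are strings.
            out[name] = match.group(name)
        elif name not in out:
            # Assumption: a new column with no match is left empty.
            out[name] = None
        # Otherwise the column already exists and keeps its previous value.
    return out

existing = {"message": "no code here", "statuscode": "200"}
kept = extract_regex_row(existing, "message", r"status=(?P<statuscode>\d+)")

fresh = {"message": "GET / status=404"}
added = extract_regex_row(fresh, "message", r"status=(?P<statuscode>\d+)")
```

Here kept retains the earlier statuscode because the pattern finds nothing, while added gains a string-valued statuscode column.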

The flags argument specifies optional regex flags:

  • c - Enables case-sensitive matching (default).

  • i - Enables case-insensitive matching.

  • m - Enables multi-line mode, in which the meta-characters ^ and $ match the beginning and end of any line of the input string. By default, multi-line mode is disabled, and ^ and $ match only the beginning and end of the entire input string.

  • s - Enables the POSIX wildcard character . to match \n (newline). By default, . does not match \n.

For more about syntax, see POSIX extended regular expressions.
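
As a rough illustration of what the i, m, and s flags change, the following Python sketch uses the closest equivalents in Python's re module (re.IGNORECASE, re.MULTILINE, re.DOTALL). It assumes the OPAL flags behave like these Python flags, which matches the descriptions above but is not confirmed for every edge case. The sample text is invented.

```python
import re

text = "Error: first\nerror: second"

# c (default): case-sensitive, so only the lowercase occurrence matches.
case_sensitive = re.findall(r"error", text)

# i: case-insensitive, so both occurrences match.
case_insensitive = re.findall(r"error", text, re.IGNORECASE)

# Without m, ^ anchors only at the start of the whole input.
single_line = re.findall(r"^\w+", text)

# m: ^ anchors at the start of every line.
multi_line = re.findall(r"^\w+", text, re.MULTILINE)

# Without s, . does not match the newline between "first" and "error".
dot_default = re.search(r"first.error", text)

# s: . also matches the newline.
dot_all = re.search(r"first.error", text, re.DOTALL)
```

With this input, the default search finds one match and the case-insensitive search finds two; multi-line mode anchors at both lines; and only the s variant matches across the newline.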

Usage

extract_regex path, regex [ , flags ]

Argument    Type          Required    Multiple

path        expression    Required    Only one

regex       regex         Required    Only one

flags       string        Optional    Only one

Accelerable

extract_regex is always accelerable if the input is accelerable. A dataset that only uses accelerable verbs can be accelerated, making queries on the dataset respond faster.

Examples

extract_regex message, /status=(?P<statuscode>\d+)/

Create the column ‘statuscode’ by matching “status=” followed by one or more digits in the field ‘message’.

extract_regex inputcol, /(?P<sensor>[^|]*)\|count:(?P<counts>[^|]*)\|env:(?P<env>[^|]*)/

Given an input column value like “studio-aqi|count:654 201 28 0 0 0|env:3 4 4a”, generate three output columns: “sensor” with the value “studio-aqi”, “counts” with the value “654 201 28 0 0 0”, and “env” with the value “3 4 4a”.
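
The pattern in this example can be checked outside OPAL. The sketch below runs the same regular expression through Python's re module, which also accepts the (?P<name>...) syntax. It only verifies what the named groups capture and is not a model of the verb itself.

```python
import re

# Same pattern as the example: three pipe-delimited fields.
pattern = re.compile(
    r"(?P<sensor>[^|]*)\|count:(?P<counts>[^|]*)\|env:(?P<env>[^|]*)"
)

value = "studio-aqi|count:654 201 28 0 0 0|env:3 4 4a"

match = pattern.search(value)
# groupdict() returns one entry per named capture group.
columns = match.groupdict() if match else {}
```

Each [^|]* group consumes everything up to the next pipe character, so the captured values contain the full space-separated lists.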

Aliases

  • colregex (deprecated)