parse_csv

Description

Parses an input string as character-separated values, where the default separator is a comma (comma-separated values.) Additionally, can un-quote backslash-quoted entities in the input string, which is a common extension to the base CSV format described in RFC4180.

The return value is an array of strings, or NULL if the input value could not be parsed.

The value argument is the string to parse. Values are separated by a particular separator character or short string, which defaults to ‘,’. Values that contain the separator, or contains double-quote characters, must be enclosed with double-quotes. Double-quotes within the value must be duplicated. This follows the conventions of RFC 4180. This value is any valid utf-8 string.

The optional unescape boolean argument allows you to specify that strings in double-quotes will also unescape characters like ‘\n’ to newline, ‘\r’ to carriage return, and \u00XX as a hex character (but high character plane hex literals (\uXXXX) are not supported.) By default, RFC 4180 CSV files do not unescape backslashes, so the default is for this feature to be off. The full set of supported characters are shown in the table below.

The optional separator string argument allows you to specify some separator character other than comma, such as space ‘ ‘, semicolon ‘;’, tab ‘\t’ or colon ‘:’, depending on what your input data requires. Emoji and other utf-8 as separators are supported. The default separator is a comma ‘,’.

Quoted Character

Meaning

\

backslash ()

double-quote (“)

b

backspace (\u0008)

f

formfeed (\u000C)

n

newline (\u000A)

r

carriage return (\u000D)

t

horizontal tab (\u0009)

u00XX

hexadecimal character value XX

Return type

generic array

Domain

This is a scalar function (calculates a single output value for a single input row.)

Categories

Usage

parse_csv(value, [ unescape ], [ separator ])

Argument

Type

Optional

Repeatable

Restrictions

value

string

no

no

none

unescape

bool

yes

no

constant

separator

string

yes

no

constant

Examples

make_col cols:parse_csv(substring(log, 24))

Parse the comma-separated data starting at offset 24 from the input line ‘log’, into an array of strings. The field separator is comma.

make_col c:parse_csv('3.14;"hello; world!";;true', ';')

Example output:

c

[“3.14”,”hello; world!’,””,”true”]