data_regextract
Prototype: data_regextract(regex, string)
Return type: data
Description: Returns a data container filled with backreferences and named captures if the multiline anchored regex matches the string.
Prefer this function over regextract(): it returns a data container instead of creating classic CFEngine array variables, and it supports named captures.
If the regular expression produces any backreference matches, the data container is populated with their values as follows:
$(container[0]) = entire string
$(container[1]) = back reference 1, etc
Note that 0 and 1 are string keys in a map, not offsets.
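A minimal sketch of reading the numeric keys, assuming a hypothetical version-like input string "3.7" and hypothetical variable names v and keys (none of these come from the reference example). It reads the keys back with getindices() to show that they are map keys rather than positions in a list:

```cf3
bundle agent main
{
  vars:
      # Hypothetical input: two numeric parts separated by a dot
      "v" data => data_regextract("^(\d+)\.(\d+)", "3.7");

      # getindices() returns the keys of the container as a list of strings
      "keys" slist => getindices(v);

  reports:
      # Key "0" holds the whole match, key "1" the first backreference
      "$(this.bundle): whole match '$(v[0])'";
      "$(this.bundle): first backreference '$(v[1])'";
      # One report line per key found in the container
      "$(this.bundle): container has key '$(keys)'";
}
```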
If named captures are used, e.g. (?<name1>...) to capture three characters under name1, then the name is used as the key instead of the numeric position of the backreference.
PCRE named captures are described in http://pcre.org/pcre.txt and several syntaxes are supported:
(?<name>...) named capturing group (Perl)
(?'name'...) named capturing group (Perl)
(?P<name>...) named capturing group (Python)
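As an illustration, the three syntaxes listed above can be tried side by side. This is a sketch under assumptions: the input string "hello world", the capture name word, and the variable names are hypothetical and chosen only for the example. Each call is expected to store the captured text under the key word:

```cf3
bundle agent main
{
  vars:
      # Perl syntax with angle brackets
      "perl_angle" data => data_regextract("^(?<word>\w+)", "hello world");

      # Perl syntax with single quotes
      "perl_quote" data => data_regextract("^(?'word'\w+)", "hello world");

      # Python syntax
      "python_style" data => data_regextract("^(?P<word>\w+)", "hello world");

  reports:
      # In each case the capture is looked up by its name, not by position
      "$(this.bundle): perl_angle[word] '$(perl_angle[word])'";
      "$(this.bundle): perl_quote[word] '$(perl_quote[word])'";
      "$(this.bundle): python_style[word] '$(python_style[word])'";
}
```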
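Since the regular expression is run with /dotall/ and /multiline/ modes, . also matches newlines, so a trailing .*$ captures everything up to the end of the string. To match only up to the end of a line, use [^\n]* instead of $.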
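The difference between the two forms can be tried without an input file. The sketch below assumes a hypothetical two-line string built with $(const.n); the variable names are illustrative only. The first call uses .* and the second uses [^\n]*, so the two reports can be compared directly:

```cf3
bundle agent main
{
  vars:
      # Hypothetical two-line input; $(const.n) expands to a newline
      "text" string => "guid=abc$(const.n)port=5308";

      # With /dotall/, .* also consumes newlines, so the capture
      # is not limited to the first line
      "to_end_of_string" data => data_regextract("^guid=(?<value>.*)$", "$(text)");

      # [^\n]* cannot cross a newline, so the capture stops at the
      # end of the line that starts with guid=
      "to_end_of_line" data => data_regextract("^guid=(?<value>[^\n]*)", "$(text)");

  reports:
      "$(this.bundle): to_end_of_string[value] '$(to_end_of_string[value])'";
      "$(this.bundle): to_end_of_line[value] '$(to_end_of_line[value])'";
}
```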
Arguments:
regex: regular expression - Regular expression - in the range: .*
string: string - Match string - in the range: .*
Example:
bundle agent main
{
  vars:
      # the returned data container is a key-value map:
      # the whole matched string is put in key "0"
      # the first three characters are put in key "name1"
      # the next three characters go into key "2" (the capture has no name)
      # the next two characters go into key "3" (the capture has no name)
      # then the dash is ignored
      # then three characters are put in key "name2"
      # then another dash is ignored
      # the next two characters go into key "5" (the capture has no name)
      # anything else is ignored
      "parsed" data => data_regextract("^(?<name1>...)(...)(..)-(?<name2>...)-(..).*", "abcdef12-345-67andsoon");
      "parsed_str" string => format("%S", parsed);

      # Illustrating multiline regular expression
      "instance_guid_until_end_of_string"
        data => data_regextract( "^guid\s?+=\s?+(?<value>.*)$",
                                 readfile( "/tmp/instance.cfg", 200));

      "instance_guid"
        data => data_regextract( "^guid\s+=\s+(?<value>[^\n]*)",
                                 readfile( "/tmp/instance.cfg", 200));

      "instance_port"
        data => data_regextract( "^port\s?+=\s?+(?<value>[^\n]*)",
                                 readfile( "/tmp/instance.cfg", 200));

  reports:
      "$(this.bundle): parsed[0] '$(parsed[0])' parses into: $(parsed_str)";
      "$(this.bundle): instance_guid_until_end_of_string[value] '$(instance_guid_until_end_of_string[value])'";
      "$(this.bundle): instance_guid[value] '$(instance_guid[value])'";
      "$(this.bundle): instance_port[value] '$(instance_port[value])'";
}
Output:
R: main: parsed[0] 'abcdef12-345-67andsoon' parses into: {"0":"abcdef12-345-67andsoon","2":"def","3":"12","5":"67","name1":"abc","name2":"345"}
R: main: instance_guid_until_end_of_string[value] '9CB197F0-4569-446A-A987-1DDEC1205F6B
port=5308'
R: main: instance_guid[value] '9CB197F0-4569-446A-A987-1DDEC1205F6B'
R: main: instance_port[value] '5308'
Notes:
History: Introduced in version 3.7.0 (2015)
See also: regextract(), regex_replace(), pcre regular expression syntax summary