data_regextract

Prototype: data_regextract(regex, string)

Return type: data

Description: Returns a data container filled with backreferences and named captures if the multiline anchored regex matches the string.

This function is preferable to regextract() because it does not create classic CFEngine array variables and it supports named captures.

If the regular expression matches, the data container is populated with the match and its backreferences as follows:

    $(container[0]) = the entire matched text
    $(container[1]) = backreference 1, and so on

Note that "0" and "1" are string keys in a map, not array offsets.
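To make the key layout concrete, here is a minimal sketch that models it in Python. It assumes Python's standard re module is an adequate stand-in for PCRE for simple patterns like this one (the two engines are not identical in general), and the helper name numbered_captures is hypothetical, not part of CFEngine. It shows that key "0" holds only the matched text rather than the whole input, and that every key is a string.

```python
import re


def numbered_captures(regex: str, string: str):
    """Sketch of the numbered keys in a data_regextract() result.

    Python's re module stands in for PCRE; DOTALL and MULTILINE mirror the
    modes the CFEngine function runs with. Returns None when nothing matches
    (a convention of this sketch, not documented CFEngine behavior).
    """
    m = re.search(regex, string, re.DOTALL | re.MULTILINE)
    if m is None:
        return None
    # Key "0" holds the whole matched text, not the whole input string.
    result = {"0": m.group(0)}
    # Backreferences follow under the string keys "1", "2", ...
    for i, value in enumerate(m.groups(), start=1):
        result[str(i)] = value
    return result


print(numbered_captures(r"(\w+)=(\w+)", "key=value and more"))
# → {'0': 'key=value', '1': 'key', '2': 'value'}
```

The trailing " and more" never appears in the result because it is not part of the match.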

If a named capture is used, e.g. (?<name1>...) to capture three characters under name1, its name is used as the key instead of the backreference's numeric position.

PCRE named captures are described in http://pcre.org/pcre.txt; the following syntaxes are supported:

     (?<name>...)    named capturing group (Perl)
     (?'name'...)    named capturing group (Perl)
     (?P<name>...)   named capturing group (Python)
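The keying rule for named captures can be sketched the same way. This is an illustrative Python model, not CFEngine's implementation: data_regextract_model is a hypothetical name, re stands in for PCRE, and because Python's re accepts only the (?P<name>...) form, the pattern from the example in this section is rewritten with that syntax. Named groups are keyed by name only, so keys "1" and "4" are absent, matching the documented output.

```python
import json
import re


def data_regextract_model(regex: str, string: str):
    """Approximate the key layout of CFEngine's data_regextract().

    Python's re stands in for PCRE and supports only (?P<name>...) for
    named groups. Returns None when the regex does not match.
    """
    m = re.search(regex, string, re.DOTALL | re.MULTILINE)
    if m is None:
        return None
    # Invert groupindex ({name: number}) to look up a group's name by number.
    names = {number: name for name, number in m.re.groupindex.items()}
    result = {"0": m.group(0)}
    for i in range(1, m.re.groups + 1):
        # A named group is keyed by its name instead of its position.
        result[names.get(i, str(i))] = m.group(i)
    return result


parsed = data_regextract_model(
    r"^(?P<name1>...)(...)(..)-(?P<name2>...)-(..).*",
    "abcdef12-345-67andsoon",
)
# sort_keys and compact separators only make the text comparable to the Output below.
print(json.dumps(parsed, sort_keys=True, separators=(",", ":")))
# → {"0":"abcdef12-345-67andsoon","2":"def","3":"12","5":"67","name1":"abc","name2":"345"}
```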

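Because the regular expression is run in dotall and multiline modes, . also matches newlines, so a greedy .*$ runs on to the end of the string instead of stopping at the end of the current line. To capture up to the end of a line, use [^\n]* instead of .*$.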
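The difference between .*$ and [^\n]* can be reproduced with Python's re module under the same two flags. This is a sketch under stated assumptions: the cfg string is hypothetical file content modeled on the Output of the example in this section, and the possessive \s?+ from that example is replaced by \s* because Python's re only gained possessive quantifiers in Python 3.11.

```python
import re

FLAGS = re.DOTALL | re.MULTILINE  # the modes data_regextract() runs with
# Hypothetical file contents, modeled on the Output of the example in this section.
cfg = "guid = 9CB197F0-4569-446A-A987-1DDEC1205F6B\nport=5308"

# With DOTALL, '.' also matches '\n', so the greedy '.*' runs to the end of
# the string, where '$' then matches.
greedy = re.search(r"^guid\s*=\s*(?P<value>.*)$", cfg, FLAGS)
print(repr(greedy.group("value")))
# → '9CB197F0-4569-446A-A987-1DDEC1205F6B\nport=5308'

# '[^\n]*' cannot cross a newline, so the capture stops at the end of the line.
line_only = re.search(r"^guid\s*=\s*(?P<value>[^\n]*)", cfg, FLAGS)
print(repr(line_only.group("value")))
# → '9CB197F0-4569-446A-A987-1DDEC1205F6B'

# MULTILINE lets '^' match at the start of any line, so 'port' on line 2 is found.
port = re.search(r"^port\s*=\s*(?P<value>[^\n]*)", cfg, FLAGS)
print(repr(port.group("value")))
# → '5308'
```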

Arguments:

  • regex: regular expression - Regular expression - in the range: .*
  • string: string - Match string - in the range: .*

Example:

bundle agent main
{
  vars:
      # the returned data container is a key-value map:

      # the whole matched string is put in key "0"
      # the first three characters are put in key "name1"
      # the next three characters go into key "2" (the capture has no name)
      # the next two characters go into key "3" (the capture has no name)
      # then the dash is ignored
      # then three characters are put in key "name2"
      # then another dash is ignored
      # the next two characters go into key "5" (the capture has no name)
      # anything else is ignored

      "parsed" data => data_regextract("^(?<name1>...)(...)(..)-(?<name2>...)-(..).*", "abcdef12-345-67andsoon");
      "parsed_str" string => format("%S", parsed);

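      # Illustrating multiline regular expressions.
      # /tmp/instance.cfg is assumed to hold a "guid = ..." line followed by
      # a "port=..." line (compare the Output below).

      # With dotall, the greedy .* runs past the newline to the end of the string
      "instance_guid_until_end_of_string"
        data => data_regextract( "^guid\s?+=\s?+(?<value>.*)$",
                                 readfile( "/tmp/instance.cfg", 200));

      # [^\n]* stops at the end of the line
      "instance_guid"
        data => data_regextract( "^guid\s+=\s+(?<value>[^\n]*)",
                                 readfile( "/tmp/instance.cfg", 200));

      # With multiline, ^ also matches at the start of the second line
      "instance_port"
        data => data_regextract( "^port\s?+=\s?+(?<value>[^\n]*)",
                                 readfile( "/tmp/instance.cfg", 200));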

  reports:
      "$(this.bundle): parsed[0] '$(parsed[0])' parses into: $(parsed_str)";
      "$(this.bundle): instance_guid_until_end_of_string[value] '$(instance_guid_until_end_of_string[value])'";
      "$(this.bundle): instance_guid[value] '$(instance_guid[value])'";
      "$(this.bundle): instance_port[value] '$(instance_port[value])'";
}

Output:

R: main: parsed[0] 'abcdef12-345-67andsoon' parses into: {"0":"abcdef12-345-67andsoon","2":"def","3":"12","5":"67","name1":"abc","name2":"345"}
R: main: instance_guid_until_end_of_string[value] '9CB197F0-4569-446A-A987-1DDEC1205F6B
port=5308'
R: main: instance_guid[value] '9CB197F0-4569-446A-A987-1DDEC1205F6B'
R: main: instance_port[value] '5308'

Notes:

History: Was introduced in version 3.7.0 (2015)

See also: regextract(), regex_replace(), PCRE regular expression syntax summary