processes

Process promises refer to items in the system process table, i.e., a command in some state of execution (with a Process Control Block). Promiser objects are patterns that are unanchored, meaning that they match line fragments in the system process table.

      processes:

        "regex contained in process line"

            process_select => process_filter_body,
            restart_class => "activation class for process",
            ..;

Note: Process table formats differ between platforms. You can see how CFEngine views the process table for your platform by inspecting cf_otherprocs, cf_procs, and cf_rootprocs, which can be found in $(sys.workdir)/state/ (typically /var/cfengine/state).

For example, this is a sample of $(sys.workdir)/state/cf_rootprocs on a Linux system:

USER       PID  PPID  PGID %CPU %MEM    VSZ  NI       RSS NLWP STIME     ELAPSED     TIME COMMAND
root         1     0     1  0.0  0.2  19232   0      1096    1 Sep14  1-21:41:52 00:00:00 /sbin/init
root         2     0     0  0.0  0.0      0   0         0    1 Sep14  1-21:41:52 00:00:00 [kthreadd]
root         3     2     0  0.0  0.0      0   -         0    1 Sep14  1-21:41:52 00:00:00 [migration/0]

This is an example showing how to restart splunk when a splunkd process owned by root is using 80% or more of the CPU.

bundle agent example
{
  processes:
      # Reference process table in $(sys.workdir)/state/cf_procs
      # Find lines in the process table starting with root (USER column),
      # followed by one or more spaces and one or more digits (PID column),
      # followed by one or more spaces and one or more digits (PPID column),
      # followed by one or more spaces and one or more digits (PGID column),
      # followed by one or more spaces, then 8 or 9 followed by a digit in
      # the range 0-9 (to match 80 through 99 in the %CPU column), followed
      # by a dot, followed by anything, and containing splunkd (expected to
      # match the COMMAND column).

      "^root\s+\d+\s+\d+\s+\d+\s+[89][0-9]\..*splunkd"
        handle => "example_splunk_high_cpu_stop_gracefully",
        process_stop => "/opt/splunkforwarder/bin/splunk stop",
        comment => "Find splunkd processes owned by root that are consuming 80%
                    or more of a CPU and restart them with splunk's preferred
                    utility. Stop them gracefully with the internal splunk binary.";

      "^root\s.*splunkd"
        restart_class => "splunk_not_running",
        comment => "Set splunk_not_running class if we can't find any root-owned
                    splunkd processes so that we can restart it using a
                    commands promise";

  commands:
    splunk_not_running::
      "/opt/splunkforwarder/bin/splunk"
        args => "--accept-license --answer-yes --no-prompt start";
}

Getting complex regular expressions just right can be difficult, so for more sophisticated matches, users should first try a simple pattern, such as a program name, combined with a process_select body before delving into complex regular expressions.

This example shows using process_select and process_count to define a class when a process has been running for longer than a day.

bundle agent main

{
  processes:

      "init"
        process_count   => any_count("booted_over_1_day_ago"),
        process_select  => days_older_than(1),
        comment         => "Define a class indicating we found an init process running
                    for more than 1 day.";

  reports:

    booted_over_1_day_ago::

      "This system was booted over 1 day ago since there is an init process
       that is older than 1 day.";

    !booted_over_1_day_ago::
      "This system has been rebooted recently as the init process has been
       running for less than a day";
}



body process_count any_count(cl)
{
      match_range => "0,0";
      out_of_range_define => { "$(cl)" };
}


body process_select days_older_than(d)
{
      stime_range    => irange(ago(0,0,"$(d)",0,0,0),now);
      process_result => "!stime";
}

This policy can be found in /var/cfengine/share/doc/examples/processes_define_class_based_on_process_runtime.cf and downloaded directly from GitHub.

Take care not to oversimplify your patterns, as they may match unexpected processes or miss the intended ones. For example, on many systems, the process pattern "^cp" may not match any processes, even though "cp" is running. This is because the pattern is matched against the whole process table line, which begins with the USER column, and the command itself may be listed as "/bin/cp". On the other hand, the unanchored pattern "cp" will also match a process containing "scp". The PCRE assertions "\b" and "\B" (word boundary and non-boundary) may prove very useful here.
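The word-boundary advice can be sketched as follows. This is a minimal illustration, not a policy from the CFEngine distribution: the bundle name, body name, and class name are hypothetical. The pattern remains unanchored, so it can still match "cp" appearing as a separate word anywhere in the process table line (for example in an argument), which is why this sketch only reports and does not send signals.

```cf3
bundle agent example_cp_running
{
  processes:
      # \b asserts a word boundary, so this matches "cp" and "/bin/cp"
      # but not "scp" or "cpio".
      "\bcp\b"
        process_count => define_if_found("cp_is_running"),
        comment => "Define a class when a cp process appears to be running";

  reports:
    cp_is_running::
      "A process table line containing the word cp was found.";
}

# Hypothetical helper body: a count of zero is in range, so any match
# is out of range and defines the class passed in as cl.
body process_count define_if_found(cl)
{
      match_range         => irange(0,0);
      out_of_range_define => { "$(cl)" };
}
```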

To restart a process, you should set a class to activate and then describe a command in that class.

    commands:

      restart_me::

       "/path/executable" ... ;

This rationalizes complex restart-commands and avoids unnecessary overlap between processes and commands.

The process_stop attribute is also arguably a command, but it should be an ephemeral command that does not lead to a persistent process. It is intended only for commands of the form /etc/inetd service stop, not for processes that persist. Processes are restarted at the end of a bundle's execution, but stop commands are executed immediately.

Commands and Processes

CFEngine distinguishes between processes and commands so that there is a clean separation between detection (promises about the process table) and certain repairs (promises to execute commands that start processes).

Command executions are about jobs, services, scripts etc. They are properties of an executable file, and the referring 'promiser' is a file object. On the other hand a process is a property of a "process identifier" which is a kernel instantiation, a quite different object altogether. For example:

  • A "PID" (which is not an executable) promises to be reminded of a signal, e.g.
    kill signal pid
  • A "command" promises to start or stop itself with a parameterized specification.
    exec command argument1 argument2 ...

Neither the file nor the pid necessarily promise to respond to these activations, but they are nonetheless physically meaningful phenomena or attributes associated with these objects.

  • Executable files do not listen for signals as they have no active state.
  • PIDs do not run themselves or stop themselves with new arguments, but they can use signals as they are running.

Executions lead to processes for the duration of their lifetime, so these two issues are related, although the promises themselves are not.

Platform notes

Process promises depend on the ps native tool, which by default truncates lines at 128 columns on HP-UX. It is recommended to edit the file /etc/default/ps and increase the DEFAULT_CMD_LINE_WIDTH setting to 1024 to guarantee that process promises will work smoothly on that platform.


Attributes

Common Attributes

Common attributes are available to all promise types. Full details for common attributes can be found in the Common Attributes section of the Promise Types and Attributes page. The common attributes are as follows:

action

classes

comment

depends_on

handle

ifvarclass

meta


process_count

Type: body process_count

Common Body Attributes

Common body attributes are available to all body types. Full details for common body attributes can be found in the Common Body Attributes section of the Promise Types and Attributes page. The common attributes are as follows:

inherit_from
meta

in_range_define

Description: List of classes to define if the matches are in range

Classes are defined if the processes that are found in the process table satisfy the promised process count, in other words if the promise about the number of processes matching the other criteria is kept.

Type: slist

Allowed input range: (arbitrary string)

Example:

     body process_count example
     {
     in_range_define => { "class1", "class2" };
     }

match_range

Description: Integer range for acceptable number of matches for this process

This is a numerical range for the number of occurrences of the process in the process table. As long as it falls within the specified limits, the promise is considered kept.

Type: irange[int,int]

Allowed input range: 0,99999999999

Example:

     body process_count example
     {
     match_range => irange("10","50");
     }

out_of_range_define

Description: List of classes to define if the matches are out of range

These classes activate remedial promises that are conditional on this promise not being kept, that is, on the number of matching processes falling outside match_range.

Type: slist

Allowed input range: (arbitrary string)

Example:

     body process_count example(s)
     {
     out_of_range_define => { "process_anomaly", "anomaly_$(s)"};
     }
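The three process_count attributes are usually combined in one body and attached to a processes promise. The following is a minimal sketch under stated assumptions: the bundle name, body name, and class names are hypothetical, and the range of 1 to 10 is an arbitrary illustration rather than a recommended value.

```cf3
bundle agent example_httpd_count
{
  processes:
      # Unanchored pattern: counts every process table line containing "httpd".
      "httpd"
        process_count => expect_between("1", "10", "httpd_count");

  reports:
    httpd_count_ok::
      "Between 1 and 10 matching httpd process lines were found.";

    httpd_count_anomaly::
      "The number of matching httpd process lines is outside the range 1 to 10.";
}

# Hypothetical helper body: defines <prefix>_ok when the count is within
# [lower, upper] and <prefix>_anomaly otherwise.
body process_count expect_between(lower, upper, prefix)
{
      match_range         => irange("$(lower)", "$(upper)");
      in_range_define     => { "$(prefix)_ok" };
      out_of_range_define => { "$(prefix)_anomaly" };
}
```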

process_select

Type: body process_select

Common Body Attributes

Common body attributes are available to all body types. Full details for common body attributes can be found in the Common Body Attributes section of the Promise Types and Attributes page. The common attributes are as follows:

inherit_from
meta

command

Description: Regular expression matching the command/cmd field of a process

This expression should match the entire COMMAND field of the process table, not just a fragment. This field is usually the last field on the line, so it starts with the first non-space character of the command and ends at the end of the line, including any arguments.

Type: string

Allowed input range: (arbitrary string)

Example:

     body process_select example

     {
     command => "cf-.*";

     process_result => "command";
     }
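Because the command regular expression is anchored to the whole COMMAND field, it usually has to allow for a leading path and trailing arguments. The following is a sketch, assuming a process table that lists the program with an optional directory prefix; the bundle name, body names, and class name are hypothetical.

```cf3
bundle agent example_command_select
{
  processes:
      # Unanchored pre-filter on the whole process table line.
      "cf-"
        process_select => cfengine_daemon_command,
        process_count  => found_any("cfengine_daemon_seen");

  reports:
    cfengine_daemon_seen::
      "At least one CFEngine daemon process was found.";
}

body process_select cfengine_daemon_command
{
      # Anchored to the whole COMMAND field: optional directory prefix,
      # program name, then optional arguments.
      command        => "(\S*/)?cf-(execd|serverd|monitord)(\s.*)?";
      process_result => "command";
}

# Hypothetical helper body: any match is out of range and defines cl.
body process_count found_any(cl)
{
      match_range         => irange(0,0);
      out_of_range_define => { "$(cl)" };
}
```

The promiser narrows the candidate lines cheaply, and the process_select body then applies the stricter, anchored test to the COMMAND field only.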

pid

Description: Range of integers matching the process id of a process

Type: irange[int,int]

Allowed input range: 0,99999999999

Example:

     body process_select example
     {
     pid => irange("1","10");
     process_result => "pid";
     }

pgid

Description: Range of integers matching the process group id (PGID) of a process

Type: irange[int,int]

Allowed input range: 0,99999999999

Example:

     body process_select example
     {
     pgid => irange("1","10");
     process_result => "pgid";
     }

ppid

Description: Range of integers matching the parent process id of a process

Type: irange[int,int]

Allowed input range: 0,99999999999

Example:

     body process_select example
     {
     ppid => irange("407","511");
     process_result => "ppid";
     }

priority

Description: Range of integers matching the priority field (PRI/NI) of a process

Type: irange[int,int]

Allowed input range: -20,+20

Example:

     body process_select example
     {
     priority => irange("-5","0");
     }

process_owner

Description: List of regexes matching the user of a process

The regular expressions should match a legal user name on the system. The regex is anchored, meaning it must match the entire name.

Type: slist

Allowed input range: (arbitrary string)

Example:

     body process_select example
     {
     process_owner => { "wwwrun", "nobody" };
     }

process_result

Description: Boolean class expression with the logical combination of process selection criteria

A logical combination of the process selection classifiers. The syntax is the same as that for class expressions. If process_result is not specified, then all set attributes in the process_select body are AND'ed together.

Type: string

Allowed input range: [(process_owner|pid|ppid||pgid|rsize|vsize|status|command|ttime|stime|tty|priority|threads)[|!.]*]*

Example:

     body process_select proc_finder(p)

     {
     process_owner  => { "avahi", "bin" };
     command        => "$(p)";
     pid            => irange("100","199");
     vsize          => irange("0","1000");
     process_result => "command.(process_owner|vsize).!pid";
     }
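As in class expressions, "." means AND, "|" means OR, and "!" means NOT, so the expression above selects processes whose command matches, whose owner or virtual memory size matches, and whose pid is not in the range 100 to 199. Negation is often the most useful part. The following is a minimal sketch with hypothetical bundle, body, and class names, and it only reports rather than acting on the selected processes.

```cf3
bundle agent example_process_result
{
  processes:
      "sshd"
        process_select => not_owned_by_root,
        process_count  => flag_if_found("sshd_non_root_owner");

  reports:
    sshd_non_root_owner::
      "Found an sshd process table line whose owner is not root.";
}

body process_select not_owned_by_root
{
      process_owner  => { "root" };
      # "!" negates the criterion: select processes whose owner is NOT root.
      process_result => "!process_owner";
}

# Hypothetical helper body: any selected process defines the class cl.
body process_count flag_if_found(cl)
{
      match_range         => irange(0,0);
      out_of_range_define => { "$(cl)" };
}
```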

See also: file_result

rsize

Description: Range of integers matching the resident memory size of a process, in kilobytes

Type: irange[int,int]

Allowed input range: 0,99999999999

Example:

     body process_select example
     {
     rsize => irange("4000","8000");
     }

status

Description: Regular expression matching the status field of a process

For instance, the field may contain characters in the set NRSsl+ and so on. Windows processes do not have status fields.

Type: string

Allowed input range: (arbitrary string)

Example:

     body process_select example
     {
     status => "Z";
     }

stime_range

Description: Range of integers matching the start time of a process

The calculation of time from process table entries is sensitive to Daylight Saving Time (Summer/Winter Time), so calculations could be an hour off. This is a known bug that has yet to be fixed.

Type: irange[int,int]

Allowed input range: 0,2147483647

Example:

     body process_select example
     {
     stime_range => irange(ago(0,0,0,1,0,0),now);
     }

ttime_range

Description: Range of integers matching the total elapsed time of a process.

This is total accumulated time for a process.

Type: irange[int,int]

Allowed input range: 0,2147483647

Example:

     body process_select example
     {
     ttime_range => irange(0,accumulated(0,1,0,0,0,0));
     }

tty

Description: Regular expression matching the tty field of a process

Windows processes are not regarded as attached to any terminal, so they all have tty '?'.

Type: string

Allowed input range: (arbitrary string)

Example:

     body process_select example
     {
     tty => "pts/[0-9]+";
     }

threads

Description: Range of integers matching the threads (NLWP) field of a process

Type: irange[int,int]

Allowed input range: 0,99999999999

Example:

     body process_select example
     {
     threads => irange(1,5);
     }

vsize

Description: Range of integers matching the virtual memory size of a process, in kilobytes.

On Windows, the virtual memory size is the amount of memory that cannot be shared with other processes. In Task Manager, this is called Commit Size (Windows 2008), or VM Size (Windows XP).

Type: irange[int,int]

Allowed input range: 0,99999999999

Example:

     body process_select example
     {
     vsize => irange("4000","9000");
     }

process_stop

Description: A command used to stop a running process

As an alternative to sending a termination or kill signal to a process, one may call a 'stop script' to perform a graceful shutdown.

Type: string

Allowed input range: "?(/.*)

Example:

    processes:

     "snmpd"

            process_stop => "/etc/init.d/snmp stop";

restart_class

Description: A class to be defined globally if the process is not running, so that a commands promise can be used to restart the process

This is a signal to restart a process that should be running, if it is not running. Processes are signaled first and then restarted later, at the end of bundle execution, after all possible corrective actions have been made that could influence their execution.

Windows does not support having processes start themselves in the background by forking off a child process, as Unix daemons usually do. Therefore, it may be useful to specify an action body that sets background to true in a commands promise that is invoked by the class set by restart_class. See the commands promise type for more information.

Type: string

Allowed input range: [a-zA-Z0-9_$(){}\[\].:]+

Example:

processes:

   "cf-serverd"

     restart_class => "start_cfserverd";

commands:

  start_cfserverd::

    "/var/cfengine/bin/cf-serverd";
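For the Windows case described above, the restart command can be given an action body that sets background to true. This is a sketch only: the bundle and body names are hypothetical, and the executable path is simply reused from the example above, so it would need to be replaced with the actual installation path on a Windows host.

```cf3
bundle agent example_background_restart
{
  processes:
      "cf-serverd"
        restart_class => "start_cfserverd";

  commands:
    start_cfserverd::
      # Path reused from the example above; adjust for the target platform.
      "/var/cfengine/bin/cf-serverd"
        action => run_in_background;
}

# Hypothetical action body: do not wait for the command to finish.
body action run_in_background
{
      background => "true";
}
```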

signals

Description: A list of menu options representing signals to be sent to a process.

Signals are presented as an ordered list to the process. On Windows, only the kill signal is supported, which terminates the process.

Type: (option list)

Allowed input range:

       hup
       int
       trap
       kill
       pipe
       cont
       abrt
       stop
       quit
       term
       child
       usr1
       usr2
       bus
       segv

Example:

    processes:

     cfservd_out_of_control::

       "cfservd"

            signals         => { "stop" , "term" },
            restart_class   => "start_cfserv";

     any::

       "snmpd"

            signals         => { "term" , "kill" };