The Complete CFEngine Enterprise


CFEngine Enterprise is an IT automation platform that uses a model-based approach to manage your infrastructure and applications at WebScale, while providing best-in-class scalability, security, and enterprise-wide visibility and control.

WebScale IT Automation

CFEngine Enterprise provides a secure and stable platform for building and managing both physical and virtual infrastructure. Its distributed architecture, minimal dependencies, and lightweight autonomous agents enable you to manage 5,000 nodes from a single policy server.

WebScale does not just imply large server deployments. The speed at which changes are conceived and committed across infrastructure and applications is equally important. With execution times measured in seconds and a highly efficient verification mechanism, CFEngine reduces exposure to unwarranted changes and avoids long delays for planned changes that must be applied urgently at scale.

Intelligent Automation of Infrastructure

Automate your infrastructure with self-service capabilities. CFEngine Enterprise enables agile, secure, and scalable infrastructure automation that makes repairs using a policy-based approach.

Policy-Based Application Deployment

Achieve repeatable, error-free, and automated deployment of middleware and application components to datacenter or cloud-based infrastructure. Automating application deployment alongside infrastructure provides a standardized platform.

Self-Healing Continuous Operations

Gain visibility into your infrastructure and applications, and be alerted to issues immediately. CFEngine Enterprise contains built-in inventory and reporting modules that automate troubleshooting and compliance checks, as well as remediating issues in a self-healing fashion.

CFEngine Enterprise Features
User Interface

The CFEngine Enterprise Mission Portal provides a central dashboard for real-time monitoring, search, and reporting for immediate visibility into your environment’s actual vs desired state. You can also use Mission Portal to set individual and group alerts and track system events that make you aware of specific infrastructure changes.

Dashboard

Scalability

CFEngine Enterprise has a simple distributed architecture that scales with minimal resource consumption. Its pull-based system eliminates the need for server-side processing, which means that a single policy server can concurrently serve up to 5,000 nodes performing 5-minute runs with minimal hardware requirements.

Configurable Data Feeds

The CFEngine Enterprise Mission Portal provides System Administrators and Infrastructure Engineers with detailed information about the actual state of the IT infrastructure and how that compares with the desired state.

Federation and SQL Reporting

CFEngine Enterprise has the ability to create federated structures, in which parts of organizations can have their own configuration policies, while at the same time the central IT organization may impose some policies that are more global in nature.

Monitoring and reporting

The CFEngine Enterprise Mission Portal contains continual reporting that details compliance with policy, repairs made, and any failures of hosts to match their desired state.

Role-based access control

Users can be assigned roles that limit their access levels throughout the Mission Portal.


High Availability

Overview

Although CFEngine is a distributed system, with decisions made by autonomous agents running on each node, the hub can be viewed as a single point of failure. To keep both roles the hub is responsible for - policy serving and report collection - available, a High Availability feature was introduced in 3.6.2. It is based on the well-known and broadly used cluster resource management tools corosync and pacemaker, as well as the PostgreSQL streaming replication feature.

Design

CFEngine High Availability is based on redundancy of all components, most importantly the PostgreSQL database. An active-passive PostgreSQL database configuration is the essential part of the High Availability feature. While PostgreSQL supports different replication methods and active-passive configuration schemes, it does not provide an out-of-the-box database failover-failback mechanism. To provide one, a well-known cluster resource management solution based on the Linux-HA project has been selected.

An overview of CFEngine High Availability is shown in the diagram below.

HASetup

One hub is the active hub, while the other serves the role of a passive hub and is a fully redundant instance of the active one. If the passive host determines the active host is down, it will be promoted to active and will start serving the Mission Portal, collecting reports, and serving policy.

Corosync and pacemaker

Corosync and pacemaker are well-known and broadly used cluster resource management tools. For CFEngine hub purposes they are configured to manage the PostgreSQL database and one or more IP addresses shared across the nodes in the cluster. In the ideal configuration, one link managed by corosync/pacemaker is dedicated to PostgreSQL streaming replication and one to accessing the Mission Portal, so that when failover happens the change of active-passive roles is transparent to end users. They can still use the same shared IP address to log in to the Mission Portal or to run API queries.

PostgreSQL

For best performance, PostgreSQL streaming replication has been selected as the database replication mode. It ships WAL records from the active server to all standby database servers. This is a PostgreSQL 9.0 and above feature allowing continuous recovery and almost immediate visibility on the standby of data inserted into the primary server. For more information, please see the PostgreSQL streaming replication documentation.

CFEngine

In a High Availability setup, all clients are aware of the existence of more than one hub. The current active hub is selected as the policy server, and both policy fetching and report collection are done by the active hub. One difference compared to a single-hub installation is that instead of having one policy server, clients have a list of hubs from which to fetch policy and to which to initiate report collection (if using call collect). Also, after bootstrapping to either the active or the passive hub, clients are implicitly redirected to the active one. Trust is then established between the client and both the active and passive hubs, so all clients are capable of communicating with both. This allows a transparent transition to the passive hub when failover happens, as all clients have already established trust with the passive hub as well.

Mission Portal

Mission Portal in 3.6.2 has a new indicator which shows the status of the High Availability configuration.

HAHealth

High Availability status is constantly monitored, so that when a malfunction is discovered the user is notified about the degraded state of the system. Besides a simple visualization of High Availability, the user can get detailed information regarding the reason for a degraded state, as well as when data was last reported from each hub. This gives a comprehensive overview of the whole setup.

HADegraded

HADegradedDetails

Inventory

There are also new Mission Portal inventory variables indicating the IP address of the active hub instance and the status of the High Availability installation on each hub. Looking at inventory reports is especially helpful for diagnosing problems when High Availability is reported as degraded.

HAInventory

CFEngine High Availability installation

Existing CFEngine Enterprise installations can upgrade their single-node hub to a High Availability system in version 3.6.2. Detailed instructions on how to upgrade from a single hub to High Availability, or how to install CFEngine High Availability from scratch, can be found here.


Installation Guide

Overview

This tutorial describes the installation steps for the CFEngine High Availability feature. It is suitable both for upgrading existing CFEngine installations to HA and for installing HA from scratch. Before starting the installation we strongly recommend reading the CFEngine High Availability overview. More detailed information can be found here.

Installation procedure

As with most High Availability systems, setting it up requires carefully following a series of steps with dependencies on network components. The setup can therefore be error-prone, so if you are a CFEngine Enterprise customer we recommend that you contact support for assistance if you do not feel 100% comfortable doing this on your own.

Please also make sure you have a valid license for the passive hub so that it will be able to handle all your CFEngine clients in case of failover.

Hardware configuration and OS pre-configuration steps
  • CFEngine 3.6.2 (or later) hub package for RHEL6 or CentOS6.
  • We recommend a dedicated interface for PostgreSQL replication, and optionally one for heartbeat.
  • We recommend having one shared IP address assigned to the interface where the Mission Portal is accessible (optional) and one where PostgreSQL replication is configured (mandatory).
  • The active and passive hub machines must be configured with different host names.
  • Basic hostname resolution must work (hub names can be placed in /etc/hosts, or DNS configured).
Example configuration used in this tutorial

In this tutorial we use the following network configuration:

  • Two nodes, one acting as active (node1) and one acting as passive (node2).
  • Optionally, a third node (node3) used as a database backup for offsite replication.
  • Each node has three NICs: eth0 is used for heartbeat, eth1 for PostgreSQL replication, and eth2 for the Mission Portal and bootstrapping clients.
  • IP addresses configured as follows:
Node              eth0          eth1            eth2
node1             192.168.0.10  192.168.10.10   192.168.100.10
node2             192.168.0.11  192.168.10.11   192.168.100.11
node3 (optional)  ---           192.168.10.12   192.168.100.12
cluster shared    ---           192.168.10.100  192.168.100.100

The detailed network configuration is shown in the diagram below:

HAGuideNetworkSetup

Install cluster management tools

Before you begin you should have corosync (version 1.4.1 or higher) and pacemaker (version 1.1.10-14.el6_5.3 or higher) installed on both nodes. For your convenience we also recommend having pcs installed. Detailed instructions on how to install and set up all components are accessible here and here. Please also note that for RHEL 6, additional components might be needed to create the cluster when following the recommendation from Red Hat; one of those components is cman.

Once pacemaker and corosync are successfully installed on both nodes, please follow the steps below to set them up as needed by CFEngine High Availability. Please note that most of these instructions follow the method recommended by the Red Hat High Availability project.

In order to operate a cluster, proper fencing must be configured, but a description of how to fence a cluster and what mechanism to use is out of the scope of this document. For reference, please use the following guide.

IMPORTANT: please carefully follow the indicators describing whether a given step should be performed on the active node, the passive node, or both nodes.

  1. Make sure that the hostnames of all nodes are node1, node2 and node3 respectively. Running the command uname -n | tr '[A-Z]' '[a-z]' should return the correct node name. Make sure that DNS or the entries in /etc/hosts are updated so that hosts can be accessed using their host names, for example as sketched below.
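
    A minimal /etc/hosts sketch using the replication addresses from this tutorial (illustrative; which link you map the names to depends on where you want cluster traffic to flow):

    192.168.10.10  node1
    192.168.10.11  node2
    192.168.10.12  node3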

  2. In order to use pcs to manage the cluster, set a password for the hacluster user (the user designated to manage the cluster) with passwd hacluster on both cluster nodes.

  3. Make sure that the pcsd daemon is started, and configure both nodes so that it is enabled to start on boot.

    On RHEL 7: systemctl start pcsd.service; systemctl enable pcsd.service

    On RHEL 6: /etc/init.d/pcsd start; chkconfig pcsd on

  4. Authenticate the hacluster user for each node of the cluster. Run the command below only on the active node (node1):

    pcs cluster auth node1 node2
    

    As a result you should see a message similar to the one below:

    Username: hacluster
    Password:
    node1: Authorized
    node2: Authorized
    
  5. Create the cluster by running the following command on the active node (node1):

    pcs cluster setup --start --name cfcluster node1 node2
    

    This will create the cluster cfcluster, consisting of node1 and node2.

  6. Enable the cluster services to start on boot on both the cluster nodes:

    pcs cluster enable --all
    
  7. At this point the cluster should be up and running without any resources or STONITH/fencing configured. Running pcs status should print something similar to the output below.

    Cluster name: cfcluster
    Last updated: Tue Jul  7 09:29:10 2015
    Last change: Fri Jul  3 08:41:24 2015
    Stack: cman
    Current DC: node1 - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured
    0 Resources configured
    
    Online: [ node1 node2 ]
    
    Full list of resources:
    
PostgreSQL configuration

Before starting this section, make sure that the cluster is not running.

  1. Install the CFEngine hub package on both active and passive node.
  2. On the active node (node1), bootstrap the hub to itself so it starts acting as a policy server (this step can be skipped if you are upgrading an existing installation to High Availability).
  3. Bootstrap the passive node (node2) to the active hub. While bootstrapping, trust between both hubs will be established and keys will be exchanged.
  4. After successfully bootstrapping the passive node to the active one, bootstrap the passive node to itself. From now on it will operate as a hub, capable of collecting reports and serving policy. Please note that while bootstrapping the passive node to itself you may see the following message:

    "R: This host assumes the role of policy server
    R: Updated local policy from policy server
    R: Failed to start the server
    R: Did not start the scheduler
    R: You are running a hard-coded failsafe. Please use the following command instead.
        "/var/cfengine/bin/cf-agent" -f /var/cfengine/inputs/update.cf
    2015-06-29T17:36:24+0000   notice: Bootstrap to '10.100.100.116' completed successfully!"
    
  5. Configure PostgreSQL on active node:

    1. Create two directories owned by the PostgreSQL user: /var/cfengine/state/pg/data/pg_archive and /var/cfengine/state/pg/tmp
    2. Modify the postgresql.conf configuration file:

      echo "listen_addresses = '*'
      wal_level = hot_standby
      max_wal_senders=5
      wal_keep_segments = 32
      hot_standby = on
      restart_after_crash = off
      
      #not needed but makes failover faster and cluster more stable
      checkpoint_segments = 8
      archive_mode = on
      archive_command = 'cp %p /var/cfengine/state/pg/pg_arch/%f'
      " >> /var/cfengine/state/pg/data/postgresql.conf
      

      NOTE: In the above configuration, the wal_keep_segments value specifies the minimum number of segments (16 megabytes each) retained in the PostgreSQL WAL logs directory in case a standby server needs to fetch them for streaming replication. It should be adjusted to the number of clients handled by the CFEngine hub and the available disk space. In an installation with 1000 clients bootstrapped to the CFEngine hub, and assuming the passive hub should be able to catch up with the active one after a 24-hour break, the value should be set close to 250 (4 GB of additional disk space).

    3. Modify the pg_hba.conf configuration file to enable access to PostgreSQL from the listed hosts. Please note that 192.168.10.10, 192.168.10.11 and 192.168.10.12 are the IP addresses of node1, node2 and node3 respectively.

      echo "host replication all 192.168.10.10/32 trust
      host replication all 192.168.10.11/32 trust
      #use the line below only if a 3rd node is used as a database backup
      host replication all 192.168.10.12/32 trust
      local replication all trust
      host replication all 127.0.0.1/32 trust
      host replication all ::1/128 trust
      " >> /var/cfengine/state/pg/data/pg_hba.conf
      

      IMPORTANT: The above configuration allows accessing the hub using the cfpostgres user without any authentication from both cluster nodes. For security reasons we strongly advise creating a replication user in PostgreSQL and protecting access with a password or certificate. Furthermore, we advise using SSL-secured replication instead of the unencrypted method described here if the hubs are on an untrusted network.

    4. Create the PostgreSQL archive directory (mkdir /var/cfengine/state/pg/pg_arch/) and make the cfpostgres user the owner of it (chown -R cfpostgres:cfpostgres /var/cfengine/state/pg/pg_arch/).

      IMPORTANT: If the archive directory location is different, make sure to change the archive_command entry in postgresql.conf and the restore_command command described later in this document.

    5. Restart the PostgreSQL server so that the configuration changes take effect.

      cd /tmp && su cfpostgres -c "/var/cfengine/bin/pg_ctl -w -D /var/cfengine/state/pg/data stop -m fast"
      cd /tmp && su cfpostgres -c "/var/cfengine/bin/pg_ctl -w -D /var/cfengine/state/pg/data -l /var/log/postgresql.log start"
      
  6. Configure PostgreSQL on the passive node:

    1. Remove the PostgreSQL directory by running rm -rf /var/cfengine/state/pg/data/*.
    2. Make a base backup of the database by running su cfpostgres -c "cd /tmp && /var/cfengine/bin/pg_basebackup -h node1 -U cfpostgres -D /var/cfengine/state/pg/data -X stream -P".
    3. Create the recovery.conf file to indicate that PostgreSQL is running as a hot-standby replica:

      echo "standby_mode = 'on'
      #192.168.10.100 is the cluster-shared IP address of the active/master cluster node
      primary_conninfo = 'host=192.168.10.100 port=5432 user=cfpostgres application_name=node2'
      #not needed but recommended for faster failover and more stable cluster operations
      restore_command = 'cp /var/cfengine/state/pg/pg_arch/%f %p'
      " > /var/cfengine/state/pg/data/recovery.conf
      

    NOTE: adjust host and application_name to point to the active node's (shared) address and the passive node's name respectively.

  7. Start PostgreSQL on the passive node by running the following command:

    cd /tmp && su cfpostgres -c "/var/cfengine/bin/pg_ctl -w -D /var/cfengine/state/pg/data -l /var/log/postgresql.log start"
    

Verify the PostgreSQL status on the passive node by running echo "select pg_is_in_recovery();" | /var/cfengine/bin/psql cfdb. The command should return t, indicating that the passive node is working in recovery mode.

Verify that the passive node is connected to the active one by running the following command on the active node: echo "select * from pg_stat_replication;" | /var/cfengine/bin/psql cfdb. The command should return one entry, indicating that node2 is connected to the database in streaming replication mode.

CFEngine configuration

Before starting this step, make sure that PostgreSQL is running on both the active and passive nodes and that the passive node is replicating from the active one.

  1. Create the HA configuration file on both active and passive nodes:

    echo "cmp_master: PRI
    cmp_slave: HS:async,HS:sync,HS:alone
    cmd: /usr/sbin/crm_attribute -l reboot -n cfpgsql-status -G -q" > /var/cfengine/ha.cfg
    
  2. Create the HA JSON configuration file:

    echo "{
    \"192.168.100.10\":
    {
     \"sha\": \"c14a17325b9a1bdb0417662806f579e4187247317a9e1739fce772992ee422f6\",
     \"internal_ip\": \"192.168.100.10\",
    },
    \"192.168.100.11\":
    {
     \"sha\": \"b492eb4b59541c02a13bd52efe17c6a720e8a43b7c8f8803f3fc85dee7951e4f\",
     \"internal_ip\": \"192.168.100.11\",
    }
    }" > /var/cfengine/masterfiles/cfe_internal/enterprise/ha/ha_info.json
    

    The internal_ip attribute is the IP address of the hub (the one it bootstrapped itself to) and sha is the digest of the hub's public key. The sha can be found by running cf-key -s on the respective hub and matching the entry to the internal_ip, for example as sketched below.
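
    A minimal sketch of reading a hub's own key digest directly (run as root on each hub; the digest shown is illustrative):

    [root@node1]# /var/cfengine/bin/cf-key --print-digest /var/cfengine/ppkeys/localhost.pub
    SHA=c14a17325b9a1bdb0417662806f579e4187247317a9e1739fce772992ee422f6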

  3. Modify /var/cfengine/masterfiles/controls/VERSION/def.cf and /var/cfengine/masterfiles/controls/VERSION/update_def.cf to enable HA by uncommenting the line "enable_cfengine_enterprise_hub_ha" expression => "enterprise_edition"; (also make sure to comment out or remove the line "enable_cfengine_enterprise_hub_ha" expression => "!any";), as sketched below.
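
    After the edit, the relevant lines in both files should look like this sketch:

    # controls/VERSION/def.cf and controls/VERSION/update_def.cf
    "enable_cfengine_enterprise_hub_ha" expression => "enterprise_edition";
    # "enable_cfengine_enterprise_hub_ha" expression => "!any";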

  4. Run cf-agent -f update.cf to make sure that the new policy is copied from masterfiles to inputs, first on the active node and then on the passive node. From this point on, PostgreSQL will not be managed by CFEngine; it will be left unmanaged until the pgsql cluster resource is properly configured.

Cluster resource configuration
  1. Configure the shared cluster IP address used for PostgreSQL database replication:

    pcs resource create cfvirtip IPaddr2 ip=192.168.10.100 cidr_netmask=24 --group cfengine
    

    This will create a shared IP address on the appropriate interface (the one where a 192.168.10.x address already exists).

  2. Verify that the cfvirtip resource is properly configured and running.

    [root@node1] pcs status
    Cluster name: cfcluster
    Last updated: Tue Jul  7 09:29:10 2015
    Last change: Fri Jul  3 08:41:24 2015
    Stack: cman
    Current DC: node1 - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured
    1 Resources configured
    
    Online: [ node1 node2 ]
    
    Full list of resources:
    
    Resource Group: cfengine
       cfvirtip   (ocf::heartbeat:IPaddr2):   Started node1
    

    IMPORTANT: If fencing is not configured, resources might not be started by default. To enable resource start, please run one of the following commands: pcs cluster enable --all or pcs resource debug-start cfvirtip.

  3. Add global cluster configuration.

    pcs resource defaults resource-stickiness="INFINITY"
    pcs resource defaults migration-threshold="1"
    
  4. Stop PostgreSQL on all nodes.

  5. Download the latest version of the PostgreSQL resource agent (RA), as the default one is known to have a bug when using the Master/Slave configuration.

    wget https://raw.github.com/ClusterLabs/resource-agents/a6f4ddf76cb4bbc1b3df4c9b6632a6351b63c19e/heartbeat/pgsql
    cp pgsql /usr/lib/ocf/resource.d/heartbeat/
    chmod 755 /usr/lib/ocf/resource.d/heartbeat/pgsql
    
  6. Create the PostgreSQL resource (recommended way with PostgreSQL archive mode enabled).

    pcs resource create cfpgsql pgsql pgctl="/var/cfengine/bin/pg_ctl" psql="/var/cfengine/bin/psql" pgdata="/var/cfengine/state/pg/data" pgdba="cfpostgres" repuser="cfpostgres" tmpdir="/var/cfengine/state/pg/tmp" rep_mode="async" node_list="node1 node2" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip="192.168.10.100" restart_on_promote="true" logfile="/var/log/postgresql.log" config="/var/cfengine/state/pg/data/postgresql.conf" check_wal_receiver=true restore_command="cp /var/cfengine/state/pg/pg_arch/%f %p" op monitor timeout="60s" interval="3s"  on-fail="restart" role="Master" op monitor timeout="60s" interval="4s" on-fail="restart"
    

    Alternatively, you can use the following command for a minimal setup (no archiving enabled):

    pcs resource create cfpgsql pgsql pgctl="/var/cfengine/bin/pg_ctl" psql="/var/cfengine/bin/psql" pgdata="/var/cfengine/state/pg/data" pgdba="cfpostgres" repuser="cfpostgres" tmpdir="/var/cfengine/state/pg/tmp" rep_mode="async" node_list="node1 node2" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip="192.168.10.100" restart_on_promote="true" logfile="/var/log/postgresql.log" config="/var/cfengine/state/pg/data/postgresql.conf" op monitor timeout="60s" interval="3s"  on-fail="restart" role="Master" op monitor timeout="60s" interval="4s" on-fail="restart"
    
  7. Configure PostgreSQL to work in Master/Slave (active/standby) mode:

    pcs resource master mscfpgsql cfpgsql master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
    
  8. Group the previously configured shared IP address and the PostgreSQL cluster resource to make sure both always run on the same host, and add ordering rules to make sure that the resources are started and stopped in the correct order.

    pcs constraint colocation add cfengine with Master mscfpgsql INFINITY
    pcs constraint order promote mscfpgsql then start cfengine symmetrical=false score=INFINITY
    pcs constraint order demote mscfpgsql then stop cfengine symmetrical=false score=0
    
  9. Verify that the constraint configuration is correct.

    [root@node1] pcs constraint
    Location Constraints:
      Resource: mscfpgsql
        Enabled on: node1 (score:INFINITY) (role: Master)
    Ordering Constraints:
      promote mscfpgsql then start cfengine (score:INFINITY) (non-symmetrical)
      demote mscfpgsql then stop cfengine (score:0) (non-symmetrical)
    Colocation Constraints:
      cfengine with mscfpgsql (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
    
  10. After these steps, the cluster should be up and running. To verify, run one of the commands below.

    [root@node1] pcs status
    Cluster name: cfcluster
    Last updated: Tue Jul  7 10:48:21 2015
    Last change: Fri Jul  3 08:41:24 2015
    Stack: cman
    Current DC: node1 - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured
    3 Resources configured
    
    Online: [ node1 node2 ]
    
    Full list of resources:
    
    Resource Group: cfengine
        cfvirtip   (ocf::heartbeat:IPaddr2):   Started node1
    Master/Slave Set: mscfpgsql [cfpgsql]
         Masters: [ node1 ]
         Slaves: [ node2 ]
    
    [root@node2 vagrant]# crm_mon -Afr1
    Last updated: Tue Jul  7 10:50:07 2015
    Last change: Tue Jul  7 10:30:03 2015
    Stack: cman
    Current DC: node2 - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured
    3 Resources configured
    
    Online: [ node1 node2 ]
    
    Full list of resources:
    
    Resource Group: cfengine
         cfvirtip   (ocf::heartbeat:IPaddr2):   Started node1
    Master/Slave Set: mscfpgsql [cfpgsql]
         Masters: [ node1 ]
         Slaves: [ node2 ]
    
    Node Attributes:
        * Node node1:
        + cfpgsql-data-status               : LATEST
        + cfpgsql-master-baseline           : 000000000B000090
        + cfpgsql-receiver-status           : ERROR
        + cfpgsql-status                    : PRI
        + master-cfpgsql                    : 1000
    
    * Node node2:
        + cfpgsql-data-status               : STREAMING|ASYNC
        + cfpgsql-receiver-status           : normal
        + cfpgsql-status                    : HS:alone
        + master-cfpgsql                    : -INFINITY
    

    IMPORTANT: Please make sure that cfpgsql-status for the active node is reported as PRI and for the passive node as HS:alone or HS:async.

  11. Enjoy your working CFEngine High Availability setup!

Configuring 3rd node as disaster-recovery or database backup (optional)
  1. Install the CFEngine hub package on the node that will be used as the disaster-recovery or database backup node (node3).

  2. Bootstrap the disaster-recovery node to the active node first (to establish trust between the hubs) and then bootstrap it to itself. At this point the hub will be capable of collecting reports and serving policy.

  3. Stop the cf-execd and cf-hub processes.

  4. Make sure that the PostgreSQL configuration allows database replication connections from the 3rd node (see the PostgreSQL configuration section, point 5.3, for more details).

  5. Repeat steps 6 and 7 from the PostgreSQL configuration to enable and verify the database replication connection from the 3rd node. Make sure that both the second cluster node (passive) and the 3rd node (disaster-recovery) are connected to the active database node and that streaming replication is in progress.

    [root@node1 tmp]# echo "select * from pg_stat_replication;" | /var/cfengine/bin/psql cfdb
    pid  | usesysid |  usename   | application_name |  client_addr  | client_hostname | client_port |         backend_start         |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
    ------+----------+------------+------------------+---------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
    9252 |       10 | cfpostgres | node2            | 192.168.10.11 |                 |       58919 | 2015-08-24 07:14:45.925341+00 | streaming | 0/2A7034D0    | 0/2A7034D0     | 0/2A7034D0     | 0/2A7034D0      |             0 | async
    9276 |       10 | cfpostgres | node3            | 192.168.10.12 |                 |       52202 | 2015-08-24 07:14:46.038676+00 | streaming | 0/2A7034D0    | 0/2A7034D0     | 0/2A7034D0     | 0/2A7034D0      |             0 | async
    
    (2 rows)
    
  6. Modify the HA JSON configuration file to contain information about the 3rd node (see CFEngine configuration, point 2). You should have a configuration similar to the one below:

    [root@node3 masterfiles]# cat /var/cfengine/masterfiles/cfe_internal/enterprise/ha/ha_info.json
    {
     "192.168.100.10":
     {
      "sha": "b1463b08a89de98793d45a52da63d3f100247623ea5e7ad5688b9d0b8104383f",
      "internal_ip": "192.168.100.10",
      "is_in_cluster" : true
     },
     "192.168.100.11":
     {
      "sha": "b13db51615afa409a22506e2b98006793c1b0a436b601b094be4ee4b32b321d5",
      "internal_ip": "192.168.100.11"
     },
     "192.168.100.12":
     {
      "sha": "98f14786389b2fe5a93dc3ef4c3c973ef7832279aa925df324f40697b332614c",
      "internal_ip": "192.168.100.12",
      "is_in_cluster" : false
     }
    }
    

    Please note that the is_in_cluster parameter is optional for two-node HA clusters and defaults to true. For a three-node setup, the node that is not part of the pacemaker/corosync cluster MUST be marked with the "is_in_cluster" : false configuration parameter.

  7. Start the cf-execd process (don't start the cf-hub process, as it is not needed until a manual failover to the 3rd node is performed). Please also note that during normal operations the cf-hub process should not be running on the 3rd HA node.

Manual failover to disaster-recovery node
  1. Before starting the manual failover process, make sure both the active and passive nodes are not running.

  2. Verify that PostgreSQL is running on the 3rd node and that data replication from the active node is not in progress. If the database is actively replicating data with the active cluster node, make sure that this process finishes and that no new data is stored in the active database instance.

  3. After verifying that replication is finished and data is synchronized between the active database node and the replica node (or once node1 and node2 are both down), promote PostgreSQL to exit recovery and begin read-write operations: cd /tmp && su cfpostgres -c "/var/cfengine/bin/pg_ctl -c -w -D /var/cfengine/state/pg/data -l /var/log/postgresql.log promote".

  4. To make the failover process as easy as possible, there is a "failover_to_replication_node_enabled" class defined both in /var/cfengine/masterfiles/controls/VERSION/def.cf and /var/cfengine/masterfiles/controls/VERSION/update_def.cf. In order to start collecting reports and serving policy from the 3rd node, uncomment the line defining the mentioned class.

IMPORTANT: Please note that as long as either the active or the passive cluster node is accessible to clients, failover to the 3rd node is not possible. If the active or passive node is running and failover to the 3rd node is required, make sure to disable the network interfaces that clients are bootstrapped to, so that clients won't be able to access any node other than the disaster-recovery one.

Troubleshooting
  1. If either the IPaddr2 or pgsql resource is not running, try to enable it first with pcs cluster enable --all. If this does not start the resources, you can try to run them in debug mode with the command pcs resource debug-start <resource-name>. The latter command should print diagnostic messages about why the resources are not started.

  2. If crm_mon -Afr1 prints errors similar to those below

    [root@node1]# pcs status
    Cluster name: cfcluster
    Last updated: Tue Jul  7 11:27:23 2015
    Last change: Tue Jul  7 11:02:40 2015
    Stack: cman
    Current DC: node1 - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured
    3 Resources configured
    
    Online: [ node1 ]
    OFFLINE: [ node2 ]
    
    Full list of resources:
    
     Resource Group: cfengine
         cfvirtip   (ocf::heartbeat:IPaddr2):   Started node1
     Master/Slave Set: mscfpgsql [cfpgsql]
         Stopped: [ node1 node2 ]
    
    Failed actions:
        cfpgsql_start_0 on node1 'unknown error' (1): call=13, status=complete, last-rc-change='Tue Jul  7 11:25:32 2015', queued=1ms, exec=137ms
    

    you can try to clear the errors by running pcs resource cleanup <resource-name>. This should clear the errors for the appropriate resource and make the cluster restart it.

    [root@node1 vagrant]# pcs resource cleanup cfpgsql
    Resource: cfpgsql successfully cleaned up
    
    [root@node1 vagrant]# pcs status
    Cluster name: cfcluster
    Last updated: Tue Jul  7 11:29:36 2015
    Last change: Tue Jul  7 11:29:08 2015
    Stack: cman
    Current DC: node1 - partition with quorum
    Version: 1.1.11-97629de
    2 Nodes configured
    3 Resources configured
    
    Online: [ node1 ]
    OFFLINE: [ node2 ]
    
    Full list of resources:
    
     Resource Group: cfengine
         cfvirtip   (ocf::heartbeat:IPaddr2):   Started node1
     Master/Slave Set: mscfpgsql [cfpgsql]
         Masters: [ node1 ]
         Stopped: [ node2 ]
    
  3. After a cluster crash, make sure to always start the node that should be active first, and then the one that should be passive. If the cluster is not running on a given node after restart, you can enable it by running the following command:

    [root@node2]# pcs cluster start
    Starting Cluster...
    

Hub Administration

Find out how to perform common hub administration tasks, like resetting admin credentials or using custom SSL certificates.


Re-installing Enterprise Hub

Sometimes it is useful to re-install the hub while still preserving existing trust and licensing. To preserve trust, the $(sys.workdir)/ppkeys directory needs to be backed up and restored. To preserve enterprise licensing, $(sys.workdir)/license.dat and $(sys.workdir)/licenses/ should be backed up.

Note: Depending on how and when your license was installed, $(sys.workdir)/license.dat and/or $(sys.workdir)/licenses/ may not exist. That is OK.

Warning: This process will not preserve any Mission Portal specific configuration except for the upstream VCS repository configuration. LDAP, roles, dashboards, and any other configuration done within Mission Portal will be lost.

This script in core/contrib serves as an example.
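
A rough sketch of the idea, assuming the default paths above (adapt before use, and drop license.dat or licenses from the list if they do not exist on your hub):

# Back up identity and licensing before re-installing the hub (illustrative)
tar -czf /root/hub-id-backup.tar.gz -C /var/cfengine ppkeys license.dat licenses

# ... re-install the hub package ...

# Restore the saved files afterwards
tar -xzf /root/hub-id-backup.tar.gz -C /var/cfengine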


Policy Deployment

By default CFEngine policy is distributed from /var/cfengine/masterfiles on the policy server. It is common (and recommended) for masterfiles to be backed by a version control system (VCS) such as git or subversion. This document details usage with git, but the tooling is designed to be flexible and easily modified to support any upstream versioning system.

CFEngine Enterprise ships with tooling to assist in the automated deployment of policy from a version control system to /var/cfengine/masterfiles on the hub.

Ensure policy in upstream repository is current

This is critical. When you deploy policy, you will overwrite the current contents of /var/cfengine/masterfiles. So take the current contents and make sure they are in the Git repository you chose in the previous step.

For example, if you create a new repository in GitHub by following the instructions from https://help.github.com/articles/create-a-repo, you can add the contents of masterfiles to it with the following commands (assuming you are already in your local repository checkout):

cp -r /var/cfengine/masterfiles/* .
git add *
git commit -m 'Initial masterfiles check in'
git push origin master
Configure the upstream VCS

To configure the upstream repository, you must provide the URI, credentials (a passphraseless SSH key), and the branch to deploy from.

Configuring upstream VCS via Mission Portal

Use the Mission Portal VCS integration panel. To access it, click on "Settings" in the top-left menu of the Mission Portal screen, and then select "Version control repository".

Settings menu

VCS settings screen

Configuring upstream VCS manually

The upstream VCS can be configured manually by modifying /opt/cfengine/dc-scripts/params.sh.
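
A sketch of the relevant settings; the exact variable names in params.sh may differ between CFEngine versions, so check the file shipped on your hub. The repository URL and branch below are hypothetical:

# /opt/cfengine/dc-scripts/params.sh (illustrative excerpt)
VCS_TYPE="GIT"
GIT_URL="git@github.com:example/masterfiles.git"
GIT_REFSPEC="master"
# the passphraseless SSH deploy key is configured here as well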

Manually triggering a policy deployment

After the upstream VCS has been configured, you can trigger a policy deployment manually by defining the cfengine_internal_masterfiles_update class for a run of the update policy.

For example:

[root@hub ~]# cf-agent -KIf update.cf --define cfengine_internal_masterfiles_update
    info: Executing 'no timeout' ... '/var/cfengine/httpd/htdocs/api/dc-scripts/masterfiles-stage.sh'
    info: Command related to promiser '/var/cfengine/httpd/htdocs/api/dc-scripts/masterfiles-stage.sh' returned code defined as promise kept 0
    info: Completed execution of '/var/cfengine/httpd/htdocs/api/dc-scripts/masterfiles-stage.sh'

This is useful if you would like more manual control of policy releases.

Configuring automatic policy deployments

To configure automatic deployments simply ensure the cfengine_internal_masterfiles_update class is defined on your policy hub.

Configuring automatic policy deployments with the augments file

Create def.json in the root of your masterfiles with the following content:

{
  "classes": {
    "cfengine_internal_masterfiles_update": [ "hub" ]
  }
}
Configuring automatic policy deployments with policy

Simply edit bundle common update_def in controls/update_def.cf.

bundle common update_def
{
# ...
  classes:
# ...

    "cfengine_internal_masterfiles_update" expression => "policy_server";
# ...
}
Troubleshooting policy deployments

Before policy is deployed from the upstream VCS to /var/cfengine/masterfiles, it is first validated by the hub. If this validation fails, the policy will not be deployed.

For example:

[root@hub ~]# cf-agent -KIf update.cf --define cfengine_internal_masterfiles_update
    info: Executing 'no timeout' ... '/var/cfengine/httpd/htdocs/api/dc-scripts/masterfiles-stage.sh'
   error: Command related to promiser '/var/cfengine/httpd/htdocs/api/dc-scripts/masterfiles-stage.sh' returned code defined as promise failed 1
    info: Completed execution of '/var/cfengine/httpd/htdocs/api/dc-scripts/masterfiles-stage.sh'
R: Masterfiles deployment failed, for more info see '/var/cfengine/outputs/dc-scripts.log'
   error: Method 'cfe_internal_masterfiles_stage' failed in some repairs
   error: Method 'cfe_internal_update_from_repository' failed in some repairs
    info: Updated '/var/cfengine/inputs/cfe_internal/update/cfe_internal_update_from_repository.cf' from source '/var/cfengine/masterfiles/cfe_internal/update/cfe_internal_update_from_repository.cf' on 'localhost'

Policy deployments are logged to /var/cfengine/outputs/dc-scripts.log. The logs contain useful information about a failed deployment. For example, here we can see that there is a syntax error in promises.cf near line 14.

[root@prihub ~]# tail -n 5 /var/cfengine/outputs/dc-scripts.log
/opt/cfengine/masterfiles_staging_tmp/promises.cf:14:46: error: Expected ',', wrong input '@(inventory.bundles)'
                          @(inventory.bundles),
                                             ^
   error: There are syntax errors in policy files
The staged policies in /opt/cfengine/masterfiles_staging_tmp could not be validated, aborting.: Unknown Error

Public key distribution

How can I arrange for the hosts in my infrastructure to trust a new key?

If you are deploying a new hub, or authorizing a non-hub to copy files from peers, you will need to establish trust before communication can take place.

In order for trust to be established, each host must have the public key of the other host stored in $(sys.workdir)/ppkeys, named for the public key SHA.

For example, say we have 2 hosts: host001 with public key SHA SHA=917962161107efaed9610de3e034085373142f577fb7e7b9bddec2955b748836 and hub with public key SHA SHA=af00250085306c68bb6d5f489f0239e2d7ff8a1f53f2d00e77c9ad2044309dfe. For trust to be established, host001 must have $(sys.workdir)/ppkeys/root-SHA=af00250085306c68bb6d5f489f0239e2d7ff8a1f53f2d00e77c9ad2044309dfe.pub and hub must have $(sys.workdir)/ppkeys/root-SHA=917962161107efaed9610de3e034085373142f577fb7e7b9bddec2955b748836.pub. The files must be owned by root with write access restricted to the owner (mode 644 or less).

This policy shows how public keys can be stored in a central location on the policy server and automatically installed on all hosts.

bundle agent trust_distkeys
#@ brief Example public key distribution
{
  meta:

      "tags" slist => { "autorun" };

  vars:

      "keystore"
        comment => "We want all hosts to trust these hosts because they perform
                    critical functions like policy serving.",
        string => ifelse( isvariable( "def.trustkeys[keystore]" ), "$(def.trustkeys[keystore])",
                          "distkeys");

  files:

      "$(sys.workdir)/ppkeys/."
        handle => "trust_distkeys",
        comment => "We need trust all the keys stored in `$(keystore)` on
                   `$(sys.policy_hub)` so that we can communicate with them
                   using the CFEngine protocol.",
        copy_from => remote_dcp( $(keystore), $(sys.policy_hub) ),
        depth_search => basedir,
        file_select => public_keys,
        perms => mog( "644", "root", "root" );
}

bundle server share_distkeys
#@ brief Share the directory containing public keys we need to distribute
{
  access:

    (policy_server|am_policy_hub)::

      "/var/cfengine/distkeys/"
        admit_ips => { "0.0.0.0/0" },
        shortcut => "distkeys",
        handle => "access_share_distkeys",
        comment => "This directory contains public keys of hosts that should be
                    trusted by everyone.";

}

body depth_search basedir
#@ brief Search the files in the top level of the source directory
{
      include_basedir => "true";
      depth => "1";
}

body file_select public_keys
#@ brief Select plain files matching public key file naming patterns
{
        # root-SHA=abc123.pub
        leaf_name => { "\w+-(SHA|MD5)=[[:alnum:]]+\.pub" };
        file_types => { "plain" };

        file_result => "leaf_name.file_types";
}

Configure a custom LDAP port

Mission Portal's User settings and preferences page provides a radio button for encryption. This controls the encryption method and the port used to connect.

Ldap Settings

If you want to configure LDAP authentication to use a custom port, you can do so via the Status and Settings REST API.

This example shows how to use jq to preserve the existing settings and update the SSL LDAP port to 3269.

Note: The commands are run as root on the hub, and the hub's self-signed certificate is used to connect to the API over HTTPS. An accessToken must be retrieved from /var/cfengine/httpd/htdocs/ldap/config/settings.php.

[root@hub ~]# export CACERT="/var/cfengine/httpd/ssl/certs/hub.cert"
[root@hub ~]# export API="https://hub/ldap/settings"
[root@hub ~]# export AUTH_HEADER="Authorization:<accessToken from settings.php as mentioned above>"
[root@hub ~]# export CURL="curl --silent --cacert ${CACERT} -H ${AUTH_HEADER} ${API}"
[root@hub ~]# ${CURL} | jq '.data'
{
  "domain_controller": "ldap.jumpcloud.com",
  "custom_options": {
    "24582": 3
  },
  "version": 3,
  "group_attribute": "",
  "admin_password": "Password is set",
  "base_dn": "ou=Users,o=5888df27d70bea3032f68a88,dc=jumpcloud,dc=com",
  "login_attribute": "uid",
  "port": 2,
  "use_ssl": true,
  "use_tls": false,
  "timeout": 5,
  "ldap_filter": "(objectClass=inetOrgPerson)",
  "admin_username": "uid=missionportaltesting,ou=Users,o=5888df27d70bea3032f68a88,dc=jumpcloud,dc=com"
}

[root@hub ~]# ${CURL} -X PATCH -d '{"port":3269}'
{"success":true,"data":"Settings successfully saved."}

Reset administrative credentials

The default admin user can be reset to defaults using the following SQL.

cfsettings-setadminpassword.sql:

UPDATE "users"
    SET password='SHA=aa459b45ecf9816d472c2252af0b6c104f92a6faf2844547a03338e42e426f52',
        salt='eWAbKQmxNP',
        name='admin',
        email='admin@organisation.com',
        active='1',
        roles='{admin,cf_remoteagent}',
        changetimestamp = now()
    WHERE username='admin';
INSERT INTO "users" ("username", "password", "salt", "name", "email", "external", "active", "roles", "changetimestamp")
       SELECT 'admin', 'SHA=aa459b45ecf9816d472c2252af0b6c104f92a6faf2844547a03338e42e426f52', 'eWAbKQmxNP', 'admin',  'admin@organisation.com', false, '1',  '{admin,cf_remoteagent}', now()
       WHERE NOT EXISTS (SELECT 1 FROM users WHERE username='admin');

To reset the CFEngine admin user, run the following SQL as root on your hub:

root@hub:~# psql cfsettings < cfsettings-setadminpassword.sql

Backup and Restore

With policy stored in version control, there are a few things that should be preserved in your backup and restore plan.

Hub Identity

CFEngine's trust model is based on public and private key exchange. In order to re-provision a hub and for remote agents to retain trust, the hub's key pair must be preserved and restored.

Include $(sys.workdir)/ppkeys/localhost.pub and $(sys.workdir)/ppkeys/localhost.priv in your backup and restore plan.

Note: This is the most important thing to back up.
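
For example, an illustrative one-liner assuming the default workdir /var/cfengine:

# Back up just the hub's key pair
tar -czf hub-keypair-backup.tar.gz -C /var/cfengine ppkeys/localhost.pub ppkeys/localhost.priv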

Hub License

Enterprise hubs will collect reports for up to the licensed number of hosts. When re-provisioning a hub, you will need the license that matches the hub identity in order to collect reports for more than 25 hosts.

Include $(sys.workdir)/licenses in your backup plan.

Hub Databases

Data collected from remote hosts and configuration information for Mission Portal is stored on the hub in PostgreSQL which can be backed up and restored using standard tools.

If you wish to rebuild a hub and restore the history of policy outcomes, you must back up and restore the hub databases.

Host Data

cfdb stores data related to policy runs on your hosts, for example host inventory.

Backup:

# pg_dump -Fc cfdb > cfdb.bak

Restore:

# pg_restore -Fc -d cfdb cfdb.bak
Mission Portal

cfmp and cfsettings store Mission Portals configuration information for example shared dashboards.

Backup:

# pg_dump -Fc cfmp > cfmp.bak
# pg_dump -Fc cfsettings > cfsettings.bak

Restore:

# pg_restore -Fc -d cfmp cfmp.bak
# pg_restore -Fc -d cfsettings cfsettings.bak

Custom SSL Certificate

When first installed, a self-signed SSL certificate is automatically generated and used to secure Mission Portal and API communications. You can swap this certificate for a custom one by replacing /var/cfengine/httpd/ssl/certs/<hostname>.cert and /var/cfengine/httpd/ssl/private/<hostname>.key, where hostname is the fully qualified domain name of the host.

After installing the certificate, please make sure that the certificate at /var/cfengine/httpd/ssl/certs/<hostname>.cert is world-readable on the hub. This is needed because the Mission Portal web application needs to access it directly. You can test this by verifying that you can access the certificate with an unprivileged user account on the hub.
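
For example, a sketch assuming hub.example.com is the hub's FQDN (the source file names are hypothetical):

# Install the custom certificate and key
cp custom.cert /var/cfengine/httpd/ssl/certs/hub.example.com.cert
cp custom.key  /var/cfengine/httpd/ssl/private/hub.example.com.key
chmod 644 /var/cfengine/httpd/ssl/certs/hub.example.com.cert

# Verify that an unprivileged user can read the certificate
su -s /bin/sh nobody -c 'cat /var/cfengine/httpd/ssl/certs/hub.example.com.cert' > /dev/null && echo OK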

You can get the fully qualified hostname of your hub by running the following commands.

[root@hub ~]# cf-promises --show-vars=default:sys\.fqhost
default:sys.fqhost                       hub                                                          inventory,source=agent,attribute_name=Host name
[root@hub ~]# hostname -f
hub

Adjusting Schedules

Set cf-execd agent execution schedule

By default cf-execd is configured to run cf-agent every 5 minutes. This can be adjusted by tuning the schedule in body executor control. In the Masterfiles Policy Framework, body executor control can be found in controls/cf_execd.cf.
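
A minimal sketch of an adjusted schedule, assuming you want runs every 10 minutes instead of the default 5 (Min00_05 and friends are standard CFEngine time classes):

body executor control
{
      # run cf-agent at 0, 10, 20, 30, 40 and 50 minutes past the hour
      schedule => { "Min00_05", "Min10_15", "Min20_25",
                    "Min30_35", "Min40_45", "Min50_55" };
}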

Set cf-hub hub_schedule

cf-hub, the CFEngine Enterprise report collection component, has a hub_schedule defined in body hub control, which also defaults to a 5-minute schedule. It can be adjusted to control how frequently hosts are collected from. In the Masterfiles Policy Framework, body hub control can be found in controls/cf_hub.cf.
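
A sketch of a 15-minute collection schedule (adjust the time classes to taste):

body hub control
{
      # collect reports from hosts every 15 minutes
      hub_schedule => { "Min00_05", "Min15_20", "Min30_35", "Min45_50" };
}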

Note: Mission Portal has an "Unreachable host threshold" that defaults to 15 minutes. When a host has not been collected from within this window, the host is added to the "Hosts not reporting" report. When adjusting the cf-hub hub_schedule, consider adjusting the Unreachable host threshold proportionally. For example, if you change the hub_schedule to execute only once every 15 minutes, then the Unreachable host threshold should be adjusted to 45 minutes (2700 seconds).

Set Unreachable host threshold via API

Note: This example uses jq to filter API results to only the relevant values. It is a 3rd party tool, and not shipped with CFEngine.

Here we create a JSON payload with the new value for the Unreachable host threshold (blueHostHorizon). We post the new settings and finally query the API to validate the change in settings.

[root@hub ~]# echo '{ "blueHostHorizon": 2700 }' > payload.json
[root@hub ~]# cat payload.json
{ "blueHostHorizon": 2700 }
[root@hub ~]# curl -u admin:admin http://localhost:80/api/settings -X POST -d @./payload.json
[root@hub ~]# curl -s -u admin:admin http://localhost:80/api/settings/ | jq '.data[0]|.blueHostHorizon'
2700

Enable plain http

By default HTTPS is enforced by redirecting any non secure connection requests.

If you would like to enable plain HTTP you can do so by defining cfe_enterprise_enable_plain_http from an augments file.

For example, simply place the following inside def.json in the root of your masterfiles.

{
  "classes": {
    "cfe_enterprise_enable_plain_http": [ "any" ]
  }
}

Lookup License Info

Information about the currently issued license can be obtained from the About section of the Mission Portal web interface, or from the command line as shown here.

Note: When the CFEngine Enterprise license expires, report collection is limited. No agent-side functionality is changed. However, if you are using functions or features that rely on information collected by the hub, that information will no longer be a reliable source of data.

Get license info via API

Run from the hub itself.

$ curl -u admin http://localhost/api/
Get license info from cf-hub

Run as root from the hub itself.

# cf-hub -Fvn | grep -i expiring
2016-07-11T15:54:23+0000  verbose: Found 25 CFEngine Enterprise licenses, expiring on 2222-12-25 for FREE ENTERPRISE - http://cfengine.com/terms for terms

Regenerate Self Signed SSL Certificate

When first installed, a self-signed SSL certificate is automatically generated and used to secure Mission Portal and API communications. You can regenerate this certificate by running the cfe_enterprise_selfsigned_cert bundle with the _cfe_enterprise_selfsigned_cert_regenerate_certificate class defined. This can be done by running the following command as root on the hub.

# cf-agent --no-lock --inform \
         --bundlesequence cfe_enterprise_selfsigned_cert \
         --define _cfe_enterprise_selfsigned_cert_regenerate_certificate

Custom LDAPs Certificate

To use a custom LDAPS certificate, install it into your hub's operating system.

Note: you can use the LDAPTLS_CACERT environment variable to test with a custom certificate using ldapsearch before it has been installed into the system.

[root@hub]:~# env LDAPTLS_CACERT=/tmp/MY-LDAP-CERT.cert.pem ldapsearch -xLLL -H ldaps://ldap.example.local:636 -b "ou=people,dc=example,dc=local"

Extending Mission Portal

Custom pages requiring authenticated users

Mission Portal can render static text files (html, sql, txt, etc.) for users who are logged in.

How to use

Upload files to $(sys.workdir)/httpd/htdocs/application/modules/files/static_files on your hub. Access the content using the URL https://hub/files/view/file_name.html, where file_name.html is the name of a file. Please note, uploaded files should be readable by the cfapache user. For example, see the sketch below.
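
A minimal sketch (guide.html is a hypothetical file name):

# Publish a static page for logged-in Mission Portal users
cp guide.html /var/cfengine/httpd/htdocs/application/modules/files/static_files/
chmod 644 /var/cfengine/httpd/htdocs/application/modules/files/static_files/guide.html
# then browse to https://hub/files/view/guide.html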

Custom help menu entries

The Mission Portal help menu can be extended with custom entries. This can be useful if you would like to make extra content, like documentation, easily available to users.

How to use

Upload HTML files into $(sys.workdir)/httpd/htdocs/application/views/extraDocs/ on your hub. Menu items will appear, named for each HTML file, with underscores replaced by spaces. Files must be readable by the cfapache user.

Example

The file test_documentation.html was uploaded to the directory specified above.

Extended menu

Mission Portal Style

Use the following structure in your HTML to style the page the same as the rest of Mission Portal.

<div class="contentWrapper help">
    <div class="pageTitle">
        <h1>PAGE TITLE</h1>
    </div>

     <!-- CONTENT -->
</div>

Install and Get Started

Installation

The General Installation instructions provide the detailed steps for installing CFEngine, which are generally the same for CFEngine Enterprise, with the exception of license keys (if applicable) and some aspects of post-installation and configuration.

Installing Enterprise Licenses

Before you begin, you should have your license key, unless you only plan to use the free 25 node license. The installation instructions will be provided with the key.

Post-Install Configuration
Change Email Setup After CFEngine Enterprise Installation

For Enterprise 3.6, a local mail relay is used, and it is assumed the server has a proper mail setup.

The default FROM email for all emails sent from the Mission Portal is admin@organization.com. This can be changed on the CFEngine server in /var/cfengine/httpd/htdocs/application/config/appsettings.php: $config['appemail']. For example, see the sketch below.
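
A one-line sketch of the change (the address is hypothetical):

// in /var/cfengine/httpd/htdocs/application/config/appsettings.php
$config['appemail'] = 'cfengine-noreply@example.com';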

Version your policies

Consider enabling the built-in version control of your policies, as described in Version Control and Configuration Policy.

Whether you do or not, please put your policies in some kind of backed-up VCS. Losing work because of "fat fingering" rm commands is very, very depressing.

Configure collection for monitoring data

Monitoring allows you to sample a metric and assess its value across your hosts over time. Collection of monitoring information is disabled by default. Metrics must match monitoring_include in the appropriate report_data_select body. The Masterfiles Policy Framework uses body report_data_select default_data_select_policy_hub to specify metrics that should be collected from policy hubs, and default_data_select_host to specify metrics that should be collected from non-hubs.

For example:

To collect all metrics from hubs:

body report_data_select default_data_select_policy_hub
# @brief Data to collect from policy servers by default
#
# By convention variables and classes known to be internal, (having no
# reporting value) should be prefixed with an underscore. By default the policy
# framework explicitly excludes these variables and classes from collection.
{
      metatags_include => { "inventory", "report" };
      metatags_exclude => { "noreport" };
      promise_handle_exclude => { "noreport_.*" };
      monitoring_include => { ".*" };
}

To collect cpu, loadavg, diskfree, swap_page_in, cpu_utilization, swap_utilization, and memory_utilization from non-hubs:

body report_data_select default_data_select_host
# @brief Data to collect from remote hosts by default
#
# By convention variables and classes known to be internal, (having no
# reporting value) should be prefixed with an underscore. By default the policy
# framework explicitly excludes these variables and classes from collection.
{
      metatags_include => { "inventory", "report" };
      metatags_exclude => { "noreport" };
      promise_handle_exclude => { "noreport_.*" };
      monitoring_include => {
                              "cpu",
                              "loadavg",
                              "diskfree",
                              "swap_page_in",
                              "cpu_utilization",
                              "swap_utilization",
                              "memory_utilization",
                              };
}

Review settings

See the Masterfiles Policy Framework for a full list of all the settings you can configure.


User Interface

The challenge in engineering IT infrastructure, especially as it scales vertically and horizontally, is to recognize the system components, what they do at any given moment in time (or over time), and when and how they change state.

CFEngine Enterprise's data collection service, the cf-hub collector, collects, organizes, and stores data from every host. The data is stored primarily in a PostgreSQL database.

CFEngine Enterprise's user interface, the Mission Portal, makes that data available to authorized users as high-level reports or alerts and notifications. The reports can be designed in a GUI report builder or directly with SQL statements passed to PostgreSQL.

Dashboard

The Mission Portal dashboard allows users to create customized summaries showing the current state of the infrastructure and its compliance with deployed policy.

The dashboard contains informative widgets that you can customize to create alerts. All notifications of alert state changes, e.g. from OK to not-OK, are stored in an event log for later inspection and analysis.

Alert widgets

Enterprise UI Alerts

Alerts can have three different severity levels: low, medium, and high. These are represented by yellow, orange, and red rings respectively, along with the percentage of hosts the alerts have triggered on. Hovering over the widget shows the same information as text in a convenient list format.

Enterprise UI Alerts

You can pause alerts during maintenance windows or while working on resolving an underlying issue to avoid unnecessary triggering and notifications.

Enterprise UI Alerts

Alerts can have three different states: OK, triggered, and paused. It is easy to filter by state on each widget's alert overview.

Find out more: Alerts and Notifications

Changes widget

The changes widget helps to visualize the number of changes (promises repaired) made by cf-agent.

Dashboard Changes widget

Event log

The event log on the dashboard is filtered to show only information relevant to the widgets present. It shows when alerts are triggered and cleared, and when hosts are bootstrapped or decommissioned.

Dashboard Event log

Host count widget

The host count widget helps to visualize the number of hosts bootstrapped to CFEngine over time.

Dashboard Host count

Hosts and Health

CFEngine collects data on promise compliance, and sorts hosts into three categories: erroneous, fully compliant, and lacking data.

Find out more: Hosts and Health

Reporting

Inventory reports allow for quick reporting on out-of-the-box attributes. The attributes are also extensible, by tagging any CFEngine variable or class, such as the role of the host, inside your CFEngine policy. These custom attributes will be automatically added to the Mission Portal.

Enterprise UI Reporting

You can reduce the amount of data or find specific information by filtering on attributes and host groups. Filtering is independent from the data presented in the results table: you can filter on attributes without them being presented in the table of results.

Enterprise UI Reporting

Add and remove columns from the results table in real time, and once you're happy with your report, save it, export it, or schedule it to be sent by email regularly.

Enterprise API Overview

Find out more: Reporting

Follow along in the custom inventory tutorial or read the MPF policy that provides inventory.

Sharing

Dashboards, Host categorization views, and Reports can be shared based on role.

Please note that the logic for sharing based on roles is different from the logic that controls which hosts a given role can access data for. When a Dashboard, Host categorization, or report is shared with a role, anyone having that role is allowed to access it. For example, if a dashboard is shared with the reporting and admin roles, users with either the reporting role or the admin role are allowed access.

For example:

  • user1 has only the reporting role.
  • admin has the admin role.

If the admin user creates a new dashboard and shares it with the reporting role, then any user (including user1) having the reporting role will be able to subscribe to the new dashboard. Additionally, the dashboard owner (in this case admin) also has access to the custom dashboard.

Monitoring

Monitoring allows you to get an overview of your hosts over time.

Find out more: Monitoring

Settings

A variety of CFEngine and system properties can be changed in the Settings view.

Find out more: Settings


Settings

A variety of CFEngine and system properties can be changed in the Settings view.

Opening Settings

Opening Settings

Settings are accessible from any view of the Mission Portal, from the drop-down menu in the top right-hand corner.

Preferences

Preferences

User settings and preferences allow the CFEngine Enterprise administrator to change various options, including:

  • Turn on or off RBAC
  • Unreachable host threshold
  • Number of samples used to identify a duplicate identity
  • Log level
  • Customize the user experience with the organization logo
User Management

User Management

User management is for adding or adjusting CFEngine Enterprise UI users, including their name, role, and password.

Role Management

Role Management

Roles limit access to host data and access to shared assets like saved reports and dashboards.

Roles limit which hosts can be seen based on the classes reported by the host. For example, if you want to limit a user's ability to report to only hosts in the "North American Data Center", you could set up a role that includes only the location_nadc class.

When multiple roles are assigned to a user, the user can access only resources that match the most restrictive role across all of their roles. For example, if a user has the admin role and a role that matches zero hosts, that user will not see any hosts in Mission Portal.

In order to access a shared report or dashboard, the user must have all roles that the report or dashboard was shared with.

In order to see a host, none of the classes reported by the host can match the class exclusions of any role the user has.

Users without a role will not be able to see any hosts in Mission Portal.

Consider the following example roles:

Role suse:
  • Class include: SUSE
  • Class exclude: empty

Role cfengine_3:
  • Class include: cfengine_3
  • Class exclude: empty

Role no_windows:
  • Class include: cfengine_3
  • Class exclude: windows

Role windows_ubuntu:
  • Class include: windows
  • Class include: ubuntu
  • Class exclude: empty

User one has the role suse.

User two has roles no_windows and cfengine_3.

User three has roles windows_ubuntu and no_windows.

A report shared with suse and no_windows will not be seen by any of the listed users.

A report shared with no_windows and cfengine_3 will only be seen by user two.

A report shared with suse will be seen by user one.

User one will only be able to see hosts that report the SUSE class.

User two will be able to see all hosts that have not reported the windows class.

User three will only be able to see hosts that have reported the ubuntu class.

Predefined Roles:

  • admin - The admin role can see everything and do anything.
  • cf_remoteagent - This role allows execution of cf-runagent.

Default Role:

To set the default role, click Settings -> User management -> Roles. You can then select which role will be the default role for new users.

DefaultRoleSelecting

Behaviour of Default Role:

Any new users created in Mission Portal's local user database will have this new role assigned.

Users authenticating through LDAP (if you have LDAP configured in Mission Portal) will get this new role applied the first time they log in.

Note that the default role will not have any effect on users that already exist (in Mission Portal's local database) or have already logged in (when using LDAP).

In effect this allows you to set the default permissions for new users (e.g. which hosts a user is allowed to see) by configuring the access for the default role.

AddNewUser

Manage Apps

Manage Apps

Application settings can help adjust some of CFEngine Enterprise UI app features, including the order in which the apps appear and their status (on or off).

Version Control Repository

Version Control Repository

The repository holding the organization's masterfiles can be adjusted on the Version Control Repository screen.

Host Identifier

Host Identifier

Host identity for the server can be set within settings, and can be adjusted to refer to the FQDN, IP address, or an unqualified domain name.

Mail settings

Mail settings

Configure outbound mail settings:

  • Default from email: Email address that Mission Portal will use by default when sending emails.

  • Mail protocol: Use the system mailer (Sendmail) or use an SMTP server.

  • Max email attachment size (MB): Mails sent by Mission Portal with attachments exceeding this size will have the attachment replaced with links to download the large files.

Authentication settings

Authentication settings

Mission portal can authenticate against an external directory.

Special Notes:

  • LDAP API Url refers to the API CFEngine uses internally for authentication. Most likely you will not need to alter the default value.

  • LDAP filter must be supplied.

  • LDAP Host is the IP address or hostname of your LDAP server.

  • LDAP bind username should be the username used to bind and search the LDAP directory. It must be provided in distinguished name format.

  • Default roles for users is configured under Role Management.

See Also: LDAP authentication REST API

About CFEngine

About CFEngine

The About CFEngine screen contains important information about the specific version of CFEngine being used, license information, and more.


Hosts and Health

Host Compliance

CFEngine collects data on promise compliance. Each host is in one of two groups: out of compliance or fully compliant.

  • A host is considered out of compliance if less than 100% of its promises were kept.
  • A host is considered fully compliant if 100% of its promises were kept.

You can look at a specific sub-set of your hosts by selecting a category from the menu on the left.

Host Info

Here you will find extensive information on single hosts that CFEngine detects automatically in your environment. Since this is data gathered per host, you need to select a single host from the menu on the left first.

Host Health

Hosts

You can get quick access to the health of hosts, including direct links to reports, from the Health drop down at the top of every Enterprise UI screen. Hosts are listed as unhealthy if:

  • the hub was not able to connect to and collect data from the host within a set time interval (unreachable host). The time interval can be set in the Mission Portal settings.
  • the policy did not get executed for the last three runs. This could be caused by cf-execd not running on the host (scheduling deviation) or an error in policy that stops its execution. The hub is still able to contact the host, but it will return stale data because of this deviation.
  • two or more hosts use the same key. This is detected if the IP address tied to a CFEngine key has changed in the last three scheduled runs. The number of scheduled runs that cause the unhealthy status is configurable in settings.
  • reports have recently been collected, but cf-agent has not recently run. “Recently” is defined by the configured run-interval of their cf-agent.

These categories are non-overlapping, meaning a host will only appear in one category at a time even if conditions satisfying multiple categories are present. This makes reports simpler to read, and makes it easier to detect and fix the root cause of the issue. As one issue is resolved, the host might then move to another category. In either situation the data from that host will be from old runs and probably will not reflect the current state of that host.


Alerts and Notifications

Create a New Alert
  • From the Dashboard, locate the rectangle with the dotted border.

  • When the cursor is hovering over top, an Add button will appear.

New Alerts

  • Click the button to begin creating the alert.

New Alerts Name

  • Add a unique name for the alert.

  • Each alert has a visual indication of its severity, represented by one of the following colors:

    • Low: Yellow
    • Medium: Orange
    • High: Red

New Alerts Severity

  • From the Severity dropdown box, select one of the three options available.

  • The Select Condition drop-down box presents an inventory of existing conditional rules, as well as an option to create a new one.

New Alerts Condition

  • When selecting an existing conditional rule, the name of the condition will automatically populate the mandatory condition Name field.

  • When creating a new condition the Name field must be filled in.

New Alerts Condition Type

  • Each alert also has a Condition type:

    • Policy conditions trigger alerts based on CFEngine policy compliance status. They can be set on bundles, promisees, and promises. If nothing is specified, they will trigger alerts for all policy.
    • Inventory conditions trigger alerts for inventory attributes. These attributes correspond to the ones found in inventory reports.
    • Software Updates conditions trigger alerts based on packages available for update in the repository. They can be set either for a specific version or trigger on the latest version available. If neither a package nor a version is specified, they will trigger alerts for any update.
    • Custom SQL conditions trigger alerts based on an SQL query. The SQL query must return at least one column - hostkey (see the sketch after this list).
  • Alert conditions can be limited to a subset of hosts.
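A minimal sketch of a custom SQL condition (the Hosts table and HostKey column follow the schema used in the reporting examples later in this guide; a real condition would normally filter on something meaningful rather than match every host):

SELECT Hosts.HostKey AS hostkey FROM Hosts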

New Alerts Hosts

  • Notifications of alerts may be sent by email or custom action scripts.

New Alerts Notifications

  • Check Email notifications box to activate the field for entering the email address to notify.

  • The Remind me dropdown box provides a selection of intervals to send reminder emails for triggered events.


Custom actions for Alerts

Once you have become familiar with the Alerts widgets, you might see the need to integrate the alerts with an existing system like Nagios, instead of relying on emails for getting notified.

This is where the Custom actions come in. A Custom action is a way to execute a script on the hub whenever an alert is triggered or cleared, as well as when a reminder happens (if set). The script will receive a set of parameters containing the state of the alert, and can do practically anything with this information. Typically, it is used to integrate with other alerting or monitoring systems like PagerDuty or Nagios.

Any scripting language may be used, as long as the hub has an interpreter for it.

Alert parameters

The Custom action script gets called with one parameter: the path to a file with a set of KEY=VALUE lines. Most of the keys are common for all alerts, but some additional keys are defined based on the alert type, as shown below.

Common keys

These keys are present for all alert types.

  • ALERT_ID: Unique ID (number).
  • ALERT_NAME: Name, as defined when creating the alert (string).
  • ALERT_SEVERITY: Severity, as selected when creating the alert (string).
  • ALERT_LAST_CHECK: Last time the alert state was checked (Unix epoch timestamp).
  • ALERT_LAST_EVENT_TIME: Last time the alert created an event log entry (Unix epoch timestamp).
  • ALERT_LAST_STATUS_CHANGE: Last time the alert changed from triggered to cleared or the other way around (Unix epoch timestamp).
  • ALERT_STATUS: Current status, either 'fail' (triggered) or 'success' (cleared).
  • ALERT_FAILED_HOST: Number of hosts the alert is currently triggered on (number).
  • ALERT_TOTAL_HOST: Number of hosts the alert is defined for (number).
  • ALERT_CONDITION_NAME: Condition name, as defined when creating the alert (string).
  • ALERT_CONDITION_DESCRIPTION: Condition description, as defined when creating the alert (string).
  • ALERT_CONDITION_TYPE: Type, as selected when creating the alert. Can be 'policy', 'inventory', or 'softwareupdate'.
Policy keys

In addition to the common keys, the following keys are present when ALERT_CONDITION_TYPE='policy'.

  • ALERT_POLICY_CONDITION_FILTERBY: Policy object to filter by, as selected when creating the alert. Can be 'bundlename', 'promiser' or 'promisees'.
  • ALERT_POLICY_CONDITION_FILTERITEMNAME: Name of the policy object to filter by, as defined when creating the alert (string).
  • ALERT_POLICY_CONDITION_PROMISEHANDLE: Promise handle to filter by, as defined when creating the alert (string).
  • ALERT_POLICY_CONDITION_PROMISEOUTCOME: Promise outcome to filter by, as selected when creating the alert. Can be either 'KEPT', 'REPAIRED' or 'NOTKEPT'.
Inventory keys

In addition to the common keys, the following keys are present when ALERT_CONDITION_TYPE='inventory'.

  • ALERT_INVENTORY_CONDITION_FILTER_$(ATTRIBUTE_NAME): The name of the attribute as selected when creating the alert is part of the key (expanded), while the value set when creating the alert is the value (e.g. ALERT_INVENTORY_CONDITION_FILTER_ARCHITECTURE='x86_64').
  • ALERT_INVENTORY_CONDITION_FILTER_$(ATTRIBUTE_NAME)_CONDITION: The name of the attribute as selected when creating the alert is part of the key (expanded), while the value is the comparison operator selected. Can be 'ILIKE' (matches), 'NOT ILIKE' (doesn't match), '=' (is), '!=' (is not), '<', '>'.
  • One such pair of keys is present for each attribute name defined in the alert.
Software updates keys

In addition to the common keys, the following keys are present when ALERT_CONDITION_TYPE='softwareupdate'.

  • ALERT_SOFTWARE_UPDATE_CONDITION_PATCHNAME: The name of the package, as defined when creating the alert, or empty if undefined (string).
  • ALERT_SOFTWARE_UPDATE_CONDITION_PATCHARCHITECTURE: The architecture of the package, as defined when creating the alert, or empty if undefined (string).
Example parameters: policy bundle alert not kept

Given an alert that triggers on a policy bundle being not kept (failed), the following is example content of the file being provided as an argument to a Custom action script.

ALERT_ID='6'
ALERT_NAME='Web service'
ALERT_SEVERITY='high'
ALERT_LAST_CHECK='0'
ALERT_LAST_EVENT_TIME='0'
ALERT_LAST_STATUS_CHANGE='0'
ALERT_STATUS='fail'
ALERT_FAILED_HOST='49'
ALERT_TOTAL_HOST='275'
ALERT_CONDITION_NAME='Web service'
ALERT_CONDITION_DESCRIPTION='Ensure web service is running and configured correctly.'
ALERT_CONDITION_TYPE='policy'
ALERT_POLICY_CONDITION_FILTERBY='bundlename'
ALERT_POLICY_CONDITION_FILTERITEMNAME='web_service'
ALERT_POLICY_CONDITION_PROMISEOUTCOME='NOTKEPT'

Saving this as a file, e.g. 'alert_parameters_test', can be useful while writing and testing your Custom action script. You could then simply test your Custom action script, e.g. 'cfengine_custom_action_ticketing.py', by running

./cfengine_custom_action_ticketing.py alert_parameters_test

When this works as expected on the command line, you are ready to upload the script to the Mission Portal, as outlined below.

Example script: logging policy alert to syslog

The following Custom action script will log the status and definition of a policy alert to syslog.

#!/bin/bash

source "$1"

if [ "$ALERT_CONDITION_TYPE" != "policy" ]; then
   logger -i "error: CFEngine Custom action script $0 triggered by non-policy alert type"
   exit 1
fi

logger -i "Policy alert '$ALERT_NAME' $ALERT_STATUS. Now triggered on $ALERT_FAILED_HOST hosts. Defined with $ALERT_POLICY_CONDITION_FILTERBY='$ALERT_POLICY_CONDITION_FILTERITEMNAME', promise handle '$ALERT_POLICY_CONDITION_PROMISEHANDLE' and outcome $ALERT_POLICY_CONDITION_PROMISEOUTCOME"

exit $?

What gets logged to syslog depends on which alert is associated with the script, but an example log-line is as follows:

Sep 26 02:00:53 localhost user[18823]: Policy alert 'Web service' fail. Now triggered on 11 hosts. Defined with bundlename='web_service', promise handle '' and outcome NOTKEPT
Uploading the script to the Mission Portal

Members of the admin role can manage Custom action scripts in the Mission Portal settings.

Custom action scripts overview

A new script can be uploaded, together with a name and description, which will be shown when creating the alerts.

Adding Custom action syslog script

Associating a Custom action with an alert

Alerts can have any number of Custom action scripts as well as an email notification associated with them. This can be configured during alert creation. Note that for security reasons, only members of the admin role may associate alerts with Custom action scripts.

Adding Custom action script to alert

Conversely, several alerts may be associated with the same Custom action script.

When the alert changes state from triggered to cleared, or the other way around, the script will run. The script will also run if the alert remains in triggered state and there are reminders set for the alert notifications.


Enterprise Reporting

CFEngine Enterprise can report on promise outcomes (changes made by cf-agent across your infrastructure), variables, classes, and measurements taken by cf-monitord. Reports cover fine-grained policy details; explore all the options by checking out the custom reports section of the Enterprise Reporting module.

Specifically, the information the hub is allowed to collect for reporting is configured by report_data_select bodies. default_data_select_host() defines the data to be collected for a non-hub host and default_data_select_policy_hub() defines the data that should be collected for a policy hub.

Specifying which variables and classes should be collected by an Enterprise Hub is done with a list of regular expressions matching promise meta tags for either inclusion or exclusion. By default, variables and classes tagged with either report or inventory are collected. Instead of extending this list of tags, we recommend that you tag variables and classes with report. If it's desirable to make a variable or class available in the specialized inventory reporting interface, it should instead be tagged with inventory and given an additional attribute_name= tag as described in the Custom Inventory Example. By default CFEngine collects information for all promise outcomes. This can be further restricted by specifying promise_handle_include or promise_handle_exclude. Collection of measurements taken by cf-monitord is controlled using the monitoring_include and monitoring_exclude report_data_select body attributes.
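For example, a minimal sketch of tagging a variable so it is collected for reporting (the bundle and variable names are placeholders):

bundle agent report_tag_example
{
  vars:
      "app_version"
        string => "1.2.3",
        meta => { "report" };
}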

Limitations:

There are various limitations on the size of information that is collected into central reporting. Data that is too large to be reported will be truncated, and a verbose-level log message will be generated by cf-agent. Some notable limitations are listed below.

  • string variables are limited to 1024 bytes
  • lists are limited to 1024 bytes of serialized data
  • data variables are limited to 1024 bytes of serialized data
  • meta tags are limited to 1024 bytes of serialized output
  • log messages are truncated to 400 bytes

Please note that these limits may be lower in practice due to internal encoding.

Users cannot configure which data is stored to disk. For example, you cannot prevent the enterprise agent from logging to promise_log.jsonl.

For information on accessing reported information please see the Reporting UI guide.


Reporting Architecture

The reporting architecture of CFEngine Enterprise uses two software components from the CFEngine Enterprise hub package.

cf-hub

Like all CFEngine components, cf-hub is located in /var/cfengine/bin. It is a daemon process that runs in the background, and is started by cf-agent and from the init scripts.

cf-hub wakes up every 5 minutes and connects to the cf-serverd of each host to download new data.

To collect reports from any host manually, run the following:

$ /var/cfengine/bin/cf-hub -H <host IP>
  • Add -v to run in verbose mode to diagnose connectivity issues and trace the data collected.

  • Delta (differential) reporting, the default mode, collects data that has changed since the last collection. Rebase (full) reporting collects everything. You can choose full collection by adding -q rebase (for backwards compatibility, also available as -q full), as shown in the example below.
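For example, a full (rebase) collection from a single host with verbose output (the IP address is a placeholder):

$ /var/cfengine/bin/cf-hub -v -q rebase -H 192.0.2.10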

Apache

REST over HTTP is provided by the Apache HTTP server, which also hosts the Mission Portal. The httpd process is started through CFEngine policy and the init scripts, and listens on ports 80 and 443 (HTTP and HTTPS).

Apache is part of the CFEngine Enterprise installation in /var/cfengine/httpd. A local cfapache user is created with privileges to run cf-runagent.


SQL Queries Using the Enterprise API

The CFEngine Enterprise Hub collects information about the environment in a centralized database. Data is collected every 5 minutes from all bootstrapped hosts. This data can be accessed through the Enterprise Reporting API.

Through the API, you can run CFEngine Enterprise reports with SQL queries. The API can create the following report queries:

  • Synchronous query: Issue a query and wait for the table to be sent back with the response.
  • Asynchronous query: A query is issued and an immediate response with an ID is sent so that you can check the query later to download the report.
  • Subscribed query: Specify a query to be run on a schedule and have the result emailed to someone.
Synchronous Queries

Issuing a synchronous query is the most straightforward way of running an SQL query. We simply issue the query and wait for a result to come back.

Request:

curl -k --user admin:admin https://test.cfengine.com/api/query -X POST -d '{
  "query": "SELECT ..."
}'

Response:

{
  "meta": {
    "page": 1,
    "count": 1,
    "total": 1,
    "timestamp": 1351003514
  },
  "data": [
    {
      "query": "SELECT ...",
      "header": [
        "Column 1",
        "Column 2"
      ],
      "rowCount": 3,
      "rows": [
      ],
      "cached": false,
      "sortDescending": false
    }
  ]
}
Asynchronous Queries

Because some queries can take some time to compute, you can fire off a query and check the status of it later. This is useful for dumping a lot of data into CSV files for example. The sequence consists of three steps:

  1. Issue the asynchronous query and get a job id.
  2. Check the processing status using the id.
  3. When the query is completed, get a download link using the id.
Issuing the query

Request:

curl -k --user admin:admin https://test.cfengine.com/api/query/async -X POST -d "{
  \"query\": \"SELECT Hosts.HostName, Hosts.IPAddress FROM Hosts JOIN Contexts ON Hosts.Hostkey = Contexts.HostKey WHERE Contexts.ContextName = 'ubuntu'\"
}"

Response:

{
  "meta": {
    "page": 1,
    "count": 1,
    "total": 1,
    "timestamp": 1351003514
  },
  "data": [
    {
      "id": "32ecb0a73e735477cc9b1ea8641e5552",
      "query": "SELECT ..."
    }
  ]
}
Checking the status

Request:

curl -k --user admin:admin https://test.cfengine.com/api/query/async/:id

Response:

{
  "meta": {
    "page": 1,
    "count": 1,
    "total": 1,
    "timestamp": 1351003514
  },
  "data": [
    {
      "id": "32ecb0a73e735477cc9b1ea8641e5552",
      "percentageComplete": 42,
    ]
}
Getting the completed report

This is the same API call as checking the status. Eventually, the percentageComplete field will reach 100 and a link to the completed report will be available for downloading.

Request:

curl -k --user admin:admin https://test.cfengine.com/api/query/async/:id

Response:

{
  "meta": {
    "page": 1,
    "count": 1,
    "total": 1,
    "timestamp": 1351003514
  },
  "data": [
    {
      "id": "32ecb0a73e735477cc9b1ea8641e5552",
      "percentageComplete": 100,
      "href": "https://test.cfengine.com/api/static/32ecb0a73e735477cc9b1ea8641e5552.csv"
    }
  ]
}
Subscribed Queries

Subscribed queries happen in the context of a user. Any user can create a query on a schedule and have it emailed to someone.

Request:

curl -k --user admin:admin -X PUT \
  https://test.cfengine.com/api/user/name/subscription/query/file-changes-report -d '{
  "to": "email@domain.com",
  "query": "SELECT ...",
  "schedule": "Monday.Hr23.Min59",
  "title": "Report title",
  "description": "Text that will be included in email",
  "outputTypes": [ "pdf" ]
}'

Response:

204 No Content

Reporting UI

CFEngine collects a large amount of data. To inspect it, you can run and schedule pre-defined reports or use the query builder for your own custom reports. You can save these queries for later use, and schedule reports for specified times.

If you are familiar with SQL syntax, you can input your query into the interface directly. Make sure to take a look at the database schema. Please note: manual entries in the query field at the bottom of the query builder will invalidate all field selections and filters above, and vice-versa.
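For example, a simple query you might paste into the SQL field (it uses the Hosts table from the asynchronous query example earlier in this guide):

SELECT Hosts.HostName, Hosts.IPAddress FROM Hosts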

You can query fewer hosts with the help of filters above the displayed table. These filters are based on the same categorization you can find in the other apps.

You can also filter on the type of promise: user defined, system defined, or all.


Query Builder

Users not familiar with SQL syntax can easily create their own custom reports in this interface.

  • Tables - Select the data tables you want to include in your report first.
  • Fields - Define your table columns based on your selection above.
  • Filters - Filter your results. Remember that unless you filter, you may be querying large data sets, so think about what you absolutely need in your report.
  • Group - Group your results. May be expensive with large data sets.
  • Sort - Sort your results. May be expensive with large data sets.
  • Limit - Limit the number of entries in your report. This is a recommended practice for testing your query, and even in production it may be helpful if you don't need to see every entry.
  • Show me the query - View and edit the SQL query directly. Please note that editing the query directly here will invalidate your choices in the query builder interface, and changing your selections there will override your SQL query.
Ensure the report collection is working
  • The reporting bundle must be called from promises.cf. For example, the following defines the attribute Role, which is set to database_server. You need to add the bundle to the top-level bundlesequence in promises.cf, or call it from a bundle that is already part of it (see the sketch after this list).

    bundle agent myreport
    {
      vars:
          "myrole"
          string => "database_server",
          meta => { "inventory", "attribute_name=Role" };
    }
    
  • Note the meta tag inventory.

  • The hub must be able to collect the reports from the client. TCP port 5308 must be open and, because 3.6 uses TLS, should not be proxied or otherwise intercepted. Note that bootstrapping and other standalone client operations go from the client to the server, so the ability to bootstrap and copy policies from the server doesn't necessarily mean the reverse connection will work.

  • Ensure that variables and classes tagged as inventory or report are not filtered by controls/cf_serverd.cf in your infrastructure. The standard configuration from the stock CFEngine packages allows them and should work.
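A minimal sketch of wiring the myreport bundle above into a simplified promises.cf. Note that in the stock Masterfiles Policy Framework the bundlesequence and inputs are composed from variables, so in practice you would append to the existing lists rather than replace them; the file name myreport.cf is hypothetical:

body common control
{
      bundlesequence => { "myreport" };
      inputs => { "myreport.cf" };
}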

Note: The CFEngine report collection model accounts for long periods of time when the hub is unable to collect data from remote agents. This model preserves recorded data until it can be collected. Data (promise outcomes, etc.) recorded by the agent during normal agent runs is stored locally until it is collected by the cf-hub process. At the time of collection the local data stored on the client is cleaned up, and only the last hour's worth of data remains on the client. It is important to understand that as the time between hub collections grows, or as the number of clients that cannot be collected from grows, the amount of data to transfer and store in the central database also grows. A large number of clients that have not been collected from becoming available at once can cause increased load on the hub collector and affect its performance until it has collected from all hosts.

Define a New Single Table Report
  1. In Mission Portal select the Report application icon on the left hand side of the screen.
  2. This will bring you to the Report builder screen.
  3. The default for what hosts to report on is All hosts. The hosts can be filtered under the Filters section at the top of the page.
  4. For this tutorial leave it as All hosts.
  5. Set which tables' data we want reports for.
  6. For this tutorial select Hosts.
  7. Select the columns from the Hosts table for the report.
  8. For this tutorial click the Select all link below the column labels.
  9. Leave Filters, Sort, and Limit at the default settings.
  10. Click the orange Run button in the bottom right hand corner.
Check Report Results
  1. The report generated will show each of the selected columns across the report table's header row.
  2. In this tutorial the columns being reported back should be: Host key, Last report time, Host name, IP address, First report-time.
  3. Each row will contain the information for an individual data record, in this case one row for each host.
  4. Some of the cells in the report may provide links to drill down into more detailed information (e.g. Host name will provide a link to a Host information page).
  5. It is possible to also export the report to a file.
  6. Click the orange Export button.
  7. You will then see a Report Download dialog.
  8. Report type can be either csv or pdf format.
  9. Leave other fields at the default values.
  10. If the server's mail configuration is working properly, it is possible to email the report by checking the Send in email box.
  11. Click OK to download or email the csv or pdf version of the report.
  12. Once the report is generated it will be available for download or will be emailed.
Inventory Management

Inventory allows you to define the set of hosts to report on.

The main Inventory screen shows the current set of hosts, together with relevant information such as operating system type, kernel and memory size.

Inventory Management

To begin filtering, one would first select the Filters drop-down, and then select an attribute to filter on (e.g. OS type = linux).

Inventory Management

After applying the filter, it may be convenient to add the attribute as one of the table columns.

Inventory Management

Changing the filter, or adding additional attributes for filtering, is just as easy.

Inventory Management

We can see here that there are no Windows machines bootstrapped to this hub.

Inventory Management


Monitoring

Monitoring allows you to get an overview of your hosts over time.

Monitoring

If multiple hosts are selected in the menu on the left, then you can select one of three key measurements that is then displayed for all hosts:

  • Load average
  • Disk free (in %)
  • CPU(ALL) (in %)

You can reduce the number of graphs by selecting a sub-set of hosts from the menu on the left. If only a single host is selected, then a number of graphs for various measurements will be displayed for this host. Which exact measurements are reported depends on how cf-monitord is configured and extended via measurements promises.

Clicking on an individual graph allows you to select different time spans for which monitoring data will be displayed.

If you don't see any data, make sure that:

  • cf-monitord is running on the hosts, and
  • collection of monitoring data is enabled and matches the metrics you expect, as described in Configure collection for monitoring data earlier in this guide.


Enterprise API

The CFEngine Enterprise API allows HTTP clients to interact with the CFEngine Enterprise Hub. Typically this is also the policy server.

Enterprise API Overview

The Enterprise API is a REST API, but a central part of interacting with the API uses SQL. With the simplicity of REST and the flexibility of SQL, users can craft custom reports about systems of arbitrary scale, mining a wealth of data residing on globally distributed CFEngine Database Servers.

See also the Enterprise API Examples and the Enterprise API Reference.


Best Practices

Version Control and Configuration Policy

CFEngine users version their policies. It's a reasonable, easy thing to do: you just put /var/cfengine/masterfiles under version control and... you're done?

What do you think? How do you version your own infrastructure?

Problem statement

It turns out everyone likes convenience and writing the versioning machinery is hard. So for CFEngine Enterprise 3.6 we set out to provide version control integration with Git out of the box, disabled by default. This allows users to use branches for separate hubs (which enables a policy release pipeline).

Release pipeline

A build and release pipeline is how software is typically delivered to production through testing stages. In the case of CFEngine, policies are the software. Users have at least two stages, development and production, but typically the sequence has more stages including various forms of testing/QA and pre-production.

How to enable it

To enable masterfiles versioning, you have to plan a little bit. These are the steps:

Configure your repository

Use a remote Git repository accessible via the git or https protocol, populated with the contents of masterfiles.

Using a remote repository

To use a remote repository, you must enter its address, login credentials and the branch you want to use in the Mission Portal VCS integration panel. To access it, click on "Settings" in the top-left menu of the Mission Portal screen, and then select "Version control repository". This screen by default contains the settings for using the built-in local repository.

Settings menu

VCS settings screen

Make sure your current masterfiles are in the chosen repository

This is critical. When you start auto-deploying policy, you will overwrite your current /var/cfengine/masterfiles. So take the current contents thereof and make sure they are in the Git repository you chose in the previous step.

For example, if you create a new repository in GitHub by following the instructions from https://help.github.com/articles/create-a-repo, you can add the contents of masterfiles to it with the following commands (assuming you are already in your local repository checkout):

cp -r /var/cfengine/masterfiles/* .
git add *
git commit -m 'Initial masterfiles check in'
git push origin master
Enable VCS deployments in the versioned update.cf

In the file update_def.cf under a version-specific subdirectory of controls/ in your version-controlled masterfiles, change

#"cfengine_internal_masterfiles_update" expression => "enterprise.!(cfengine_3_4|cfengine_3_5)";
"cfengine_internal_masterfiles_update" expression => "!any";

to

"cfengine_internal_masterfiles_update" expression => "enterprise.!(cfengine_3_4|cfengine_3_5)";
#"cfengine_internal_masterfiles_update" expression => "!any";

This is simply commenting out one line and uncommenting another.

Remember that you need to commit and push these changes to the repository you chose in the previous step, so that they are picked up when you deploy from the git repository. In your checked out masterfiles git repository, these commands should normally do the trick:

git add controls/*/update_def.cf
git commit -m 'Enabled auto-policy updates'
git push origin master

Now you need to do the first-time deployment, whereupon this new update.cf and the rest of your versioned masterfiles will overwrite /var/cfengine/masterfiles. We made that easy too, using standard CFEngine tools. Exit the cfapache account and run the following command as root on your hub:

cf-agent -Dcfengine_internal_masterfiles_update -f update.cf

Easy, right? You're done, from now on every time update.cf is run (by default, every 5 minutes) it will check out the repository and branch you configured in the Mission Portal VCS integration panel.

Please note all the work is done as user cfapache except the very last step of writing into /var/cfengine/masterfiles.

How it works

The code is fairly simple and can even be modified if you have special requirements (e.g. Subversion integration). But out of the box there are three important components. All the scripts below are stored under /var/cfengine/httpd/htdocs/api/dc-scripts/ in your CFEngine Enterprise hub.

common.sh

The script common.sh is loaded by the deployment script and does two things. First, it redirects all output to /var/cfengine/outputs/dc-scripts.log. So if you have problems, check there first.

Second, the script sources /opt/cfengine/dc-scripts/params.sh where the essential parameters like repository address and branch live. That file is written out by the Mission Portal VCS integration panel, so it's the connection between the Mission Portal GUI and the underlying scripts.

masterfiles-stage.sh

This script is called to deploy the masterfiles from VCS to /var/cfengine/masterfiles. It's fairly complicated and does not depend on CFEngine itself by design; for instance it uses rsync to deploy the policies. You may want to review and even modify it, for example choosing to reject deployments that are too different from the current version (which could indicate a catastrophic failure or misconfiguration).

This script also validates the policies using cf-promises -T. That command looks in a directory and ensures that promises.cf in the directory is valid. If it's not, an error will go in the log file and the script exits.

NOTE: this means that clients will never get policies that the hub considers invalid.
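For example, assuming cf-promises -T behaves as described above, you could run the same validation by hand against a staged policy directory (the path is a placeholder):

$ /var/cfengine/bin/cf-promises -T /tmp/masterfiles-staging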

Policy changes

If you want to make manual changes to your policies, simply make those changes in a checkout of your masterfiles repository, commit and push the changes. The next time update.cf runs, your changes will be checked out and in minutes distributed through your entire infrastructure.

Benefits

To conclude, let's summarize the benefits of versioning your masterfiles using the built-in facilities in CFEngine Enterprise:

  • easy to use compared to home-grown VCS integration
  • supports Git out of the box and, with some work, can support others like Subversion, Mercurial, and CVS.
  • tested, reliable, and built-in
  • supports any repository and branch per hub
  • your policies are validated before deployment
  • integration happens through shell scripts and update.cf, not C code or special policies
Scalability

When running CFEngine Enterprise in a large-scale IT environment with many thousands of hosts, certain issues arise that require different approaches compared with smaller installations.

With CFEngine 3.6, significant testing was performed to identify the issues surrounding scalability and to determine best practices in large-scale installations of CFEngine.

Moving PostgreSQL to Separate Hard Drive

Moving the PostgreSQL database to a physical hard drive separate from the other CFEngine components can improve the stability of large-scale installations, particularly when a solid-state drive (SSD) hosts the PostgreSQL database.

Data access involves a huge number of random I/O operations with small chunks of data. An SSD may give the best performance because it is designed for this type of workload.

Important: The PostgreSQL data files are in /var/cfengine/state/pg/ by default. Before moving the mount point, please make sure that all CFEngine processes (including PostgreSQL) are stopped and the existing data files are copied to the new location.

Setting the splaytime

The splaytime tells CFEngine hosts the base interval over which they will communicate with the policy server, which they then use to "splay" or hash their own runtimes.

Thus when splaytime is set to 4, 1000 hosts will hash their run attempts evenly over 4 minutes, and each minute will see about 250 hosts make a run attempt. In effect, the hosts will attempt to communicate with the policy server and run their own policies in predictable "waves." This limits the number of concurrent connections and overall system load at any given moment.
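For example, a minimal sketch of setting a four-minute splay in policy (splaytime is an attribute of body executor control, given in minutes):

body executor control
{
      splaytime => "4";
}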