Enterprise report collection

How does CFEngine Enterprise collect reports?

cf-hub connects from the hub to the remote agents currently registered in the lastseen database (viewable with cf-key -s), using the port defined by port in body hub control. In each collection round, scheduled by hub_schedule in body hub control, the hub tries to collect from up to the licensed number of hosts.

  • Note: No ordering is specified. If the lastseen database contains more entries than the number of licensed hosts, it is not possible to determine which hosts will be collected from and which will be skipped.

  • See Also: hostsseen(), hostswithclass()

How are agents not running determined?

Hosts whose last agent execution status is "FAIL" show up under "Agents not running". A host's last agent execution status is set to "FAIL" when the hub notices that there have been no promise results within 3x the expected agent run interval. The agent's average run interval is computed as a geometric average based on the 4 most recent agent executions.
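The rule above can be sketched in a few lines of Python. This is only an illustration, not the hub's implementation: the function names are hypothetical, the input is assumed to be a list of observed run intervals in seconds, and whether the real comparison is strict or inclusive is an assumption.

```python
import math


def geometric_mean(values):
    """Geometric mean computed via logarithms to avoid large products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def agent_status(recent_intervals, seconds_since_last_result, window=4, factor=3):
    """Return "FAIL" when no results arrived within factor x the expected interval.

    recent_intervals: observed run intervals in seconds, oldest first (assumed input).
    window=4 and factor=3 mirror the numbers quoted in the text.
    """
    expected = geometric_mean(recent_intervals[-window:])
    if seconds_since_last_result > factor * expected:
        return "FAIL"
    return "OK"


if __name__ == "__main__":
    # An agent that normally runs about every 5 minutes, silent for 20 minutes.
    print(agent_status([300, 310, 295, 300], 1200))
```

With intervals close to 300 seconds, the threshold sits near 900 seconds, so a host silent for 1200 seconds would be flagged while one silent for 600 seconds would not.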

Agents not running

You can inspect each host's last execution time, execution status (from the hub's perspective), and average run interval using the following SQL.

SELECT Hosts.HostName AS "Host name",
       AgentStatus.LastAgentLocalExecutionTimeStamp AS "Last agent local execution time",
       cast(AgentStatus.AgentExecutionInterval AS integer) AS "Agent execution interval",
       AgentStatus.LastAgentExecutionStatus AS "Last agent execution status"
FROM AgentStatus
INNER JOIN Hosts ON Hosts.HostKey = AgentStatus.HostKey

The easiest way to run this over the API is to place the query in a JSON file and post that file to the query API.

agent_execution_time_interval_status.query.json:

{
  "query": "SELECT Hosts.HostName, AgentStatus.LastAgentLocalExecutionTimeStamp, cast(AgentStatus.AgentExecutionInterval AS integer), AgentStatus.LastAgentExecutionStatus FROM AgentStatus INNER JOIN Hosts ON Hosts.HostKey = AgentStatus.HostKey"
}

$ curl -s -u admin:admin http://hub/api/query -X POST -d @agent_execution_time_interval_status.query.json | jq ".data[0].rows"
[
  [
    "hub",
    "2016-07-25 16:53:23+00",
    "296",
    "OK"
  ],
  [
    "host001",
    "2016-07-25 16:06:50+00",
    "305",
    "FAIL"
  ]
]
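The same request body and the rows shown above can also be handled from a script. The following Python sketch builds the JSON payload and picks out hosts whose status is "FAIL". The HTTP call itself is left out, the helper names are hypothetical, and the response layout is assumed to be only what the jq filter above implies (a data list whose first element has a rows list).

```python
import json

QUERY = (
    "SELECT Hosts.HostName, AgentStatus.LastAgentLocalExecutionTimeStamp, "
    "cast(AgentStatus.AgentExecutionInterval AS integer), "
    "AgentStatus.LastAgentExecutionStatus "
    "FROM AgentStatus INNER JOIN Hosts ON Hosts.HostKey = AgentStatus.HostKey"
)


def build_payload(sql):
    """Serialize the query in the same shape as the .query.json file."""
    return json.dumps({"query": sql})


def failing_hosts(response_text):
    """Return host names whose last agent execution status is "FAIL".

    Assumes the column order of QUERY: name, timestamp, interval, status.
    """
    rows = json.loads(response_text)["data"][0]["rows"]
    return [row[0] for row in rows if row[3] == "FAIL"]


if __name__ == "__main__":
    sample = json.dumps({"data": [{"rows": [
        ["hub", "2016-07-25 16:53:23+00", "296", "OK"],
        ["host001", "2016-07-25 16:06:50+00", "305", "FAIL"],
    ]}]})
    print(build_payload(QUERY))
    print(failing_hosts(sample))
```

Keeping the parsing separate from the transport makes it easy to feed the function saved curl output while troubleshooting.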

See Also: Enterprise API Reference, Enterprise API Examples

How are hosts not reporting determined?

Hosts that have not been collected from within blueHostHorizon seconds will show up under "Hosts not reporting".

Hosts not reporting

blueHostHorizon defaults to 900 seconds (15 minutes). You can inspect the current value of blueHostHorizon from Mission Portal or via the API:

$ curl -s -u admin:admin http://hub/api/settings/ | jq ".data[0].blueHostHorizon"
900
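As a rough illustration of the rule, the check amounts to comparing the time since the last successful collection against blueHostHorizon. The function name below is hypothetical, and whether the hub treats the boundary as strict or inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_not_reporting(last_collected, now, blue_host_horizon=900):
    """True when the last collection is older than blue_host_horizon seconds.

    The 900 second default mirrors the documented default of blueHostHorizon.
    """
    return (now - last_collected) > timedelta(seconds=blue_host_horizon)


if __name__ == "__main__":
    now = datetime(2016, 7, 25, 17, 0, 0, tzinfo=timezone.utc)
    last = datetime(2016, 7, 25, 16, 40, 0, tzinfo=timezone.utc)
    print(is_not_reporting(last, now))  # 20 minutes old, beyond 15 minutes
```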

See Also: Enterprise API Reference, Enterprise API Examples, Enterprise Settings

Troubleshooting report collection

The following steps can be used to help diagnose and potentially restore reporting for hosts experiencing issues.

Perform manual delta collection for a single host

Performing back-to-back delta collections and comparing the data received can help expose so-called patching issues. If the same amount of data is collected twice, a rebase may resolve it.

[root@hub ~]# cf-hub -q delta -H 192.168.33.2 -v
 verbose: ----------------------------------------------------------------
 verbose:  Initialization preamble 
 verbose: ----------------------------------------------------------------
 # <snipped for brevity>
 verbose: Connecting to host 192.168.33.2, port 5308 as address 192.168.33.2
 verbose: Waiting to connect...
 verbose: Setting socket timeout to 10 seconds.
 verbose: Connected to host 192.168.33.2 address 192.168.33.2 port 5308 (socket descriptor 4)
 verbose: TLS version negotiated:  TLSv1.2; Cipher: AES256-GCM-SHA384,TLSv1/SSLv3
 verbose: TLS session established, checking trust...
 verbose: Received public key compares equal to the one we have stored
 verbose: Server is TRUSTED, received key 'SHA=e77d408e9802e2c549417d5e3379c43050d2ad5928a198855dbb7e9c8af9a6f1' MATCHES stored one.
 verbose: Key digest for address '192.168.33.2' is SHA=e77d408e9802e2c549417d5e3379c43050d2ad5928a198855dbb7e9c8af9a6f1
 verbose: Will request from host 192.168.33.2 (digest = SHA=e77d408e9802e2c549417d5e3379c43050d2ad5928a198855dbb7e9c8af9a6f1) data later than timestamp 1481901790
 verbose: Successfully opened extension plugin 'cfengine-report-collect.so' from '/var/cfengine/lib/cfengine-report-collect.so'
 verbose: Successfully loaded extension plugin 'cfengine-report-collect.so'
 verbose: Sending query at Fri Dec 16 15:24:23 2016
 verbose: h>s QUERY delta 1481901790 1481901863
 verbose: Sending query at Fri Dec 16 15:24:23 2016
 verbose: Received reply of 5050 bytes at Fri Dec 16 15:24:23 2016 -> Xfer time 0 seconds (processing time 0 seconds)
 verbose: Processing report: MOM (items: 44)
 verbose: Processing report: MOY (items: 48)
 verbose: Processing report: MOH (items: 22)
 verbose: Processing report: EXS (items: 1)
 verbose: Received 5 kb of report data with 115 individual items
 verbose: Connection to 192.168.33.2 is closed
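Comparing two runs by eye is error-prone, so a small script can summarize the verbose output instead. The Python sketch below extracts the per-report item counts and the totals from saved cf-hub -v output and flags two runs that returned identical, non-empty data. The regular expressions are based only on the log lines shown above, and the function names are hypothetical.

```python
import re

REPORT_RE = re.compile(r"Processing report: (\w+) \(items: (\d+)\)")
TOTAL_RE = re.compile(r"Received (\d+) kb of report data with (\d+) individual items")


def summarize(verbose_output):
    """Collect per-report item counts and the totals from one verbose run."""
    reports = {name: int(count) for name, count in REPORT_RE.findall(verbose_output)}
    total = TOTAL_RE.search(verbose_output)
    return {
        "reports": reports,
        "kb": int(total.group(1)) if total else None,
        "items": int(total.group(2)) if total else None,
    }


def same_data_twice(first_output, second_output):
    """True when two back-to-back runs returned identical, non-empty summaries."""
    first, second = summarize(first_output), summarize(second_output)
    return first == second and bool(first["items"])


if __name__ == "__main__":
    run = (
        " verbose: Processing report: MOM (items: 44)\n"
        " verbose: Processing report: MOY (items: 48)\n"
        " verbose: Processing report: MOH (items: 22)\n"
        " verbose: Processing report: EXS (items: 1)\n"
        " verbose: Received 5 kb of report data with 115 individual items\n"
    )
    print(summarize(run))
    print(same_data_twice(run, run))
```

Identical summaries are only a hint that a rebase may be needed; a busy host can legitimately produce similar volumes on consecutive runs.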

Perform manual rebase collection for a single host

A rebase causes the hub to throw away all reports since the last collection and collect only the output from the most recent run.

[root@hub ~]# cf-hub -q rebase -H 192.168.33.2 -v
 verbose: ----------------------------------------------------------------
 verbose:  Initialization preamble 
 verbose: ----------------------------------------------------------------
 # <snipped for brevity>
 verbose: Connecting to host 192.168.33.2, port 5308 as address 192.168.33.2
 verbose: Waiting to connect...
 verbose: Setting socket timeout to 10 seconds.
 verbose: Connected to host 192.168.33.2 address 192.168.33.2 port 5308 (socket descriptor 4)
 verbose: TLS version negotiated:  TLSv1.2; Cipher: AES256-GCM-SHA384,TLSv1/SSLv3
 verbose: TLS session established, checking trust...
 verbose: Received public key compares equal to the one we have stored
 verbose: Server is TRUSTED, received key 'SHA=e77d408e9802e2c549417d5e3379c43050d2ad5928a198855dbb7e9c8af9a6f1' MATCHES stored one.
 verbose: Key digest for address '192.168.33.2' is SHA=e77d408e9802e2c549417d5e3379c43050d2ad5928a198855dbb7e9c8af9a6f1
 verbose: Successfully opened extension plugin 'cfengine-report-collect.so' from '/var/cfengine/lib/cfengine-report-collect.so'
 verbose: Successfully loaded extension plugin 'cfengine-report-collect.so'
 verbose: Sending query at Fri Dec 16 15:35:10 2016
 verbose: h>s QUERY rebase 0 1481902510
 verbose: Sending query at Fri Dec 16 15:35:10 2016
 verbose: Received reply of 128157 bytes at Fri Dec 16 15:35:10 2016 -> Xfer time 0 seconds (processing time 0 seconds)
 verbose: Processing report: CLD (items: 46)
 verbose: Processing report: VAD (items: 52)
 verbose: Processing report: LSD (items: 13)
 verbose: Processing report: SDI (items: 327)
 verbose: Processing report: SPD (items: 143)
 verbose: Processing report: ELD (items: 205)
 verbose:   ts #0 > 1481902510
 verbose: Received 125 kb of report data with 786 individual items
 verbose: Connection to 192.168.33.2 is closed

Note: The Enterprise hub automatically schedules rebase queries if it has been unable to collect from a given candidate for client_history_timeout hours.
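The scheduling rule in the note can be pictured as a choice between the two query forms seen in the verbose output (QUERY delta with a start timestamp, QUERY rebase with 0). The Python sketch below is an illustration under assumptions: the function name is hypothetical, timestamps are Unix epoch seconds, the timeout is passed in explicitly rather than assuming a default, and the exact comparison the hub uses is not confirmed by the source.

```python
def choose_query(last_collected, now, client_history_timeout_hours):
    """Pick the query type and time window for the next collection.

    Returns (kind, from_timestamp, to_timestamp), matching the layout of the
    "h>s QUERY <kind> <from> <to>" lines in the verbose output.
    """
    hours_since_collection = (now - last_collected) / 3600
    if hours_since_collection >= client_history_timeout_hours:
        # Too much history is missing: start over from timestamp 0.
        return ("rebase", 0, now)
    # Otherwise ask only for data later than the last collection.
    return ("delta", last_collected, now)


if __name__ == "__main__":
    print(choose_query(1481901790, 1481901863, 6))
    print(choose_query(1481901790 - 7 * 3600, 1481902510, 6))
```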

If a manual rebase collection does not restore reporting for a host, continue by restarting the report collection components.

Restart report collection components

Sometimes it is necessary to restart the report collection subsystem in order to re-synchronize the caching layer with the database. To do so, stop cf-hub, cf-consumer, and redis-server, then run the update policy, which starts them again.

On systemd hosts, restarting the cf-hub service is enough. The related components are restarted automatically through the unit dependencies:

[root@hub ~]# systemctl restart cf-hub

For non-systemd hosts:

[root@hub ~]# pkill cf-consumer
[root@hub ~]# pkill cf-hub
[root@hub ~]# pkill redis-server
[root@hub ~]# cf-agent -KIf update.cf
    info: Executing 'no timeout' ... '/var/cfengine/bin/redis-server /var/cfengine/config/redis.conf'
    info: Command related to promiser '/var/cfengine/bin/redis-server /var/cfengine/config/redis.conf' returned code defined as promise kept 0
    info: Completed execution of '/var/cfengine/bin/redis-server /var/cfengine/config/redis.conf'
    info: Executing 'no timeout' ... '/var/cfengine/bin/cf-consumer'
    info: Command related to promiser '/var/cfengine/bin/cf-consumer' returned code defined as promise kept 0
    info: Completed execution of '/var/cfengine/bin/cf-consumer'
    info: Executing 'no timeout' ... '"/var/cfengine/bin/cf-hub"'
    info: Command related to promiser '"/var/cfengine/bin/cf-hub"' returned code defined as promise kept 0
    info: Completed execution of '"/var/cfengine/bin/cf-hub"'