Enterprise report collection
What are reports?
Reports are the records that the components ( cf-agent
, cf-monitord
,
cf-serverd
... ) record about their knowledge of the system state. Each
component may log to various data sources within $(sys.statedir)
.
How does CFEngine Enterprise collect reports?
cf-hub
makes connections from the hub to remote agents currently registered in
the lastseen database (viewable with cf-key -s
)
on body hub control port
(5308 by default). The hub
tries to collect from up to the LICENSED number of hosts for each collection
round as identified by hub_schedule
as defined
in body hub control
.
- See Also:
hostsseen()
,hostswithclass()
How often does cf-hub re-check the LICENSE
cf-hub
re-checks the license when it is started and once every 5 minutes after
that.
Which hosts are being report-collected?
cf-hub
gets a list of hosts to collect from lastseen
database
(viewable with cf-key -s
).
NOTE: this database is periodically cleaned from entries older than one week old.
This cleanup is tweakable using lastseenexpireafter
setting.
However we don't recommend tweaking this setting, as older hosts are
practically dead, and may affect report collection performance (via
timeouts) and license-counting.
How does the license count affect report collection?
In each collection round, cf-hub
will collect reports from up to
LICENSED number of hosts. It is unspecified which hosts are the ones
skipped, in case the total number of hosts listed in lastseen
database
are over the LICENSED number.
When is a hub behaving as over-licensed ?
When the number of hosts in the lastseen
database (viewable with
cf-key -s
) is greater than the number of LICENSED hosts for this hub.
How are agents not running determined?
Hosts who's last agent execution status is "FAIL" will show up under "Agents not running". A hosts last agent execution status is set to "FAIL" when the hub notices that there are no promise results within 3x of the expected agent run interval. The agents average run interval is computed by a geometric average based on the 4 most recent agent executions.
You can inspect hosts last execution time, execution status (from the hubs perspective), and average run interval using the following SQL.
SELECT Hosts.HostName AS "Host name",
AgentStatus.LastAgentLocalExecutionTimeStamp AS "Last agent local execution
time", cast(AgentStatus.AgentExecutionInterval AS integer) AS "Agent execution
interval", AgentStatus.LastAgentExecutionStatus AS "Last agent execution status"
FROM AgentStatus INNER JOIN Hosts ON Hosts.HostKey = AgentStatus.HostKey
This can be queried over the API most easily by placing the query into a json
file. And then using the query
API.
agent_execution_time_interval_status.query.json
:
{
"query": "SELECT Hosts.HostName, AgentStatus.LastAgentLocalExecutionTimeStamp, cast(AgentStatus.AgentExecutionInterval AS integer), AgentStatus.LastAgentExecutionStatus FROM AgentStatus INNER JOIN Hosts ON Hosts.HostKey = AgentStatus.HostKey"
}
$ curl -s -u admin:admin http://hub/api/query -X POST -d @agent_execution_time_interval_status.query.json | jq ".data[0].rows"
[
[
"hub",
"2016-07-25 16:53:23+00",
"296",
"OK"
],
[
"host001",
"2016-07-25 16:06:50+00",
"305",
"FAIL"
]
]
See Also: Enterprise API Reference
, Enterprise API Examples
How are hosts not reporting determined?
Hosts that have not been collected from within blueHostHorizon
seconds will
show up under "Hosts not reporting".
blueHostHorizon
defaults to 900 seconds (15 minutes). You can inspect the
current value of blueHostHorizon
from Mission Portal or via the API:
$ curl -s -u admin:admin http://hub/api/settings/ | jq ".data[0].blueHostHorizon"
900
Note: It's called "blueHostHorizon" because older versions of Mission Portal would turn these hosts to a blue color as an indication of "hypoxia" (lack of oxygen, where oxygen is access to latest policy) to indicate a health issue.
See Also: Enterprise API Reference
, Enterprise API Examples
, Enterprise Settings
Are there supposed to be so many cf-consumer processes?
Yes, cf-consumer
will spawn 25 threads for report collection processing.
Which hosts are pending trust revocation?
When a host is removed using the delete API its key is placed in a queue for
trust revocation. To see which hosts are pending key removal use the following
query against the cfsettings
database.
SELECT HostKey FROM KeysPendingForDeletion;
How to troubleshoot report collection?
The following steps can be used to help diagnose and potentially restore reporting for hosts experiencing issues.
Perform manual delta collection for a single host
Performing back to back delta collections and comparing the data received can help to expose so called patching issues. If the same amount of data is collected twice a rebase may resolve it.
[root@hub ~]# cf-hub -q delta -H 192.168.33.2 -v
verbose: ----------------------------------------------------------------
verbose: Initialization preamble
verbose: ----------------------------------------------------------------
# <snipped for brevity>
verbose: Connecting to host 192.168.33.2, port 5308 as address 192.168.33.2
verbose: Waiting to connect...
verbose: Setting socket timeout to 10 seconds.
verbose: Connected to host 192.168.33.2 address 192.168.33.2 port 5308 (socket descriptor 4)
verbose: TLS version negotiated: TLSv1.2; Cipher: AES256-GCM-SHA384,TLSv1/SSLv3
verbose: TLS session established, checking trust...
verbose: Received public key compares equal to the one we have stored
verbose: Server is TRUSTED, received key 'SHA=e77d408e9802e2c549417d5e3379c43050d2ad5928a198855dbb7e9c8af9a6f1' MATCHES stored one.
verbose: Key digest for address '192.168.33.2' is SHA=e77d408e9802e2c549417d5e3379c43050d2ad5928a198855dbb7e9c8af9a6f1
verbose: Will request from host 192.168.33.2 (digest = SHA=e77d408e9802e2c549417d5e3379c43050d2ad5928a198855dbb7e9c8af9a6f1) data later than timestamp 1481901790
verbose: Successfully opened extension plugin 'cfengine-report-collect.so' from '/var/cfengine/lib/cfengine-report-collect.so'
verbose: Successfully loaded extension plugin 'cfengine-report-collect.so'
verbose: Sending query at Fri Dec 16 15:24:23 2016
verbose: h>s QUERY delta 1481901790 1481901863
verbose: Sending query at Fri Dec 16 15:24:23 2016
verbose: Received reply of 5050 bytes at Fri Dec 16 15:24:23 2016 -> Xfer time 0 seconds (processing time 0 seconds)
verbose: Processing report: MOM (items: 44)
verbose: Processing report: MOY (items: 48)
verbose: Processing report: MOH (items: 22)
verbose: Processing report: EXS (items: 1)
verbose: Received 5 kb of report data with 115 individual items
verbose: Connection to 192.168.33.2 is closed
Perform manual rebase collection for a single host
A rebase
causes the hub to throw away all reports since the last collection
and collect only the output from the most recent run.
[root@hub ~]# cf-hub -q rebase -H 192.168.33.2 -v
verbose: ----------------------------------------------------------------
verbose: Initialization preamble
verbose: ----------------------------------------------------------------
# <snipped for brevity>
verbose: Connecting to host 192.168.33.2, port 5308 as address 192.168.33.2
verbose: Waiting to connect...
verbose: Setting socket timeout to 10 seconds.
verbose: Connected to host 192.168.33.2 address 192.168.33.2 port 5308 (socket descriptor 4)
verbose: TLS version negotiated: TLSv1.2; Cipher: AES256-GCM-SHA384,TLSv1/SSLv3
verbose: TLS session established, checking trust...
verbose: Received public key compares equal to the one we have stored
verbose: Server is TRUSTED, received key 'SHA=e77d408e9802e2c549417d5e3379c43050d2ad5928a198855dbb7e9c8af9a6f1' MATCHES stored one.
verbose: Key digest for address '192.168.33.2' is SHA=e77d408e9802e2c549417d5e3379c43050d2ad5928a198855dbb7e9c8af9a6f1
verbose: Successfully opened extension plugin 'cfengine-report-collect.so' from '/var/cfengine/lib/cfengine-report-collect.so'
verbose: Successfully loaded extension plugin 'cfengine-report-collect.so'
verbose: Sending query at Fri Dec 16 15:35:10 2016
verbose: h>s QUERY rebase 0 1481902510
verbose: Sending query at Fri Dec 16 15:35:10 2016
verbose: Received reply of 128157 bytes at Fri Dec 16 15:35:10 2016 -> Xfer time 0 seconds (processing time 0 seconds)
verbose: Processing report: CLD (items: 46)
verbose: Processing report: VAD (items: 52)
verbose: Processing report: LSD (items: 13)
verbose: Processing report: SDI (items: 327)
verbose: Processing report: SPD (items: 143)
verbose: Processing report: ELD (items: 205)
verbose: ts #0 > 1481902510
verbose: Received 125 kb of report data with 786 individual items
verbose: Connection to 192.168.33.2 is closed
Note: The Enterprise hub automatically schedules rebase queries if it has
been unable to collect from a given candidate for client_history_timeout
hours.
If a manual rebase collection does not restore reporting functionality for a host continue on to restarting the report collection components.
Restart report collection components
Sometimes it is necessary to restart the report collection subsystem in order to
re-synchronize the caching layer with the database. To restart the report
collection subsystem simply kill cf-hub
, cf-consumer
, redis-server
, and
run the update policy.
For systemd hosts this can be accomplished by simply restarting the cf-hub
service. The related component restarts are automatically handled via the unit
dependencies:
[root@hub ~]# systemctl restart cf-hub
For non-systemd hosts:
[root@hub ~]# pkill cf-consumer
[root@hub ~]# pkill cf-hub
[root@hub ~]# pkill redis-server
[root@hub ~]# cf-agent -KIf update.cf
info: Executing 'no timeout' ... '/var/cfengine/bin/redis-server /var/cfengine/config/redis.conf'
info: Command related to promiser '/var/cfengine/bin/redis-server /var/cfengine/config/redis.conf' returned code defined as promise kept 0
info: Completed execution of '/var/cfengine/bin/redis-server /var/cfengine/config/redis.conf'
info: Executing 'no timeout' ... '/var/cfengine/bin/cf-consumer'
info: Command related to promiser '/var/cfengine/bin/cf-consumer' returned code defined as promise kept 0
info: Completed execution of '/var/cfengine/bin/cf-consumer'
info: Executing 'no timeout' ... '"/var/cfengine/bin/cf-hub"'
info: Command related to promiser '"/var/cfengine/bin/cf-hub"' returned code defined as promise kept 0
info: Completed execution of '"/var/cfengine/bin/cf-hub"'