bischeck – quick start | bischeck – dynamic and adaptive monitoring

bischeck – quick start

This is a short tutorial on how to quickly start using bischeck. The tutorial is based on Bischeck 0.4.3 and uses its ability to execute normal Nagios check commands. In this example we will use check_tcp to monitor the response time of the sshd server, port 22, running on localhost, but it can easily be changed to some other port of interest. The example requires you to work with three of the Bischeck configuration files, located in the etc directory under the Bischeck installation directory, by default /opt/socbox/addons/bischeck. The configuration files are:

  • bischeck.xml – the main configuration file, describing the connection, the scheduling and the execution statement used to retrieve monitored data.
  • 24thresholds.xml – the threshold configuration, describing the 24 hour profiles used to evaluate the threshold for the measured data.
  • servers.xml – the configuration of how to connect to your Nagios server.

In the example we will simply monitor the response time on port 22 of localhost. We will set up thresholds that are calculated differently depending on the date.

Let's start with bischeck.xml:

01  <?xml version='1.0' encoding='UTF-8'?>
02  <bischeck>
03    <host>
04      <name>localhost</name>
05      <desc>My localhost</desc>
06      <service>
07        <name>sshport</name>
08        <desc>ssh port service</desc>
09        <schedule>5S</schedule>
10        <url>shell://localhost</url>
11        <serviceitem>
12          <name>response</name>
13          <desc>Response time for tcp check</desc>
14          <execstatement>{"check":"/usr/lib/nagios/plugins/check_tcp -H localhost -p 22","label":"time"}</execstatement>
15          <thresholdclass>Twenty4HourThreshold</thresholdclass>
16          <serviceitemclass>CheckCommandServiceItem</serviceitemclass>
17        </serviceitem>
18      </service>
19    </host>
20  </bischeck>

  • Line 04 is the name of the host, which must have a corresponding host entry in the Nagios configuration.
  • Line 07 is the name of the service, which must have a corresponding service entry in the Nagios configuration.
  • Line 09 defines how often the service will be executed, in our case every 5 seconds. Multiple schedules can be defined, and you can use both cron and interval definitions.
  • Line 10 is the definition of the connection url. In this case the url describes a local shell execution.
  • Line 12 is the name of the service item. A service item defines what to execute to retrieve data. Every service can have multiple service items as long as they use the same type of url connection, which is defined at the service level. If multiple service items are defined, the state of the service reported to Nagios will be based on the service item with the highest severity.
  • Line 14 is the execute statement. Its format depends on the serviceitemclass specified on line 16. In this case the CheckCommandServiceItem class enables you to use any Nagios plugin that outputs performance data. The path to the check command depends on your Nagios installation. The format is a JSON object, where "check" is the command to execute and "label" is the name of the key in the performance data that you would like to retrieve.
  • Line 15 is the name of the threshold class used to evaluate the threshold for the measured data. If this is not defined, no threshold calculation is done.
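
To make the roles of "check" and "label" on line 14 concrete, here is a minimal Python sketch of how a labelled value can be pulled out of Nagios performance data. It is an illustration only and not Bischeck's own code: the function name extract_perf_value and the sample plugin output are assumptions made up for this example, and the parser only handles labels without spaces.

```python
import json
import re

# The execstatement from line 14 of the bischeck.xml listing, copied verbatim.
EXEC_STATEMENT = (
    '{"check":"/usr/lib/nagios/plugins/check_tcp -H localhost -p 22",'
    '"label":"time"}'
)


def extract_perf_value(plugin_output, label):
    """Return the numeric value stored under `label` in Nagios performance data.

    Hypothetical helper for illustration; handles only labels without spaces.
    """
    # Performance data is everything after the first "|" in the plugin output.
    if "|" not in plugin_output:
        raise ValueError("plugin output carries no performance data")
    perfdata = plugin_output.split("|", 1)[1]
    for token in perfdata.split():
        key, _, rest = token.partition("=")
        if key.strip("'") != label:
            continue
        # The value is the first ";"-separated field, optionally with a unit.
        match = re.match(r"-?\d+(?:\.\d+)?", rest.split(";")[0])
        if match:
            return float(match.group(0))
    raise KeyError(label)


statement = json.loads(EXEC_STATEMENT)

# Hand-written sample of check_tcp style output (an assumption, not a real log).
sample_output = (
    "TCP OK - 0.001 second response time on port 22"
    "|time=0.000682s;;;0.000000;10.000000"
)
value = extract_perf_value(sample_output, statement["label"])
print(statement["check"], "->", value)
```

The value selected by "label" is what ends up as the measured data for the service item, which is why the plugin you choose must emit performance data.
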
The bischeck.xml configuration is now done, so it's time to define the threshold. In this example we will only define one profile that will be valid for any day, but as you know, multiple profiles can be defined for any combination of host, service and service item.

01  <?xml version='1.0' encoding='UTF-8'?>
02  <twenty4threshold xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
03
04    <servicedef>
05      <hostname>localhost</hostname>
06      <servicename>sshport</servicename>
07      <serviceitemname>response</serviceitemname>
08
09      <period>
10        <!-- Period used when the day is Monday and the first day of month -->
11        <months>
12          <dayofmonth>1</dayofmonth>
13        </months>
14        <weeks>
15          <dayofweek>2</dayofweek>
16        </weeks>
17
18        <calcmethod>&lt;</calcmethod>
19        <warning>10</warning>
20        <critical>30</critical>
21        <hoursIDREF>1</hoursIDREF>
22      </period>
23
24      <period>
25        <!-- Period used when the day is a Wednesday or a Friday -->
26
27        <weeks>
28          <dayofweek>4</dayofweek>
29        </weeks>
30
31        <weeks>
32          <dayofweek>6</dayofweek>
33        </weeks>
34
35        <calcmethod>&lt;</calcmethod>
36        <warning>10</warning>
37        <critical>30</critical>
38        <hoursIDREF>2</hoursIDREF>
39      </period>
40
41      <!-- The default period -->
42      <period>
43        <calcmethod>&lt;</calcmethod>
44        <warning>5</warning>
45        <critical>20</critical>
46        <hoursIDREF>1</hoursIDREF>
47      </period>
48
49    </servicedef>
50
51    <hours hoursID="1">
52      <!-- From 08:00 to 18:00 -->
53      <hourinterval>
54        <from>08:00</from>
55        <to>18:00</to>
56        <!-- the average of data points 1 and 2; remember that data point 0 is always the last collected data point -->
57        <threshold>avg(localhost-sshport-response[1],localhost-sshport-response[2])</threshold>
58      </hourinterval>
59    </hours>
60
61    <hours hoursID="2">
62      <!-- From 08:00 to 13:00 -->
63      <hourinterval>
64        <from>08:00</from>
65        <to>13:00</to>
66        <!-- the sum of data points 1 and 2 divided by 2; this is also the average, but it serves as an example -->
67        <threshold>sum(localhost-sshport-response[1],localhost-sshport-response[2])/2</threshold>
68      </hourinterval>
69
70      <!-- From 15:00 to 17:00 -->
71      <hourinterval>
72        <from>15:00</from>
73        <to>17:00</to>
74        <!-- the average of the last 10 data points -->
75        <threshold>avg(localhost-sshport-response[0:9])</threshold>
76      </hourinterval>
77
78    </hours>
79
80    <!-- Swedish holidays -->
81    <holiday year="2013">
82      <dayofyear>0101</dayofyear>
83      <dayofyear>0106</dayofyear>
84      <dayofyear>0329</dayofyear>
85      <dayofyear>0330</dayofyear>
86      <dayofyear>0401</dayofyear>
87      <dayofyear>0501</dayofyear>
88      <dayofyear>0509</dayofyear>
89      <dayofyear>0606</dayofyear>
90      <dayofyear>1225</dayofyear>
91      <dayofyear>1226</dayofyear>
92    </holiday>
93  </twenty4threshold>

  • Line 04 starts with the servicedef tag, which marks the beginning of the threshold definition for a specific host, service and service item.
  • Lines 05 to 07 define the specific combination of host, service and service item that the servicedef section is valid for.
  • Line 09 is the start of the first period section. Multiple period sections are supported, and granularity for day of month, day of week, month and week can be defined. In this example we define three periods that are valid for different days. For more information about the threshold configuration, please read the manual.
  • Lines 11 to 16 define when this period is valid.
  • On line 11 we define that this period is valid on the first day of the month, for all months, and on line 14 we also define that this period is valid for the second day of the week, Monday, for any week of the year.
  • Line 18 defines how the measured value should be evaluated against the threshold for the specific period. Valid options are:
    • < – the measured value should be lower than the threshold
    • > – the measured value should be higher than the threshold
    • = – the measured value should be in an interval around the threshold
    In our example we define that the measured ssh port response time should be lower than the threshold.
  • Lines 19 and 20 define when the warning and critical states should be set. Here 10 means 10% above the threshold for warning, and 30 means 30% above the threshold for critical.
  • Line 21 defines which hours definition should be used for the period. In this case it is the hours definition with id 1.
  • Lines 24 to 39 define an additional period that is valid for Wednesdays and Fridays. Line 38 defines that this period uses the hours definition with id 2, which is different from the previous period.
  • Lines 42 to 47 define a period that does not have any specification of valid days. This period is considered the default period and is used when none of the other periods match the current day.
  • Line 51 is the start of the hours definition with id 1. The hourinterval tag defines from and to what hour the threshold definition is valid. So between 08:00 and 18:00 the threshold will be calculated as the average of the data points localhost-sshport-response[1] and localhost-sshport-response[2]. Data points can also be retrieved by ranges and by time.
  • Lines 61 to 78 define the hours definition with id 2. This one includes two hourinterval definitions with different examples of threshold calculations based on historical data points.
  • Line 81 is the definition of holidays. Any date defined here, in the format MMdd, will be a date on which no thresholds are evaluated.
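
The arithmetic behind lines 18 to 21 and the hours definition with id 1 can be sketched in a few lines of Python. This is a rough illustration of the rules described above, not Bischeck's implementation: the function names, the sample history and the choice of strict comparisons at the limits are all assumptions, and only the "<" method is covered.

```python
def average(points):
    """Arithmetic mean, standing in for avg(...) in the threshold expression."""
    return sum(points) / len(points)


def evaluate(measured, threshold, calcmethod, warning_pct, critical_pct):
    """Classify a measurement for calcmethod "<" (hypothetical helper).

    With "<" the measured value should stay below the threshold. Warning and
    critical are read as percentages above the threshold, as described in the
    text. Strict ">" at the limits is an assumption.
    """
    if calcmethod != "<":
        raise NotImplementedError("only the '<' method is sketched here")
    warning_limit = threshold * (1 + warning_pct / 100)
    critical_limit = threshold * (1 + critical_pct / 100)
    if measured > critical_limit:
        return "CRITICAL"
    if measured > warning_limit:
        return "WARNING"
    return "OK"


# Made-up history of response times; index 0 is the last collected data point.
history = [4.0, 2.0, 6.0]

# avg(localhost-sshport-response[1],localhost-sshport-response[2])
threshold = average([history[1], history[2]])

# A range such as [0:9] covers ten points, i.e. history[0:10] in Python slices.

# First period in the listing: calcmethod "<", warning 10, critical 30.
for measured in (4.0, 4.5, 6.0):
    print(measured, evaluate(measured, threshold, "<", 10, 30))
```

With a threshold of 4.0, the warning limit lands at roughly 4.4 and the critical limit at roughly 5.2, so the three sample measurements come out as OK, WARNING and CRITICAL respectively.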

Now we have just one more configuration file to edit, servers.xml. This file describes how to connect to Nagios. Since version 0.4.1 there are two options for connecting to the Nagios server, NSCA and NRDP. In this example we use NSCA.

01  <?xml version='1.0' encoding='UTF-8'?>
02  <servers>
03    <server name="MyNagios">
04      <class>NSCAServer</class>
05      <property>
06        <key>hostAddress</key>
07        <value>localhost</value>
08      </property>
09      <property>
10        <key>password</key>
11        <value>nscapassword</value>
12      </property>
13      <property>
14        <key>encryptionMode</key>
15        <value>XOR</value>
16      </property>
17      <property>
18        <key>port</key>
19        <value>5667</value>
20      </property>
21      <property>
22        <key>connectionTimeout</key>
23        <value>5000</value>
24      </property>
25    </server>
26  </servers>

  • Line 02 defines the start of the servers definition. The configuration file can include multiple server entries. All defined server entries will receive the bischeck data. In this example only one server entry is defined.
  • Line 03 is the start of a server definition. Every server entry must be given a unique name to tell the entries apart.
  • Line 04 defines the class to use when integrating with Nagios over NSCA.
  • Lines 05 to 24 define a number of properties used by the NSCAServer class to establish the connection with Nagios over NSCA. The only one that must be set is password, which should be the password defined in your nsca.cfg configuration file. The others have default values, which are the same as the values set in the example file.

That should be all, and you will now have your first bischeck configuration running with dynamic thresholds.
 
Good luck
 
