Prediction-based monitoring

With the upcoming version 1.0.0 of Bischeck we have added some new capabilities for prediction-based monitoring. Prediction means that we use historical data to estimate future values, a technique commonly called regression analysis.

There are many ways to do regression analysis, depending on the data. In 1.0.0 we have implemented the “ordinary least squares” method, based on the Apache Commons Math package. The new functions can be used anywhere Bischeck currently supports a mathematical expression, such as thresholds and the execstatement of the ServiceItem class CalculateOnCache.
For ordinary least squares we have implemented two functions, ols and olss. The ols function calculates a predicted future value and the olss function calculates the current slope.
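For readers unfamiliar with the method, here is a minimal Python sketch of ordinary least squares with a single predictor (time). It is not Bischeck's code: Bischeck delegates the computation to Apache Commons Math, which offers comparable functionality (for example its SimpleRegression class), and the exact class it uses is not stated here. The function names ols_fit and ols_predict are hypothetical. The fitted slope corresponds to what olss reports, and evaluating the fitted line at a future time corresponds to what ols reports.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    slope     = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def ols_predict(xs, ys, x_future):
    """Evaluate the fitted line at x_future (for example a day in the future)."""
    intercept, slope = ols_fit(xs, ys)
    return intercept + slope * x_future


# Daily values for the last three days (x = days relative to today).
days = [-2, -1, 0]
values = [6.0, 8.0, 10.0]
print(ols_fit(days, values))          # (10.0, 2.0)
print(ols_predict(days, values, 10))  # 30.0
```

The slope (2.0 per day) is the "how fast" number, while the prediction simply extends the same straight line 10 days ahead.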

ols("host","service","serviceitem","AVG","D","10","-30D")

The expression above calculates the predicted value of host-service-serviceitem 10 days from now, based on the historical data from the last 30 days. As the basis for the prediction it uses the average (AVG) of all the data within each day interval. Besides AVG, the available methods are MAX and MIN. Besides day (D), the supported interval resolutions are hour and week.
So let's say I want an alarm if host-service-serviceitem will be greater than 10000 within the next 10 days, based on the last 30 days of data. First, define a serviceitem called predict for host-service in bischeck.xml, using the CalculateOnCache class:

<host>
  <name>host</name>

  <service>
    <name>service</name>

    <serviceitem>
      <name>predict</name>
      <execstatement>ols("host","service","serviceitem","AVG","D","10","-30D")</execstatement>
      <thresholdclass>Twenty4HourThreshold</thresholdclass>
      <serviceitemclass>CalculateOnCache</serviceitemclass>
    </serviceitem>

  </service>
</host>

Next, define the threshold for host-service-predict in the 24thresholds.xml file:

...
<servicedef>
  <hostname>host</hostname>
  <servicename>service</servicename>
  <serviceitemname>predict</serviceitemname>

  <period>
    <calcmethod>&lt;</calcmethod>
    <warning>10</warning>
    <critical>20</critical>
    <hoursIDREF>100</hoursIDREF>
  </period>
</servicedef>

<hours hoursID="100">
  <hourinterval>
    <from>00:00</from>
    <to>23:00</to>
    <threshold>10000</threshold>
  </hourinterval>
</hours>
...

As mentioned before, the ols function can be used in any mathematical expression, so the following is a valid expression (though I am not sure how useful it is in real life):
<execstatement>ols("host","service","serviceitem","AVG","D","10","-30D") / sum(host1-service2-item2[0:10])</execstatement>

 
The olss function calculates the current slope based on the historical data:

olss("host","service","serviceitem","AVG","D","-30D")

In the example above, the slope is calculated from the data of the last 30 days. The slope function olss is well suited for understanding how fast or slow something changes, such as the number of rows in a database or the growth rate of a file system.

The ordinary least squares functions are part of Bischeck 1.0.0 RC 1, so you can test them now.

 
