Bischeck 0.4.2 performance testing

Introduction

Performance testing is key to ensuring that your software can handle the load and to verifying its robustness. With server-based software running as a daemon, it is especially important to verify that the software stays stable over a long period of continuous uptime, without decreasing throughput and without leaking resources such as memory.

Since bischeck is designed to do advanced service checks with dynamic and adaptive thresholds, we know that CPU and memory will be important resources when running mathematical algorithms over historical collected data.

The test setup starts with a baseline that is scaled in two dimensions: increasing the load by increasing the number of service jobs, and increasing the load by decreasing the interval between service job schedules.

Read the full benchmark report 

 

bischeck 0.4.2 RC2 is released – test now!

We are pleased to release the second release candidate for bischeck 0.4.2. The major change in this release candidate is how null values are managed for mathematical functions that take a list of arguments, like sum and avg. Read more about this feature in the documentation.

This release includes the following features and fixes:

New feature

• Related to bug [TR-227], the naming of hosts, services and serviceitems has been improved.

• Execution statements and threshold hour specifications that retrieve cache data as a list, as in the functions avg(x-y-z[4:10]) and max(x-y-z[-5M:-15M]), can now be configured to return a value as long as at least one index in the range is not null. To support backward compatibility, the new functionality is only used if the property notFullListParse is set to true in properties.xml. The default value is false.

• There has been some discussion about which Nagios state should be sent if the execute statement of a service item returns null. In previous releases this was hard coded to OK, but it is now possible to define it by setting the property stateOnNull. The property can be set to an integer 0, 1, 2 or 3, or to a string OK, WARNING, CRITICAL or UNKNOWN. The default is UNKNOWN.

• When a service class got an exception while creating a connection, previous versions did not save any data to the cache. If the property saveNullOnConnectionError is set to true, a null value is now inserted into the cache when a connection exception is thrown. For backward compatibility the default value of the property is false.
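
The effect of notFullListParse on a list function such as avg can be illustrated with a small sketch. This is not bischeck's actual implementation: the class NullTolerantAvg and its method are hypothetical, and skipping the null entries when averaging is an assumption about how "return a value as long as at least one index is not null" plays out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical helper for illustration; not part of the bischeck code base.
class NullTolerantAvg {

    // Returns the average of the cached values, or null when no value can be
    // computed.
    // notFullListParse == false: any null entry makes the whole result null.
    // notFullListParse == true: null entries are skipped (assumption), and the
    // result is null only if every entry is null.
    static Double avg(List<Double> values, boolean notFullListParse) {
        if (values.isEmpty()) {
            return null;
        }
        if (!notFullListParse && values.stream().anyMatch(Objects::isNull)) {
            return null;
        }
        double sum = 0.0;
        int count = 0;
        for (Double v : values) {
            if (v != null) {
                sum += v;
                count++;
            }
        }
        return count == 0 ? null : sum / count;
    }

    public static void main(String[] args) {
        // One of the three cached indexes is missing.
        List<Double> cached = Arrays.asList(4.0, null, 8.0);
        System.out.println(avg(cached, false)); // null
        System.out.println(avg(cached, true));  // 6.0
    }
}
```

With the default (false) the behaviour of earlier releases is kept, so a single missing cache entry still yields null.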

Bugs fixed and important issues

• [TR-227] “Cache parser do not work for host, service or serviceitems if the name include 0 (zero)” has been resolved.

• [TR-228] “Threshold factory return wrong threshold definition if service and serviceitem name is the same for different hosts” has been resolved.

• [TR-229] “When using service ShellService the number of open files limit will be reached” has been resolved.

• [TR-230] “NRDP submissions all come in as OK” has been resolved.

• Fixed migration script from 0.4.0 to copy etc directory content correctly. Changes in the file urlservices.xml will be overwritten. Existing 0.4.0 configuration will still be available in the previous version backup directory, bischeck_0.4.0.

bischeck 0.4.2 RC1 is released – test now!

We are pleased to release bischeck 0.4.2 release candidate 1. This release includes the following features and fixes:

New feature

  • Related to bug [TR-227], the naming of hosts, services and serviceitems has been improved. For more information, see section 8.1.
  • Execution statements and threshold hour specifications that retrieve cache data as a list, as in the functions avg(x-y-z[4:10]) and avg(4,6,8), can now be configured to not return a null value if at least the first index in the list definition has a cached value. For the example this means that if at least index 4 has a value for x-y-z, an average will be calculated. To support backward compatibility, the new functionality is only used if the property notFullListParse is set to true in properties.xml. The default value is false.
  • There has been some discussion about which Nagios state should be sent if the execute statement of a service item returns null. In previous releases this was hard coded to OK, but it is now possible to define it by setting the property stateOnNull. The property can be set to an integer 0, 1, 2 or 3, or to a string OK, WARNING, CRITICAL or UNKNOWN. The default is UNKNOWN.
  • When a service class got an exception while making a connection, previous versions did not save any data to the cache. If the property saveNullOnConnectionError is set to true, a null value is now inserted into the cache when a connection exception is thrown. For backward compatibility the default value of the property is false.
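
The two accepted forms of the stateOnNull property (an integer 0 to 3 or a state name) could be mapped to a state along the following lines. This is a minimal sketch: the class StateOnNull and the enum NagiosState are hypothetical names, and falling back to UNKNOWN for unrecognized values is an assumption, not documented bischeck behaviour.

```java
import java.util.Locale;

// Hypothetical parser for illustration; not bischeck's actual code.
class StateOnNull {

    // Ordered so that ordinal() matches the Nagios return codes 0..3.
    enum NagiosState { OK, WARNING, CRITICAL, UNKNOWN }

    static NagiosState parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return NagiosState.UNKNOWN; // documented default
        }
        String value = raw.trim();
        try {
            int code = Integer.parseInt(value);
            if (code >= 0 && code <= 3) {
                return NagiosState.values()[code];
            }
            return NagiosState.UNKNOWN; // assumption: out-of-range falls back
        } catch (NumberFormatException e) {
            // Not numeric, so try the string form below.
        }
        try {
            return NagiosState.valueOf(value.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return NagiosState.UNKNOWN; // assumption: unknown names fall back
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2"));       // CRITICAL
        System.out.println(parse("WARNING")); // WARNING
        System.out.println(parse(null));      // UNKNOWN
    }
}
```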

Bugs fixed and important issues

  • [TR-227] “Cache parser do not work for host, service or serviceitems if the name include 0 (zero)” has been resolved.
  • [TR-228] “Threshold factory return wrong threshold definition if service and serviceitem name is the same for different hosts” has been resolved.
  • [TR-229] “When using service ShellService the number of open files limit will be reached” has been resolved.
  • [TR-230] “NRDP submissions all come in as OK” has been resolved.
  • Fixed migration script from 0.4.0 to copy etc directory content correctly. Changes in the file urlservices.xml will be overwritten. Existing 0.4.0 configuration will still be available in the previous version backup directory, bischeck_0.4.0. 
This RC1 version does not support upgrading from a previous version, but you can simply copy your configuration files and give it a spin.
Read more about 0.4.2 
 

bischeck 0.4.1 released – upgrade now!

We have made a quick new release of bischeck due to a bug that caused truncation of all measured and threshold values with more than 2 decimals. This caused some obvious problems, especially when measuring things like network times. So if you are monitoring this kind of data, please upgrade as soon as possible.

Since we had some new functionality in the trunk we chose to include it too, but it should be regarded as beta functionality. The new functionality is:

  • Sending passive checks over NRDP as an alternative to NSCA
  • New service and serviceitem classes that support execution of local check commands. With this functionality any Nagios check command that outputs performance data can now be executed through bischeck. The state is of course ignored, since bischeck does its own threshold calculation on the performance data. Thanks to Eric Loyd at Bitnetix (www.bitnetix.com), who gave me the idea during Nagios World 2012.

For more information about this new functionality please check out the 0.4.1 README. Feedback on 0.4.1 is more than welcome.

To download bischeck 0.4.1 please visit our download area.

bischeck 0.4.0 released

Today we announce bischeck 0.4.0. The release has been tested in a production environment for 2 months. Upgrading from 0.3.0 and 0.4.0_RC2 is supported. No major changes have been made since RC2. Full documentation and download are available.

New feature

  • [FR-197] Support for integration with multiple, different surveillance and monitoring systems. With version 0.4.0 bischeck is no longer limited to sending data to Nagios. It can now send data to multiple Nagios servers and to other servers like OpenTSDB. This is done by moving server formatting and protocol handling to server integration classes that implement the interface com.ingby.socbox.bischeck.servers.Server. The server integration is described in the XML configuration file servers.xml. This also means that some Nagios NSCA specific properties previously configured in properties.xml have been moved to the NSCA section of the servers.xml file. The OpenTSDB server class should be regarded as beta.
  • [FR-202] The implementation of running bischeck once, in non-daemon mode, has been changed so that the same code is used as when running in daemon mode. The only difference is in the initialization of triggers, so that all service items are run directly and just once.
  • [FR-204] The bischeck cache is saved when the bischeck daemon is shut down and reloaded on bischeck startup. Keeping the cache persistent between restarts is important since 0.4.0 supports time-based cache retrieval. The current limitation is that if the bischeck daemon is killed by a signal that cannot be caught, or the daemon crashes, the data will not be saved. This will be improved in future versions.
  • [FR-218] The bischeck daemon can now reload the configuration without a process restart. This is supported through the JMX operation “reload”. The feature limits the need for operating system access and authorization.
  • [FR-219] Bischeck can now retrieve state and performance data from a Nagios server supporting livestatus. With the service class LivestatusService a connection is set up over livestatus, and with the serviceitem class LivestatusServiceItem state and/or performance data can be retrieved from a Nagios service. This can be useful when creating virtual services in bischeck or in complex thresholds.
  • [FR-220] Bischeck now supports one additional scheduling method, where a service can be scheduled to run after a different service has executed. This can be useful when a service depends on data from another service for its thresholds or execution statement.
  • [FR-221] Cache retrieval now supports using a time offset to find the cache element nearest to that offset.
  • Cache data can be retrieved as a list of elements based both on index and time.
  • Support for additional mathematical functions like average, min and max calculations on lists of elements.
  • Bischeck now supports the usage of cached data in the execution statement of a serviceitem. This is typically useful when a serviceitem execute statement depends on other service data. For example in a SQL query string:
    select value from table1 where id = host1-web-state[0] and createdate = '%%yyyy-MM-dd%%'
  • Added support for Linux distributions other than Red Hat based ones. bischeck should now install on Debian 6 and Ubuntu 10/11.
  • Configuration listing. The configuration listing has been moved from the ConfigurationManager class to the DocManager class. Currently HTML and text listings are supported. The generated configuration data is by default placed in the bischeckdoc directory.
  • A service can be configured not to send its data to the configured monitoring servers, like Nagios. This can be useful if the service is only used to create virtual services or only used in thresholds.
  • The bischeck script now supports JMX authentication. The authentication files are located in the etc directory and named jmxremote.password and jmxremote.access. By default authentication is disabled by the system property
    “-Dcom.sun.management.jmxremote.authenticate=false”. To enable authentication set the property to true. For more info about JMX see
    http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html.
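
The time-offset cache retrieval of [FR-221] boils down to finding the cached element whose timestamp lies closest to "now minus offset". The following is a minimal sketch of that idea using a sorted map. The class NearestCacheLookup and its methods are hypothetical and say nothing about how the bischeck cache is actually stored; preferring the older element on a tie is also an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical cache for illustration; not bischeck's actual cache.
class NearestCacheLookup {

    // Timestamp in milliseconds -> measured value, kept sorted by timestamp.
    private final TreeMap<Long, Double> entries = new TreeMap<>();

    void put(long timestampMillis, double value) {
        entries.put(timestampMillis, value);
    }

    // Returns the value whose timestamp is closest to (now - offset),
    // or null if the cache is empty.
    Double nearest(long nowMillis, long offsetMillis) {
        long target = nowMillis - offsetMillis;
        Map.Entry<Long, Double> before = entries.floorEntry(target);
        Map.Entry<Long, Double> after = entries.ceilingEntry(target);
        if (before == null && after == null) {
            return null;
        }
        if (before == null) {
            return after.getValue();
        }
        if (after == null) {
            return before.getValue();
        }
        long distBefore = target - before.getKey();
        long distAfter = after.getKey() - target;
        // On a tie the older element wins (assumption).
        return distBefore <= distAfter ? before.getValue() : after.getValue();
    }

    public static void main(String[] args) {
        NearestCacheLookup cache = new NearestCacheLookup();
        cache.put(1000L, 1.0);
        cache.put(2000L, 2.0);
        cache.put(4000L, 4.0);
        // Target time is 5000 - 2200 = 2800, which is closest to 2000.
        System.out.println(cache.nearest(5000L, 2200L)); // 2.0
    }
}
```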

Bugs fixed and important issues

  • In previous versions the Twenty4Thresholds class did not do a correct linear equation calculation if an expression-based threshold was defined. Let's illustrate the errors with this example from the 24thresholds.xml configuration file, which has a mix of static and expression-based thresholds.
    ....
    <!-- 12:00 -->
    <hour>7000</hour>
    <!-- 13:00 -->
    <hour>testhost-testservice-testitem[1] / 3</hour>
    <!-- 14:00 -->
    <hour>testhost-testservice-testitem[1] / 2</hour>
    <!-- 15:00 -->
    <hour>testhost-testservice-testitem[1] + 1000 </hour>
    <!-- 16:00 -->
    <hour>12000</hour>
    ....

    In the previous version the threshold value between 12:00 and 13:00 would be null since it was a mix of static and expression based thresholds. And between 15:00 and 16:00 the threshold would have been calculated as “testhost-testservice-testitem[1] + 1000” independent of the time between 15:00 and 16:00.

    Now the linear equation will correctly be calculated with any mix of static and expression based definitions. In the above example the calculated threshold for 12:20 will now be:

    20*((testhost-testservice-testitem[1]/3) - 7000)/60 + 7000
    This fix will improve the correctness and also the capability of threshold adaptivity.
  • The Service interface has a number of new methods that should have been there from the beginning. If you have developed any service class you need to add these, but if you just inherited from ServiceAbstract it is fixed for you. The new methods are:
    public NAGIOSSTAT getLevel();
    public void setLevel(NAGIOSSTAT level);
    public boolean isConnectionEstablished();
    public void setConnectionEstablished(boolean connected);
    public Boolean isSendServiceData();
    public void setSendServiceData(Boolean sendServiceData);
  • Property cacheclear is renamed to thresholdCacheClear.
  • All the NSCA related properties have been moved from properties.xml to servers.xml when used for the NSCAServer class. The new property names have also gone through some minor changes. When upgrading, the servers.xml file must be manually updated with the current settings of the NSCA related properties in properties.xml. It is recommended that these are later removed from properties.xml.
  • All JAXB generated configuration classes now support serialization.
  • Quartz jar is upgraded from 2.0.1 to 2.1.5.
  • [TR-216] “Shutdown is automatic triggered”
  • [TR-217] “Configuration Manager initialization failed with java.lang.NullPointerException”
  • [TR-207] “sudo in bischeckd script cause problem at boot”
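
The corrected Twenty4Thresholds calculation described above is a linear interpolation between the threshold at the start of the current hour and the threshold at the start of the next hour. The sketch below reproduces the 12:20 formula from the example. The class and method names are hypothetical, and the value 30000 for testhost-testservice-testitem[1] is made up for illustration.

```java
// Hypothetical helper for illustration; not the actual Twenty4Thresholds code.
class HourlyThresholdInterpolation {

    // Interpolates linearly between the threshold valid at hh:00 and the
    // threshold valid at (hh+1):00 for the given minute of the hour:
    //   minute * (next - current) / 60 + current
    static double thresholdAt(double currentHourValue, double nextHourValue, int minute) {
        if (minute < 0 || minute > 59) {
            throw new IllegalArgumentException("minute must be in the range 0..59");
        }
        return minute * (nextHourValue - currentHourValue) / 60.0 + currentHourValue;
    }

    public static void main(String[] args) {
        // Assumed cached value for testhost-testservice-testitem[1].
        double cached = 30000.0;
        // 12:00 is the static 7000, 13:00 is the expression cached / 3.
        double at1220 = thresholdAt(7000.0, cached / 3, 20);
        System.out.println(at1220); // 20 * (10000 - 7000) / 60 + 7000 = 8000.0
    }
}
```

Because both end points are evaluated to plain numbers before interpolating, it does not matter whether an hour is defined as a static value or as an expression.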

Bischeck on Nagios World Conference 2012

Once again it's time for the Nagios World conference in St Paul. This time it is 3 days with lots of good stuff, http://www.nagios.com/events/nagiosworldconference/northamerica/2012/. If you would like to know more about dynamic and adaptive thresholds, come and join my presentation on the third day, http://www.nagios.com/events/nagiosworldconference/northamerica/2012/speakers/#ahaal.

I look forward to meeting you all in St Paul.

bischeck 0.4.0 RC 2 released

The second release candidate of bischeck 0.4.0 is now available for download. The 0.4.0 RC2 of bischeck includes the following new features and fixes:

New feature

  • [FR-197] Support for integration with multiple, different surveillance and monitoring systems. With version 0.4.0 bischeck is no longer limited to sending data to Nagios. It can now send data to multiple Nagios servers and to other servers like OpenTSDB. This is done by moving server formatting and protocol handling to server integration classes that implement the interface com.ingby.socbox.bischeck.servers.Server. The server integration is described in the XML configuration file servers.xml. This also means that some Nagios NSCA specific properties previously configured in properties.xml have been moved to the NSCA section of the servers.xml file. The OpenTSDB server class should be regarded as beta.
  • [FR-202] The implementation of running bischeck once, in non-daemon mode, has been changed so that the same code is used as when running in daemon mode. The only difference is in the initialization of triggers, so that all service items are run directly and just once.
  • [FR-204] The bischeck cache is saved when the bischeck daemon is shut down and reloaded on bischeck startup. Keeping the cache persistent between restarts is important since 0.4.0 supports time-based cache retrieval. The current limitation is that if the bischeck daemon is killed by a signal that cannot be caught, or the daemon crashes, the data will not be saved. This will be improved in future versions.
  • [FR-218] The bischeck daemon can now reload the configuration without a process restart. This is supported through the JMX operation “reload”. The feature limits the need for operating system access and authorization.
  • [FR-219] Bischeck can now retrieve state and performance data from a Nagios server supporting livestatus. With the service class LivestatusService a connection is set up over livestatus, and with the serviceitem class LivestatusServiceItem state and/or performance data can be retrieved from a Nagios service. This can be useful when creating virtual services in bischeck or in complex thresholds.
  • [FR-220] Bischeck now supports one additional scheduling method, where a service can be scheduled to run after a different service has executed. This can be useful when a service depends on data from another service for its thresholds or execution statement.
  • [FR-221] Cache retrieval now supports using a time offset to find the cache element nearest to that offset.
  • Cache data can be retrieved as a list of elements based both on index and time.
  • Support for additional mathematical functions like average, min and max calculations on lists of elements.
  • Bischeck now supports the usage of cached data in the execution statement of a serviceitem. This is typically useful when a serviceitem execute statement depends on other service data. For example in a SQL query string:
    select value from table1 where id = host1-web-state[0] and createdate = '%%yyyy-MM-dd%%'
  • Added support for Linux distributions other than Red Hat based ones. bischeck should now install on Debian 6 and Ubuntu 10/11.
  • Configuration listing. The configuration listing has been moved from the ConfigurationManager class to the DocManager class. Currently HTML and text listings are supported. The generated configuration data is by default placed in the bischeckdoc directory.
  • A service can be configured not to send its data to the configured monitoring servers, like Nagios. This can be useful if the service is only used to create virtual services or only used in thresholds.
  • The bischeck script now supports JMX authentication. The authentication files are located in the etc directory and named jmxremote.password and jmxremote.access. By default authentication is disabled by the system property
    “-Dcom.sun.management.jmxremote.authenticate=false”. To enable authentication set the property to true. For more info about JMX see
    http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html.

Bugs fixed and important issues

  • In previous versions the Twenty4Thresholds class did not do a correct linear equation calculation if an expression-based threshold was defined. Let's illustrate the errors with this example from the 24thresholds.xml configuration file, which has a mix of static and expression-based thresholds.
    ....
    <!-- 12:00 -->
    <hour>7000</hour>
    <!-- 13:00 -->
    <hour>testhost-testservice-testitem[1] / 3</hour>
    <!-- 14:00 -->
    <hour>testhost-testservice-testitem[1] / 2</hour>
    <!-- 15:00 -->
    <hour>testhost-testservice-testitem[1] + 1000 </hour>
    <!-- 16:00 -->
    <hour>12000</hour>
    ....

    In the previous version the threshold value between 12:00 and 13:00 would be null since it was a mix of static and expression based thresholds. And between 15:00 and 16:00 the threshold would have been calculated as “testhost-testservice-testitem[1] + 1000” independent of the time between 15:00 and 16:00.

    Now the linear equation will correctly be calculated with any mix of static and expression based definitions. In the above example the calculated threshold for 12:20 will now be:

    20*((testhost-testservice-testitem[1]/3) - 7000)/60 + 7000
    This fix will improve the correctness and also the capability of threshold adaptivity.
  • The Service interface has a number of new methods that should have been there from the beginning. If you have developed any service class you need to add these, but if you just inherited from ServiceAbstract it is fixed for you. The new methods are:
    public NAGIOSSTAT getLevel();
    public void setLevel(NAGIOSSTAT level);
    public boolean isConnectionEstablished();
    public void setConnectionEstablished(boolean connected);
    public Boolean isSendServiceData();
    public void setSendServiceData(Boolean sendServiceData);
  • Property cacheclear is renamed to thresholdCacheClear.
  • All the NSCA related properties have been moved from properties.xml to servers.xml when used for the NSCAServer class. The new property names have also gone through some minor changes. When upgrading, the servers.xml file must be manually updated with the current settings of the NSCA related properties in properties.xml. It is recommended that these are later removed from properties.xml.
  • All JAXB generated configuration classes now support serialization.
  • Quartz jar is upgraded from 2.0.1 to 2.1.5.
  • [TR-216] “Shutdown is automatic triggered”
  • [TR-217] “Configuration Manager initialization failed with java.lang.NullPointerException”
  • [TR-207] “sudo in bischeckd script cause problem at boot”
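
For a custom service class that does not extend ServiceAbstract, the six new Service methods listed above could be added roughly as follows. This is only a sketch: the NAGIOSSTAT enum and the ServiceAdditions interface are hypothetical stand-ins so that the example compiles on its own (the real Service interface has more methods), and the default field values are assumptions.

```java
// Hypothetical stand-in for bischeck's NAGIOSSTAT type; the real type may
// have different values.
enum NAGIOSSTAT { OK, WARNING, CRITICAL, UNKNOWN }

// Hypothetical interface holding only the six methods listed above.
interface ServiceAdditions {
    NAGIOSSTAT getLevel();
    void setLevel(NAGIOSSTAT level);
    boolean isConnectionEstablished();
    void setConnectionEstablished(boolean connected);
    Boolean isSendServiceData();
    void setSendServiceData(Boolean sendServiceData);
}

// Example of a custom service class carrying the new state.
class ExampleService implements ServiceAdditions {
    // Default values below are assumptions for this sketch.
    private NAGIOSSTAT level = NAGIOSSTAT.UNKNOWN;
    private boolean connectionEstablished = false;
    private Boolean sendServiceData = Boolean.TRUE;

    @Override
    public NAGIOSSTAT getLevel() { return level; }

    @Override
    public void setLevel(NAGIOSSTAT level) { this.level = level; }

    @Override
    public boolean isConnectionEstablished() { return connectionEstablished; }

    @Override
    public void setConnectionEstablished(boolean connected) {
        this.connectionEstablished = connected;
    }

    @Override
    public Boolean isSendServiceData() { return sendServiceData; }

    @Override
    public void setSendServiceData(Boolean sendServiceData) {
        this.sendServiceData = sendServiceData;
    }
}
```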