Future of Bischeck | bischeck – dynamic and adaptive monitoring

Summer is over, at least in Sweden, and it's time to start a new working period until the snow hits the slopes. Even if our day jobs keep us busy, there will be time for Bischeck development. During the summer holiday we had some time to think about the future of Bischeck, and I would like to share those thoughts and ideas with you and hopefully get your feedback.

From the very beginning, Bischeck has been an extension to Nagios-based distributions. Integration with Nagios is done using passive checks, meaning that Bischeck pushes state and metrics data (performance data) to Nagios for a specific service. Nagios takes care of everything else, like notification, escalation, graphing (add-on), GUI, etc. In early releases of Bischeck we also provided integration with more specialized systems like OpenTSDB and Graphite. Integrating with other specialized systems enabled us to focus on core monitoring functionality that we identified as missing in the market, like dynamic and adaptive thresholds, virtual services combining metrics from multiple sources, fine-grained scheduling, etc. This is the strategy we will continue, so we can be a puzzle piece in your monitoring infrastructure.
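
To make the passive check idea concrete, here is a minimal Python sketch of the line format that Nagios accepts for a passive service check result (the PROCESS_SERVICE_CHECK_RESULT external command). The function name, host, service and values are made up for illustration, and the sketch says nothing about how Bischeck itself transports results to Nagios.

```python
import time

# Nagios service return codes
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_check_line(host, service, state, output, perfdata=None, now=None):
    """Format one passive service check result as a Nagios external command.

    The text after the "|" separator is treated by Nagios as performance data.
    """
    timestamp = int(time.time() if now is None else now)
    text = output if not perfdata else f"{output} | {perfdata}"
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{STATES[state]};{text}"
    )


# Hypothetical host and service names
print(passive_check_line("web01", "http", "WARNING",
                         "response time high", "responsetime=812ms"))
```

The monitoring tool decides the state and pushes it; Nagios only has to accept the result and act on it.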

So what do we do to achieve this goal?

Tracking state changes – This would be the equivalent of Nagios HARD and SOFT state changes, so Bischeck can be configured to emit notifications. This enables Bischeck to be directly integrated with incident management cloud services like PagerDuty and BigPanda and systems like Flapjack. This is already implemented in the unstable trunk.
Bischeck APIs – In the same way that we like to integrate with other solutions, we would like others to be able to integrate with us. In the first phase we will create APIs to retrieve configuration information, metrics data, state history and notification events, to enable others to develop Bischeck dashboards and hooks from other systems. The second phase will target an API to push metrics to Bischeck, to complement the existing scheduled pull design.
Specification of the cached data formats, so external tools can extract, and even import, data directly through the existing Redis API.
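
For readers not familiar with the SOFT and HARD concept, the following Python sketch shows the general idea: a problem only becomes HARD, and worth a notification, after it has been seen a number of times in a row. This is a simplified model written for this post, not the code in the Bischeck trunk; the class name, the max_attempts parameter and the notification strings are assumptions.

```python
from dataclasses import dataclass


@dataclass
class StateTracker:
    """Simplified SOFT/HARD state tracking for one service (illustration only)."""

    max_attempts: int = 3     # consecutive non-OK results needed for a HARD state
    hard_state: str = "OK"    # last confirmed (HARD) state
    attempts: int = 0         # consecutive non-OK results seen so far

    def update(self, state):
        """Record a check result; return a message on a HARD state change, else None."""
        if state == "OK":
            self.attempts = 0
            if self.hard_state != "OK":
                previous, self.hard_state = self.hard_state, "OK"
                return f"{previous} -> OK"
            return None  # recovery from a SOFT problem, nothing to notify

        self.attempts += 1
        if self.attempts >= self.max_attempts and self.hard_state != state:
            previous, self.hard_state = self.hard_state, state
            return f"{previous} -> {state}"
        return None  # still SOFT, or already in this HARD state


tracker = StateTracker(max_attempts=3)
for result in ["CRITICAL", "CRITICAL", "CRITICAL", "OK"]:
    change = tracker.update(result)
    if change:
        print("notify:", change)
```

Only the returned HARD changes would be forwarded to a notification service; a single failed check that recovers on the next run stays silent.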

There are also a number of other features we are looking at:

Bischeck cluster – Running multiple Bischeck nodes that load balance on “one common” configuration. That would enable scale out and high availability. Redis plays a key part here with its upcoming cluster support.
Baseline threshold learning – something you can sort of do today, but with minimal configuration.
Regex-based cache queries – This would enable queries like “avg (.*web-http-responsetime[0:9])” to get the average response time for index 0 to 9, for all hosts whose name matches .*web and that have a service called http and a serviceitem called responsetime.
Fault tolerance for configuration errors and dynamic reload of a single servicedef, without the need for a complete configuration reload.
Filtering on cache queries to exclude metrics data that does not match a filter, for example metrics samples that were in state WARNING or CRITICAL.
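
To illustrate how the regex-based queries and the filtering could work together, here is a small Python sketch that runs against a plain in-memory dictionary instead of the real cache. Everything in it is an assumption made for the example: the avg_query function, the "host-service-serviceitem" key layout, the (value, state) sample tuples with index 0 as the newest sample, and the exclude_states parameter. It is not the Bischeck query syntax or implementation.

```python
import re


def avg_query(cache, host_pattern, service, item, first, last, exclude_states=()):
    """Average the samples at index first..last (inclusive) for all matching hosts.

    cache maps "host-service-serviceitem" to a list of (value, state) samples.
    Samples whose state is in exclude_states are left out of the average.
    """
    key_re = re.compile(
        rf"{host_pattern}-{re.escape(service)}-{re.escape(item)}"
    )
    values = [
        value
        for key, samples in cache.items()
        if key_re.fullmatch(key)
        for value, state in samples[first:last + 1]
        if state not in exclude_states
    ]
    return sum(values) / len(values) if values else None


# Hypothetical cache content
cache = {
    "fooweb-http-responsetime": [(100.0, "OK"), (300.0, "CRITICAL")],
    "barweb-http-responsetime": [(200.0, "OK")],
    "db1-http-responsetime": [(900.0, "OK")],
}

# All samples from hosts matching .*web
print(avg_query(cache, ".*web", "http", "responsetime", 0, 9))
# Same query, ignoring samples taken in a WARNING or CRITICAL state
print(avg_query(cache, ".*web", "http", "responsetime", 0, 9,
                exclude_states=("WARNING", "CRITICAL")))
```

Excluding the problem samples keeps an outage from dragging the average, and any threshold derived from it, away from normal behaviour.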

Please give us feedback on what you like, or tell us if you have other ideas or wishes. Remember that you can always create a feature request at gforge.ingby.com or contact us at bischeck@ingby.com.

Remember that you can get professional service for Bischeck, which will also help us continue our development effort.

Puzzle image created by Ganeshk (CC license)
