How to Use Baseline Calculations in SolarWinds

by Robert Kavanagh | 25 May 2018 | Blog Posts, Network Management, Useful Information

In monitoring, the importance of baselining can’t be overstated. It’s all well and good that you are gathering utilisation data for memory, CPU, and other metrics, but without adequately calculated thresholds, how do you really know what the performance is? Of course, you have the option of using global defaults, but the best-practice approach is to place thresholds at the object layer. These can be individual, statically set values, but there is a very useful alternative: dynamic thresholds.

Dynamic thresholds do not rely on a single, static value set on a case-by-case basis by individual users, nor do they depend on a one-size-fits-all global default. They are calculated from real metrics polled from the object in question, and they change over time, adjusting to the values captured for that metric over the last 7 days. This is of significant benefit: because the thresholds track true usage data, anomalous increases in resource usage can be identified, i.e. deviations from what has been ‘learnt’ to be normal.

Here, we are going to look at how we can take advantage of calculated baseline values within SolarWinds®.

Clicking the ‘Use Dynamic Baseline Thresholds’ button to insert ${USE_BASELINE_CRITICAL} into the field is rarely the right thing to do, as the values it uses are too simplistic. You will see below that there are more capable methods of creating an appropriate dynamic baseline.

When using dynamic thresholds, the background processes are doing the following:

SolarWinds works out the mean average of the data points (the sum of the metrics divided by the number of data points).
SolarWinds calculates the standard deviation (the distance between the data points and the mean are calculated; these values are each squared, summed up, and then divided by the number of data points; the square root of this figure is the standard deviation).
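The two steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as described, not SolarWinds' own code; the function name `baseline_stats` and the sample values are made up for the example.

```python
import math


def baseline_stats(samples):
    """Return (mean, standard deviation) for a list of polled values.

    Follows the description above: the squared distances from the mean are
    summed and divided by the number of data points (population form).
    """
    if not samples:
        raise ValueError("at least one data point is required")
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, math.sqrt(variance)


# Hypothetical CPU utilisation samples (percent), not real polled data.
cpu_samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std_dev = baseline_stats(cpu_samples)
print(mean, std_dev)  # prints: 5.0 2.0
```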

We can reference these dynamic thresholds using the following variables:

Already I’m sure you are thinking about the benefits of putting these in place. For many metrics, you might be more concerned with an atypical increase in the values being polled, rather than a single unchanging threshold.

For example, you might be monitoring the interface utilisation on a firewall. The utilisation can vary drastically depending on the time of year, the time of day, the number of users, and many other factors.

Alerting for a metric like this should be based on sudden and unusual spikes, in the worst case perhaps indicating a DDoS attack or a user profile that exceeds your available resources. Imagine that over the Christmas period many in the business are on holiday and activity through the firewall is reduced. If there is an unusual spike in connections, it might still fall below a static threshold (likely set based on expected values for a busy period). However, dynamic thresholds would pick this up, informing you that the number of connections falls outside the norm for that period, warranting investigation.
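To make the holiday scenario concrete, here is a rough Python sketch comparing a static threshold with one derived from recent data. All figures are invented, and the multiplier of 3 is an assumption based on the 3 × standard deviation figure mentioned later in this article; `dynamic_threshold` is a hypothetical helper, not a SolarWinds function.

```python
import math


def dynamic_threshold(samples, multiplier=3.0):
    """Mean of the samples plus `multiplier` standard deviations."""
    mean = sum(samples) / len(samples)
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))
    return mean + multiplier * std_dev


# Hypothetical static threshold, sized for a busy period (percent utilisation).
STATIC_CRITICAL = 80.0

# Hypothetical daily utilisation during a quiet holiday week.
quiet_week = [10.0, 12.0, 11.0, 9.0, 13.0, 10.0, 12.0]
spike = 45.0  # an unusual reading during that week

print(spike > STATIC_CRITICAL)                # False: the static threshold misses it
print(spike > dynamic_threshold(quiet_week))  # True: well outside the learnt norm
```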

Of course, the default warning and critical dynamic thresholds can be customised even further. Let’s say that 3 × the standard deviation is a bit tight for a particular metric, and only the very largest jumps in polled values should breach a threshold. In this case, ${USE_BASELINE_CRITICAL} can be replaced by a formula built from other variables:

[code]${MEAN}

${STD_DEV}[/code]

In this way, your critical threshold could contain the following formula to represent a threshold value of 10 × the standard deviation above the average:

[code]${MEAN} + 10 * ${STD_DEV}[/code]

This gives you even more control over your thresholds than the ${USE_BASELINE_CRITICAL} variable – if you are going to break the mold with dynamic thresholds, don’t pass up the opportunity to apply your own calculation!
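To see how much headroom a larger multiplier buys, the formula can be worked through with sample numbers. The short Python sketch below simply mirrors the ${MEAN} + N * ${STD_DEV} pattern; the baseline values and the `custom_threshold` helper are hypothetical and only there for illustration.

```python
def custom_threshold(mean, std_dev, multiplier):
    """Mirror of the formula pattern: ${MEAN} + multiplier * ${STD_DEV}."""
    return mean + multiplier * std_dev


# Hypothetical baseline for a metric: mean 40%, standard deviation 5%.
mean, std_dev = 40.0, 5.0

for label, k in (("3 x std dev", 3), ("10 x std dev", 10)):
    print(label, custom_threshold(mean, std_dev, k))
# prints:
# 3 x std dev 55.0
# 10 x std dev 90.0
```

With these example figures, the tighter multiplier would trip at 55% while the looser one waits until 90%, which is the kind of trade-off to weigh per metric.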

One other noteworthy part of dynamic thresholds for the Server and Application Monitor module is that the time period which is used for calculation can be modified:

Go to Settings > SAM Settings
Click on Polling Settings in the Thresholds & Polling section
Scroll down to the Database Settings and adjust the days in the Baseline Data Collection Duration field

Note: The Baseline Data Collection Duration cannot exceed the Detailed Statistics Retention.

Hopefully, this article has proven useful to you for fine-tuning your environment with dynamically adjusting thresholds. When it comes to alerting, default thresholds which don’t account for the nature of your environment have a huge impact, so I heartily recommend reviewing this great feature. We look forward to hearing your thoughts, and expect to see you all on Thwack to continue the dynamic SolarWinds discussion!


Robert Kavanagh

Snr. Monitoring Engineer

Robert Kavanagh is a Senior Monitoring Engineer at Prosperon Networks. As a Senior SolarWinds Engineer for over five years, Robert has helped hundreds of customers meet their IT monitoring needs with SolarWinds Solutions.
