Responding to Data Center Threats: How to Create an Effective Threshold and Alerting Strategy

As explained in a previous post, physical data center security goes well beyond merely monitoring for unauthorized human activity. It involves a series of sensors to monitor everything from liquid leaks to humidity levels, temperature and smoke – all of which can threaten the proper functioning of your data center, branches and network closets.

But the most sophisticated monitoring system in town won’t help if it’s not configured correctly, with effective thresholds, and backed up with an alerting strategy that will ensure help arrives in time.

Setting thresholds is both an art and a science. You want to know when something is headed in the wrong direction, but don’t want sensors producing an overwhelming number of unnecessary alerts. In general, set thresholds to their default values, then adjust individually based on factors such as sensor location. For example, a sensor located close to a server power supply should alarm at a higher value than a sensor located close to the air inlet of a server.
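As a rough illustration of starting from a default and adjusting by sensor location, the sketch below looks up a warning threshold per location. The location names and temperature values are assumptions made up for this example, not vendor defaults or industry-standard figures.

```python
# Hypothetical default and per-location overrides (degrees Celsius).
# All values are illustrative assumptions, not published standards.
DEFAULT_TEMP_WARNING_C = 27.0

LOCATION_OVERRIDES_C = {
    "server_air_inlet": 27.0,       # cool supply air, so a lower limit
    "power_supply_exhaust": 45.0,   # runs hotter in normal operation
}


def warning_threshold(location: str) -> float:
    """Return the override for a location, or the default if none is set."""
    return LOCATION_OVERRIDES_C.get(location, DEFAULT_TEMP_WARNING_C)


if __name__ == "__main__":
    for loc in ("server_air_inlet", "power_supply_exhaust", "network_closet"):
        print(loc, warning_threshold(loc))
```

Keeping overrides separate from the default makes it easy to see which sensors were tuned and why.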

Ideally, your monitoring systems should allow you to configure multiple thresholds per sensor in order to alert at informational, warning, critical, and failure levels. In addition, conditions such as over-threshold for a specified amount of time, rate of increase, and rate of decrease should also trigger alerts. In the case of temperature, for example, alerting on rate of change provides a quicker indication of failure than a snapshot temperature at a particular point in time.
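The three kinds of conditions described above (multiple levels, time over threshold, and rate of change) might be sketched as follows. This is a minimal example under stated assumptions: the level limits, function names, and the `(seconds, temperature)` sample format are all hypothetical and do not come from any particular monitoring product.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, temperature in C)

# Illustrative levels in ascending severity; the limits are assumptions.
LEVELS: List[Tuple[str, float]] = [
    ("informational", 25.0),
    ("warning", 27.0),
    ("critical", 32.0),
    ("failure", 40.0),
]


def classify(temp_c: float) -> Optional[str]:
    """Return the highest level whose limit the reading meets, or None."""
    level = None
    for name, limit in LEVELS:
        if temp_c >= limit:
            level = name
    return level


def rate_of_change(samples: List[Sample]) -> float:
    """Degrees C per minute between the first and last samples."""
    if len(samples) < 2 or samples[-1][0] == samples[0][0]:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / ((t1 - t0) / 60.0)


def sustained_over(samples: List[Sample], limit: float, min_seconds: float) -> bool:
    """True if readings stayed at or above limit for min_seconds up to the latest sample."""
    start = None
    for t, v in samples:
        if v >= limit:
            if start is None:
                start = t  # start of the current over-threshold run
        else:
            start = None   # run broken, reset
    return start is not None and samples[-1][0] - start >= min_seconds


if __name__ == "__main__":
    readings = [(0, 22.0), (60, 23.0), (120, 26.0), (180, 28.0)]
    print(classify(readings[-1][1]))
    print(rate_of_change(readings))
    print(sustained_over(readings, 25.0, 60))
```

In this example the latest reading is only at the warning level, but the rate of change shows the temperature climbing steadily, which is the earlier signal.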

You should also set thresholds to create different alerts based on the severity of the incident. An IT administrator can probably handle a humidity alert, for example, but when a smoke sensor triggers an alert, that may warrant an automatic call to the fire department.

That means for each threshold you need to think through the various alerting methods you can employ – how the alert should be sent and to whom – and at what point escalation needs to occur.
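One way to write down "how the alert should be sent and to whom" is a small routing table keyed by sensor type and severity. The sketch below assumes hypothetical recipient and channel names; a real deployment would map these to actual contacts and delivery mechanisms.

```python
from typing import Dict, List, Tuple

Route = Dict[str, List[str]]

# Hypothetical routing policy; recipients and channels are placeholders.
ROUTES: Dict[Tuple[str, str], Route] = {
    ("humidity", "warning"): {
        "notify": ["it_admin"],
        "channels": ["email"],
    },
    ("smoke", "critical"): {
        "notify": ["it_admin", "facilities", "fire_department"],
        "channels": ["sms", "phone"],
    },
}

# Fallback so that no alert goes undelivered when a rule is missing.
DEFAULT_ROUTE: Route = {"notify": ["it_admin"], "channels": ["email"]}


def route_for(sensor_type: str, severity: str) -> Route:
    """Look up who to notify and how for a given sensor type and severity."""
    return ROUTES.get((sensor_type, severity), DEFAULT_ROUTE)


if __name__ == "__main__":
    print(route_for("smoke", "critical"))
    print(route_for("leak", "warning"))
```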

You’ve got lots of alerting methods to choose from, including e-mail, text messages, SNMP traps, and posts to HTTP servers. It is important that the alerting systems be flexible and customizable so that the right amount of information is successfully delivered to the intended recipient. Make sure your alert notifications are specific, including information such as the user-defined name of the sensor, sensor location, and date/time of alarm.

For example, an alert that reads, “Door sensor has been activated,” is not very useful because it doesn’t tell you which door was opened. A more useful alert would read, “Door 4 at northeast corner has been opened, and a picture of the person opening the door was captured.” That not only identifies the specific door and its location, it also tells you that a photograph of the incident is available to view.
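A specific alert of this kind can be assembled from a few required fields. The sketch below is illustrative only; the `Alert` class and its field names are assumptions for this example, not part of any product's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Alert:
    """Hypothetical alert record carrying the details a recipient needs."""
    sensor_name: str      # user-defined name, e.g. "Door 4"
    location: str         # where the sensor is installed
    event: str            # what happened
    raised_at: datetime   # date/time of the alarm
    detail: str = ""      # optional extra context, e.g. a captured image

    def message(self) -> str:
        """Build a human-readable message with name, location, and time."""
        text = (
            f"{self.sensor_name} at {self.location} {self.event} "
            f"({self.raised_at.isoformat()})"
        )
        if self.detail:
            text += f"; {self.detail}"
        return text


if __name__ == "__main__":
    alert = Alert(
        sensor_name="Door 4",
        location="northeast corner",
        event="has been opened",
        raised_at=datetime.now(timezone.utc),
        detail="a picture of the person opening the door was captured",
    )
    print(alert.message())
```

Making the name, location, and timestamp required fields means a vague alert cannot be created by accident.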

A good, intelligent monitoring system will also escalate alarms if they are not resolved in a timely manner – with “timely” being another threshold that you’ll need to set. Think through a logical progression of who should be alerted for each type of alert, to ensure problems are addressed before small issues cascade into larger ones.
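An escalation progression can be expressed as a list of time limits and recipients. The sketch below is a minimal example; the roles and the 15- and 30-minute limits are assumptions chosen for illustration, and the right values depend on the type of alert.

```python
from typing import List, Tuple

# Hypothetical escalation chain: (minutes unresolved, who gets added).
# The time limits are illustrative "timely" thresholds, not recommendations.
ESCALATION_CHAIN: List[Tuple[int, str]] = [
    (0, "on_call_admin"),
    (15, "it_manager"),
    (30, "facilities_director"),
]


def recipients_due(minutes_unresolved: float) -> List[str]:
    """Return everyone who should have been alerted by this point."""
    return [who for after, who in ESCALATION_CHAIN if minutes_unresolved >= after]


if __name__ == "__main__":
    for minutes in (0, 20, 45):
        print(minutes, recipients_due(minutes))
```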

For more on how to devise an effective threshold and alerting strategy, including a list of default thresholds for temperature and humidity based on industry standards, check out the APC by Schneider Electric white paper, Monitoring Physical Threats in the Data Center.
