Natural disasters, power outages, database breaches and cybercrime are all real threats to data centers, but they pale in comparison with the ever-present and ever-unpredictable risk of human error.
There are ways to safeguard against natural disasters such as solar flares and hurricanes by migrating your workloads to disaster recovery data centers or clouds. You can also encrypt data and strengthen your defenses against hackers. Unfortunately, we can’t always save data centers from ourselves.
With an ever-increasing number of data centers becoming “dark” – that is, lights out and unmanned – you’d expect the biggest threat of them all to be minimized, but human error is still the greatest danger to data center availability.
While physical security systems and advanced monitoring tools have made safeguarding data centers a lot easier, the risk is still there.
Maintenance windows expose the data center most. Maintenance is the very process meant to increase uptime, yet the data center is at its most vulnerable when humans are called upon to tinker or enhance. Even with concurrent maintenance, where dual UPS or cooling feeds are in effect, there’s a risk of overloading when operations are moved from side ‘A’ to side ‘B’. Without adequate preparation, everything can be brought down accidentally.
It’s a nightmare scenario, but it happens all the time – even in data centers that are using systems monitoring tools such as DCIM. If you have DCIM and you don’t update it and keep track of everything, how can you expect to get the loads right? It’s a delicate balancing act, with no more than 50 percent of capacity loaded on each side, so that either side can carry the full load if the other is taken offline. Fail to maintain equilibrium and your systems will crash at the worst possible moment.
You also need to consider your branch circuits, which can overload as well.
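To make the 50 percent rule concrete, here is a minimal Python sketch of the check you might run before moving load from one side to the other. The `PowerSide` class, the function names and the kilowatt figures are all hypothetical, made up for illustration; in practice the numbers would come from whatever your DCIM or metering reports.

```python
from dataclasses import dataclass


@dataclass
class PowerSide:
    """One side of a dual-feed (A/B) power path. Hypothetical model."""
    name: str
    capacity_kw: float  # rated capacity of this side
    load_kw: float      # load currently measured on this side


def utilization(side: PowerSide) -> float:
    """Fraction of rated capacity in use (0.5 means 50 percent)."""
    return side.load_kw / side.capacity_kw


def survives_transfer(source: PowerSide, target: PowerSide) -> bool:
    """True if target can carry its own load plus everything on source."""
    return source.load_kw + target.load_kw <= target.capacity_kw


# Illustrative numbers only: side B is already above 50 percent.
side_a = PowerSide("A", capacity_kw=100.0, load_kw=45.0)
side_b = PowerSide("B", capacity_kw=100.0, load_kw=60.0)

for side in (side_a, side_b):
    print(f"Side {side.name}: {utilization(side):.0%} of capacity")

print("Safe to move A onto B:", survives_transfer(side_a, side_b))
```

With these made-up figures, side B would have to carry 105 kW on a 100 kW rating, so the transfer is not safe. The same comparison can be repeated for each branch circuit, using that circuit's rating in place of the side's capacity.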
Even if your loads are properly balanced and your DCIM is perfectly updated, there’s still no failsafe solution: curious people can press the wrong buttons or fail to set up systems properly in the first place. It’s tough to admit, but we humans are the most imperfect residents in an otherwise pristine digital world.
Expect the unexpected
During any sort of maintenance, you should expect the worst and, unfortunately, it’ll happen about 50 percent of the time.
That’s why maintenance work is often done over the weekend or on national holidays when exposure to risk is lower and users, systems or processes are less likely to be affected by an outage.
So what should your contingency plan be to protect against the possibility of human error in the data center? We’re not yet able to remove the risk entirely, and until the rise of the perfect robotic employee, we’re unlikely to completely resolve the problem any time soon.
The best advice is to know the risks, never underestimate the power of human mistakes, and design your data center’s systems to be as balanced and resilient as possible. Leverage disaster recovery systems if there is any uncertainty or bad history associated with your maintenance processes. Remembering that humans are the greatest of all natural disasters isn’t going to help you sleep better at night, but it might just help you stay a step ahead of their unintentional but inevitable blunders.