I often see companies putting huge amounts of money into new data centers, bringing in design experts, architects and the like. They create buildings that look great and certainly have all the room they need to house all kinds of IT equipment. But too often, they miss a critical element: planning around critical facility operations.
These companies fail to plan for operational sustainability, a term that describes how the data center is operated and that is crucial to business continuity and to reducing the total cost of ownership (TCO). Achieving operational sustainability requires that companies build an operational methodology for pretty much everything they do in the data center. In building the methodology, they would do well to avoid these eight common mistakes.
1. Failure to include the operations team in the facility design. In many instances, the facility and operations teams are excluded from the information-gathering and design phases of the data center lifecycle (namely Design, Build, Operate and Maintain). A “business friendship” is required between IT and the facility/operations groups to ensure the design meets the requirements of both the IT and facility infrastructure teams and takes the future operations phase into account. Without such a partnership, modifications and repairs often become necessary because equipment isn’t sized correctly, the data center has stranded capacity or, even worse, the capacity to support IT load is exhausted while data center space sits unused, or vice versa. When you include the operations team in the design phase, you “build with the end in mind,” and that is the essence of controlling total cost of ownership (TCO).
2. Relying too much on data center design for redundancy. You can build in high levels of redundancy in pretty much any data center component. But without an appropriate operations and maintenance program in place, you may not realize the redundancy you expect. Failure to properly maintain a UPS, for example, may render it inoperable when you need it most.
3. Failure to staff adequately. Many data centers these days operate 24x7. That means you not only need appropriate staff around the clock, but also need cover for employees who are on vacation, out sick, in training or the like.
4. Failure to develop in-house talent. Training is a big part of creating a positive work environment for your employees. Everyone wants to grow, and it’s in your best interest to educate your workforce: failing to do so leaves you with employees who aren’t skilled in the latest technologies and who will likely jump ship for a better offer at the first chance.
5. Failure to drill and test. If the first time your data center staff tests emergency response procedures is during an actual emergency, chances are the results will be less than perfect. Plan on periodic drills to test your response procedures and the skill of those implementing them.
6. Failure to document processes and procedures. Anything that happens in a data center, from changing out a server to performing maintenance on a CRAC unit, should be well documented so that it’s understood how the task should be performed. The expected result of the procedure should also be clearly documented, such that employees will quickly be aware of any variance from that result and can take steps to remedy it.
7. Failure to assess and improve change management. Documenting processes and procedures is all about proper change management. To improve, you need to continually assess the way you do things and look for ways to do them better, meaning more efficiently and reliably.
8. Failure to address quality management. Continually getting better at change management is part of an overall effort at quality improvement with respect to data center efficiency. Lots of software tools exist that can help, such as data center infrastructure management tools.
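The staffing point in mistake 3 can be made concrete with some rough arithmetic. The sketch below estimates how many people it takes to keep a post filled around the clock once absences are counted. The function name and every figure in it (2,080 paid hours a year, 120 hours of vacation, 40 of sick leave, 40 of training) are assumptions chosen for illustration, not numbers from any standard or from a real site.

```python
import math

HOURS_PER_YEAR = 24 * 365  # hours one post must be covered in a 24x7 operation


def staff_needed(posts, paid_hours=2080, vacation=120, sick=40, training=40):
    """Estimate headcount needed to keep `posts` positions filled 24x7.

    All defaults are hypothetical planning figures (hours per person per year).
    """
    available = paid_hours - vacation - sick - training
    if available <= 0:
        raise ValueError("absences exceed paid hours")
    # Round up: a fraction of a person still means another hire.
    return math.ceil(posts * HOURS_PER_YEAR / available)


print(staff_needed(1))  # 5 people for a single round-the-clock post
print(staff_needed(2))  # 10 people for two posts
```

Under these assumed figures, ignoring absences would suggest nine people for two posts (17,520 hours divided by 2,080); counting them raises the estimate to ten. Real planning would also need to reflect shift patterns, overtime rules and local labor law, none of which this sketch models.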
Data centers must be built first and foremost with reliability and availability in mind if they are to meet the organization’s business continuity requirements. As the saying goes, a data center is only as reliable as its weakest link. Then you need to look at efficiency, which starts with the design phase of the data center lifecycle and takes into account the appropriate level of availability and right-sizing of all the equipment. During the life of the data center, critical facility operations, the essence of operational sustainability, become increasingly important because they affect uptime, efficiency and the total cost of ownership.
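The “weakest link” saying, and the earlier warning that redundancy on paper is not redundancy in practice, can be illustrated with a simplified availability calculation. This is a sketch only: it assumes components fail independently, and every availability figure in it is hypothetical. The function names are made up for this example.

```python
from math import prod


def series_availability(components):
    """Every component must work, so individual availabilities multiply."""
    return prod(components)


def redundant_availability(unit, copies=2):
    """At least one of `copies` identical, independent units must work."""
    return 1 - (1 - unit) ** copies


# Hypothetical figures, for illustration only.
maintained_ups_pair = redundant_availability(0.99)  # roughly 0.9999
neglected_ups_pair = redundant_availability(0.90)   # roughly 0.99

# A chain of, say, utility feed, cooling and the UPS pair.
chain = [0.999, 0.9995, maintained_ups_pair]

# The whole chain can never be more available than its weakest component.
print(series_availability(chain) <= min(chain))  # True
```

With these assumed numbers, letting each UPS slip from 99% to 90% availability through poor maintenance costs the redundant pair about two orders of magnitude in expected downtime, even though the design has not changed.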
You can build a new data center and fill it with the latest and greatest equipment, but unless and until you address operational sustainability, you’ll never get all that you can out of it.
Conversation
Ramachandra Nimbargi
12 years ago
Good insight
Roberto Michel
12 years ago
All great points, and interesting how many of them are related to operational best practices and the human/process elements of keeping things running reliably.