Several times a year, I have the good fortune to help conduct Power & Cooling Assessments at customer data centers. Schneider Electric’s Mission Critical Services team gets the benefit of cheap labor, and I get direct experience auditing the efficiency and reliability of a real, operational facility. It’s a great deal for me. After all, it’s important for research analysts like me to get out of the “Ivory Tower” now and then and experience the real world…build up a little “street cred,” if you will.
Anyway, I just got back from doing one. I spent a week at a major pharmaceutical company doing a reliability & capacity analysis for one of their southern campus locations. Flying home lost in thought, I had this weird physical sensation that I was still pulling floor tiles repeatedly in a windstorm (kind of like that feeling you get when you’re lying in bed after being on a boat all day). It occurred to me that talking about some of the common themes I’ve encountered might make for a useful blog post…
“I just don’t know what I have.”
This has been a common theme among customers who are absorbing existing facilities, perhaps through acquisitions or consolidation projects. An enterprise’s network of data centers and IT rooms has often existed for years without any sort of centralized management, monitoring, or standardized operations and maintenance (O&M) program. Combine that with staff changes over the years and you can end up in a dark and murky abyss. The “system” might be working. There are gen-sets, storage tanks, UPSs, batteries, transfer switches, cooling units, security cameras, fire detection/suppression systems, and more to help ensure the system stays up and running. But do you know the health and status of that equipment? Do you know how much power and cooling capacity you really have? When was the last time the equipment was inspected, tested, or serviced? Are the loads balanced? What level of redundancy do you really have? If a UPS fails, do you know how your workloads will be affected? Are there any dead or “zombie” servers, and where are they? Basic questions like these can be difficult to answer if you don’t know what you have. Our Services team can come in and fully document your facility, IT, and networking assets…their IDs, physical locations, status, and interdependencies with other physical resources. We will identify and document threats to availability, as well as provide recommendations and rough budget estimates for resolving any problems found. Beyond that, we offer services to equip your operations team with the knowledge and expertise to develop and maintain an effective O&M program over the life span of your facilities.
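To make the “If a UPS fails…” question a bit more concrete, here is a minimal Python sketch of what a documented inventory with power interdependencies lets you do. It is only an illustration, not how our Services team records or analyzes a site: the `Asset` record, the `feeds` field, the `affected_by_failure` function, and every asset ID are made up for this example, and the model assumes an asset goes down only when all of its upstream power feeds are down.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Asset:
    """One documented asset (hypothetical record layout)."""
    asset_id: str
    kind: str                  # e.g. "ups", "pdu", "server"
    location: str              # physical location, free text
    feeds: list[str] = field(default_factory=list)  # IDs of upstream power sources


def affected_by_failure(assets: dict[str, Asset], failed_id: str) -> set[str]:
    """Return IDs of assets that lose every power path when failed_id fails.

    Assumption: an asset with no listed feeds is a root source and stays up
    unless it is the failed asset itself.
    """
    down = {failed_id}
    changed = True
    while changed:  # repeat until no new asset goes down
        changed = False
        for asset in assets.values():
            if asset.asset_id in down or not asset.feeds:
                continue
            if all(src in down for src in asset.feeds):
                down.add(asset.asset_id)
                changed = True
    return down - {failed_id}


# Made-up inventory: two UPSs, two PDUs, one dual-corded and one single-corded server.
inventory = {
    a.asset_id: a
    for a in [
        Asset("ups-a", "ups", "electrical room"),
        Asset("ups-b", "ups", "electrical room"),
        Asset("pdu-a1", "pdu", "row 1", feeds=["ups-a"]),
        Asset("pdu-b1", "pdu", "row 1", feeds=["ups-b"]),
        Asset("srv-01", "server", "rack 12", feeds=["pdu-a1", "pdu-b1"]),
        Asset("srv-02", "server", "rack 12", feeds=["pdu-a1"]),
    ]
}

print(sorted(affected_by_failure(inventory, "ups-a")))
```

In this toy inventory, the single-corded server drops when its UPS fails while the dual-corded one rides through on the other feed. That is exactly the kind of answer you cannot give if nobody has written down what is plugged into what.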
“Extreme reluctance to change”
As a member of the Science Center, I’m always writing or speaking about industry best practices. Many of these have been talked about ad nauseam for years and are often taken for granted in the industry today…separating air streams, overhead power distribution, matching power/cooling resources to the load, using blanking panels, arranging racks into hot and cold aisles, virtualizing servers, standardization, modularity, etc. But the reality is that many data centers aren’t doing these things…or are only doing them piecemeal. I’ve found that the reason is not that these data center managers are ignorant. It’s often that they are afraid to change for fear of causing an interruption…and this fear, of course, usually stems from not knowing what they have and how everything is interconnected. Schneider Electric services can address this fear and dread. We can reduce or eliminate the risk and help people make their facility safer, more reliable, and more efficient. And for those who aren’t making improvements due to a lack of resources, we can either fill the gap or help provide the justification necessary to obtain the funds needed to enact these positive changes.
“You are not alone”
I’ve often been struck by how few people are involved in managing and operating data center facilities, given their complexity and their importance to the businesses they serve. Particularly in the smaller facilities, operators often fill multiple roles, including facilities, server/storage admin, telecom, and networking manager. You frequently read in the industry press that many are being asked to do more with less, and that trend certainly fits what I’ve seen in the field. Not having bandwidth…I’m talking about human attention spans here…is likely a common cause of problems that lead to interruptions and downtime. I’m sure it’s also, in part, why facilities often remain stagnant or even decay from a maintenance and upgrade perspective. In many cases, people simply don’t have time for anything beyond fighting fires and keeping everything running from hour to hour. Third-party services from a vendor like Schneider can be an effective way to increase human bandwidth and help you do the proactive, beneficial things that make your facility run better and more efficiently. An effective DCIM implementation is another tool that can really help simplify, and provide business intelligence (BI) analytics for, what would otherwise be a complex, diverse ecosystem of interconnected systems.
Discovering (or worrying that) you have “issues” in your data center doesn’t have to mean you need to move everything to the cloud and shut down your data center. The people, knowledge, and services exist to help you quickly work through these issues. With Schneider’s help, your organization can gain the expertise and information needed to effectively run and manage your facility over its entire life cycle. You are not alone.
Are you curious how mature and effective your facility’s Operations & Maintenance program is? We offer a free, 80-some-page Facility Operations Maturity Model in White Paper 197, Data Center Facility Operations Maturity Model. It’s a comprehensive, detailed checklist for evaluating how well your team is managing and operating the facility.