Downtime is expensive for data centers. An hour-long service outage can cost hundreds of thousands of dollars, and even a few lost minutes can be costly in the wrong circumstances. In some cases, downtime can also be a matter of life and death, which is why data center risk management should be an important part of your business plan.
As technologies like self-driving cars and smart surgical robots begin to rely on the cloud, there’s less room for data center downtime. Even in less critical situations, loss of service can lead to supply chain disruptions, work stoppages, and lost information.
As service availability has become more important, so has managing risk. Data center risk management is now essential to normal operations around the world.
What Is Data Center Risk Management and How Does It Protect Service?
For data center staff, effective risk management typically involves three things.
First, risk managers need to identify the potential threats a data center may face, such as fire, power outages, break-ins, and floods. This typically involves an assessment of how people, technology, and practices at a particular data center contribute to or mitigate risk. Managers usually conduct these assessments themselves or work with an experienced provider.
Next, the risk manager needs to determine how each threat can be minimized. They’ll create a list of dangers, possible expenses, potential solutions, and what each solution will cost.
Finally, the manager needs to develop a strategy for implementing the chosen risk-mitigation techniques without disrupting the work of others in the data center or interrupting service to customers.
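One way to keep that list organized is a simple risk register. The Python sketch below is illustrative only: it assumes the common annualized-loss approach (expected cost of one incident multiplied by expected incidents per year), and every threat, dollar figure, and rate in it is a made-up placeholder rather than real data. The `Risk` class and `prioritize` function are hypothetical names, not part of any risk-management tool.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One row of a hypothetical risk register (all figures illustrative)."""
    threat: str               # e.g. "flood"
    incident_cost: float      # estimated cost of a single incident, in dollars
    annual_rate: float        # expected incidents per year without mitigation
    mitigation: str           # proposed solution
    mitigation_cost: float    # yearly cost of that solution, in dollars
    residual_rate: float      # expected incidents per year after mitigation

    def annual_loss(self) -> float:
        """Expected yearly loss before mitigation (annualized loss expectancy)."""
        return self.incident_cost * self.annual_rate

    def net_benefit(self) -> float:
        """Yearly loss avoided by the mitigation, minus what it costs."""
        avoided = self.incident_cost * (self.annual_rate - self.residual_rate)
        return avoided - self.mitigation_cost


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Order mitigations so the best value for money comes first."""
    return sorted(risks, key=lambda r: r.net_benefit(), reverse=True)


# Placeholder numbers for illustration only; not real data.
risks = [
    Risk("fire", 2_000_000, 0.02, "inert-gas suppression", 15_000, 0.005),
    Risk("flood", 1_500_000, 0.05, "flood barriers", 20_000, 0.01),
    Risk("break-in", 500_000, 0.10, "badge-controlled doors", 10_000, 0.02),
    Risk("power outage", 300_000, 0.50, "UPS and generator", 60_000, 0.05),
]

for r in prioritize(risks):
    print(f"{r.threat}: expected loss ${r.annual_loss():,.0f}/yr, "
          f"{r.mitigation} nets ${r.net_benefit():,.0f}/yr")
```

Ranking by net benefit is only one possible choice; a real assessment might also weigh factors such as safety or regulatory requirements that don’t reduce neatly to dollars.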
Why Data Center Uptime Has Become Essential
For most service providers, effective risk management is an essential step toward high service availability (SA). Availability is often described in terms of nines: for example, one nine is 90% uptime and five nines is 99.999% uptime.
Five nines, or about 5.26 minutes of downtime per year, is often considered the standard for data centers that want to offer high service availability, and many customers have come to expect it. Where it is achievable, guaranteeing this level of uptime gives users industry-standard reliability and protection against downtime.
Five nines is already a demanding target, and only a handful of projects have set their sights higher. The 1965 1ESS Western Electric telephone exchange, dubbed the “large immortal machine” by its engineers, aimed for seven nines, or only about three seconds of downtime per year.
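The arithmetic behind the nines is straightforward: each extra nine cuts the allowed downtime by a factor of ten. A minimal Python sketch, assuming a 365.25-day average year; the function names are illustrative rather than taken from any library:

```python
# Average year length, counting leap years (an assumption; a 365-day
# year gives slightly smaller figures).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def availability(nines: int) -> float:
    """Uptime fraction for a given number of nines, e.g. 5 -> 0.99999."""
    return 1 - 10 ** -nines


def downtime_seconds_per_year(nines: int) -> float:
    """Maximum downtime per year, in seconds, at that availability."""
    # The unavailable fraction is exactly 10**-nines.
    return SECONDS_PER_YEAR * 10 ** -nines


for n in range(1, 8):
    pct = f"{availability(n):.{max(n - 2, 0)}%}"
    secs = downtime_seconds_per_year(n)
    print(f"{n} nine(s) = {pct}: {secs / 60:,.2f} min ({secs:,.2f} s) per year")
```

For five nines this works out to roughly 5.26 minutes per year, and for seven nines to roughly 3.16 seconds.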
There are ways for end users who need reliable uptime, like data scientists and product engineers, to mitigate risk on their own. Edge computing shifts processing away from the cloud toward edge devices, spreading out both compute power and the risks that come with depending on a cloud data center.
However, edge computing isn’t a risk-management strategy in and of itself. Partnering with data centers that can offer high service availability will still be a necessity. As a result, high expectations for data centers will likely stick around, even if the cloud becomes somewhat less central to computing over the next few years.
Common Data Center Risks and Potential Solutions
While each data center is unique, there are a handful of threats that they all must prepare for. Solutions often look similar across different facilities.
Natural disasters can be a major risk. Data centers in areas prone to extreme weather often take advantage of building designs that strengthen the center against flooding and winds. They may also draft special disaster-response procedures that will help employees act quickly in the event of a hurricane or flood.
Fire is another threat every facility faces. Fire suppression systems that use water and agents such as inert gases can quickly extinguish blazes, protecting servers and other sensitive hardware. Security doors that admit only authorized personnel can also protect the data center against physical security threats.
The Importance of Managing Data Center Risks
Data center uptime has become essential. Risk management strategies that help staff identify risks, plan solutions, and integrate threat-management practices are already important and will likely become even more so over the next few years.
Knowing how a data center manages common risks, such as fire, power outages, and unauthorized access, can reveal how well the operation is run and give users peace of mind.
About the author: Shannon Flynn is a tech writer and Managing Editor for ReHack.com. She covers topics in biztech, IoT, and entertainment. Visit ReHack.com or follow ReHack on Twitter to see more of Shannon’s posts.