The impacts of unanticipated downtime are intensifying as businesses increasingly rely on digital solutions to drive operations and work to elevate customer experiences to gain a competitive edge. Disruptions impose significant costs that reach beyond stalled productivity to include noncompliance fines, declining customer satisfaction and brand reputation, and customer churn.
According to ITIC’s 2021 Hourly Cost of Downtime Survey, 44% of firms indicated that hourly downtime costs exceed $1 million and can reach more than $5 million, exclusive of legal fees, fines or penalties. Also, 91% of organizations said that a single hour of downtime that takes mission-critical server hardware and applications offline costs more than $300,000 on average, due to lost business, productivity disruptions and remediation efforts. Only 1% of organizations (mainly very small businesses with 50 or fewer employees) estimate that an hour of downtime costs them less than $100,000*. This makes business continuity (BC) plans and disaster recovery (DR) plans essential to every business, regardless of industry, revenue or size. The good news is that colocation can provide the foundation that supports the success of these plans.
Business continuity and disaster recovery plans are closely aligned in their efforts to support the availability and resiliency of business operations. For this reason, the terms BC and DR are often used interchangeably. However, they are not the same. BC encompasses the comprehensive scope of protocols and procedures identified to keep the enterprise running. DR is an element of a BC plan that focuses on safeguarding the integrity of IT infrastructure and data during a disaster.
Together, BC and DR plans outline the steps and resources necessary to support a business's people, processes, data and IT infrastructure and to maintain business-as-usual operations during events that interfere with business operations. These events can include natural disasters such as hurricanes, earthquakes and floods; human-caused disruptions resulting from errors or cyberattacks; and pandemics like COVID-19.
Effective BC and DR plans are investments in success. Both plans require proactive assessment of the impact of various disaster scenarios and details on the steps to take before, during and after the event to minimize business interruptions. Documenting the plans in advance of a crisis is also crucial, because executing the required steps can be difficult under the stress of an emergency situation. Following predefined procedures can also improve the speed and precision of continuity and recovery efforts.
To manage unexpected events and minimize downtime, organizations must build a BC plan that supports their unique needs.
Determining an organization’s specific BC needs begins with a rigorous evaluation of the environmental and infrastructure risks the business faces over the long and short term. For example, an organization with offices or IT infrastructure in a flood zone must determine how its location might increase risks to the business and make plans to enable continued operations.
COVID-19 has spotlighted the challenges of supporting a distributed workforce. Not only do businesses need to adapt for a remote workforce, they need to account for the increased risk of cyberattacks as well as susceptibility to phishing, fraud and other threats that are more likely to succeed when employees are away from the office environment and the watchful eyes of IT administrators.
It also became clear that finding alternate approaches to performing routine business tasks, delivering products and services to customers, and communicating with employees, vendors, partners and customers is paramount.
Once equipped with a thorough understanding of the relevant risks and their business impacts, organizations can formulate the protocols and procedures needed to maintain operations. The plan should also identify key personnel, assign them specific roles and responsibilities and list all necessary actions. Each team member should receive proper training to ensure they are prepared to effectively execute their duties. The plan should also include a communication strategy, so that personnel can notify stakeholders of the event and keep them updated on the status of the situation.
Testing BC plans is critical to validating their feasibility amid constantly evolving threats and business objectives. Changes to the plan might be as simple as updating contact information in the communication strategy or adding a new team member. Alternatively, a change could be more significant, such as recording a new IT system, updating the defense strategy to address a new risk or modifying a process that did not work as expected.
Routine testing can help organizations identify issues with the plan before they can affect business continuity. Businesses should also view BC plans as living documents, revising and honing them after each test or disaster to ensure efficacy.
A BC plan is not complete without incorporating a DR element. A DR plan provides detailed instructions on how to protect and restore critical IT systems, interrupted applications and lost data. The data center itself is an important element in mitigating the risk to uptime. Data centers are built with hardened exteriors that can withstand hurricane-force winds; some are architected in anticipation of earthquakes or the challenges inherent in locations where exceptionally high or low temperatures are common.
The DR plan integrates much of the information discovered in the BC plan’s risk and business impact assessments, with a narrowed focus on IT infrastructure. A DR plan requires businesses to identify critical systems, applications and data and rank them according to their criticality in supporting ongoing business operations. Organizations should also identify any dependencies. By tiering IT infrastructure, organizations can ensure the most vital infrastructure is brought back online first.
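As a rough illustration of tiering with dependencies, the following Python sketch orders a small inventory for recovery. The system names, tier numbers and dependency links are hypothetical and invented for this example; a lower tier number is assumed to mean a more critical system. The sketch restores a system only after everything it depends on is back, and among the systems that are ready it picks the most critical tier first.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical inventory: name -> (tier, systems it depends on).
# Tier 1 is assumed to be the most critical.
SYSTEMS = {
    "network-core": (1, []),
    "database": (1, ["network-core"]),
    "order-api": (1, ["database"]),
    "email": (2, ["network-core"]),
    "reporting": (3, ["database"]),
}


def recovery_order(systems):
    """Return a restore order that respects dependencies and, among the
    systems that are ready to restore, prefers the most critical tier."""
    sorter = TopologicalSorter(
        {name: deps for name, (_tier, deps) in systems.items()}
    )
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order = []
    ready = list(sorter.get_ready())
    while ready:
        # Sort by tier, then by name so the result is deterministic.
        ready.sort(key=lambda name: (systems[name][0], name))
        current = ready.pop(0)
        order.append(current)
        sorter.done(current)  # unlocks systems that depended on this one
        ready.extend(sorter.get_ready())
    return order


if __name__ == "__main__":
    for step, name in enumerate(recovery_order(SYSTEMS), start=1):
        print(step, name)
```

In this made-up inventory, the tier 2 email service is restored only after the tier 1 order API, even though email becomes ready earlier, because criticality breaks the tie whenever dependencies allow.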
To limit downtime, a DR plan (DRP) should define the specific policies, procedures and practices necessary to recover IT systems, applications and data after the disaster.
Like the BC plan, the DR plan should provide step-by-step instructions that specify the actions to take before, during and after a crisis. This includes establishing plans to back up data and recover operations. The frequency with which organizations need to perform backups and the speed at which various assets must be restored are key. Identifying recovery point objectives (RPO) and recovery time objectives (RTO) can help organizations determine these requirements.
RPO. The RPO identifies the point in time to which data must be restored. To define the RPO, organizations must determine how much data they can afford to lose without significantly impacting the business. This information will help organizations decide how frequently they should back up data. Frequent or continuous backups allow organizations to recover data from just before the disruption, which keeps data loss to a minimum. Less frequent backups can only restore data as of the last backup, so everything created or changed since then is lost. Small businesses that do not operate around the clock or perform frequent transactions can afford to conduct less frequent backups. However, 24x7x365 operations that process large quantities of data, such as online retailers and financial institutions, need frequent backups to reduce the amount of data lost.
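The link between backup frequency and RPO can be sketched in a few lines of Python. The function names and sample intervals below are hypothetical, and the sketch assumes the simplest possible model: a failure just before the next scheduled backup loses one full backup interval of data.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Assumed model: a failure right before the next backup loses
    everything written since the previous one, i.e. one full interval."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the backup schedule keeps worst-case loss within the RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


if __name__ == "__main__":
    rpo = timedelta(hours=1)  # hypothetical tolerance for an online retailer
    for label, interval in [
        ("nightly", timedelta(hours=24)),
        ("every 15 minutes", timedelta(minutes=15)),
    ]:
        verdict = "meets" if meets_rpo(interval, rpo) else "misses"
        print(f"{label} backups: {verdict} a {rpo} RPO")
```

Under this model, a one-hour RPO rules out nightly backups but is satisfied by backups every 15 minutes. Real schedules would also need to account for how long each backup takes to complete.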
RTO. The RTO defines how much downtime an organization can tolerate without significantly impacting operations. This metric factors in the impact and cost of downtime to help organizations determine how quickly they must restore operations.
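One way to reason about an RTO is to work backward from the cost of downtime. The Python sketch below is illustrative only: it reuses the $300,000 hourly figure cited earlier as a sample input, while the maximum tolerable loss and the recovery step durations are hypothetical. It also assumes that cost grows linearly with downtime, which is a simplification.

```python
def max_tolerable_downtime_hours(hourly_cost: float, max_loss: float) -> float:
    """Derive a candidate RTO from an hourly downtime cost and the largest
    loss the business is willing to absorb (linear cost model assumed)."""
    if hourly_cost <= 0:
        raise ValueError("hourly_cost must be positive")
    return max_loss / hourly_cost


def plan_meets_rto(step_hours: list[float], rto_hours: float) -> bool:
    """True if the recovery steps, run one after another, fit within the RTO."""
    return sum(step_hours) <= rto_hours


if __name__ == "__main__":
    rto = max_tolerable_downtime_hours(hourly_cost=300_000, max_loss=600_000)
    # Hypothetical steps: fail over network, restore database, validate apps.
    steps = [0.5, 1.0, 0.25]
    print(f"Candidate RTO: {rto} hours")
    print(f"Recovery plan fits: {plan_meets_rto(steps, rto)}")
```

With these sample inputs the candidate RTO is two hours, and a recovery sequence totaling 1.75 hours fits within it.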
Together, these metrics help organizations build a DR strategy that integrates appropriate backup and recovery plans.
Establishing a DR team to carry out the prescribed DR tasks is essential to the efficacy of the DRP. As with the BC team, each member of the DR team should be assigned specific responsibilities and trained to carry them out. Organizations should also create a communication plan to allow team members to connect with internal and external stakeholders to keep them informed during the recovery process.
Modern colocation data centers offer the leading-edge infrastructure and expertise to support organizations’ BC and DR strategies.
The various redundancies built into third-party data centers are designed to support uptime and resiliency. Redundant uninterruptible power supplies (UPSes), generators and computer room air handlers (CRAHs) installed in colocation data centers provide a backup system that can take on the load of a failed system. Additionally, colocation offers diverse power feeds and connectivity options to provide an alternative path if the primary one fails. Colocation providers also foster relationships with vendors to ensure they can secure additional fuel, equipment and other resources during a lengthy disaster.
The level of redundancy offered by data centers varies. Organizations should review a data center’s uptime guarantee before partnering.
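To compare uptime guarantees, it can help to translate a percentage into the downtime it still permits. The short Python sketch below does that arithmetic. The percentages are common illustrative values, not the guarantee of any particular provider, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per year that an uptime guarantee still permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% uptime allows about {minutes:.1f} minutes per year")
```

Each additional "nine" cuts the permitted downtime by a factor of ten, from roughly 526 minutes per year at 99.9% to about 5 minutes at 99.999%.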
Some colocation providers can also offer a portfolio of geographically distant data centers to allow businesses to choose the locations that meet their unique needs. An organization can choose a primary site near its headquarters for convenience and a more remote, secondary facility for DR purposes. This dual deployment model allows businesses to fail over operations to the DR site if the primary location becomes unavailable. The distance between these data centers is intended to keep a localized disaster from impacting both facilities. The secondary site can also be used to back up data or to provide a facility for staff to work in if the production site is inaccessible.
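The failover decision in a dual deployment can be sketched very simply. The Python example below is a minimal illustration, not a description of how any provider implements failover: the Site type, the site names and the healthy flag are hypothetical stand-ins for whatever monitoring an organization actually uses.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    healthy: bool  # hypothetical result of an external health check


def choose_active_site(primary: Site, secondary: Site) -> Site:
    """Prefer the primary site; fail over to the DR site only when the
    primary is unavailable."""
    if primary.healthy:
        return primary
    if secondary.healthy:
        return secondary
    raise RuntimeError("no healthy site available")


if __name__ == "__main__":
    primary = Site("primary-near-hq", healthy=False)  # simulated outage
    secondary = Site("remote-dr-site", healthy=True)
    print("Serving from:", choose_active_site(primary, secondary).name)
```

A real deployment would also need to decide how failure is detected, who authorizes the switch and how operations fail back once the primary site recovers.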
Colocation data centers also support BC through scheduled maintenance programs and equipment refreshes, which optimize the performance and availability of systems. Routine maintenance can be overlooked by businesses’ internal IT teams, whose focus is on implementing IT strategies designed to reach core business objectives. This can leave systems vulnerable to failures. Third-party data centers have technicians dedicated to monitoring, managing and maintaining data center equipment 24x7x365 to limit issues.
Regular inspections of the physical infrastructure should be part of preventive maintenance. Electrical and power systems, UPSes, generators, heating and cooling, smoke detection and fire extinguishing systems, as well as the miles of cables and connections, need to be scrupulously maintained.
Skilled on-staff data center technicians and engineers are also available to help organizations design and implement BC and DR strategies that promote uptime.
Third-party data center providers choose locations that are outside of areas prone to natural disasters and other environmental risks. For added protection, they also specifically design and construct the facilities to endure extreme conditions, such as high winds and tremors, to protect customers’ IT assets. To limit unauthorized access that can invite risk, colocation data centers utilize a series of security measures, protocols and best practices—including access controls, video surveillance and on-site, trained security personnel.
Many third-party data centers offer access to both colocation and cloud services. These diverse deployments allow organizations to build flexible BC/DR solutions that integrate hybrid IT, various backup options, disaster recovery as a service (DRaaS) and more. This flexibility also allows organizations to meet specific operational requirements and budgetary constraints. Access to cloud services can be particularly important when supporting remote work capabilities.
While devising BC and DR strategies is complex, the effort is well worth the time and energy spent. The ability to continue business operations and effectively restore IT infrastructure in the face of a crisis is critical to business success. Third-party data centers can offer the advice, technical support and infrastructure to ensure operational resilience and availability in even the most challenging circumstances.
* Information Technology Intelligence Consulting (ITIC), 2021 Hourly Cost of Downtime Survey
VP of DCO Programs
David is responsible for overseeing the qualification program, program management, process development and metrics reporting for CoreSite’s Data Center Operations organization.