Part 6: Disaster Recovery Planning
Posted on August 15, 2019 by Steve Pelletier

From High Availability to Archive: Enhancing Disaster Recovery, Backup and Archive with the Cloud

In this post I will address disaster recovery, which is often referred to simply as DR. DR ultimately relies on technologies from all of the components we’ve discussed in this series so far, including backup, archive, and HA.

Before I dig into disaster recovery, I want to define a related subject: business continuance. Business continuance planning is the planning and preparation a company does to overcome serious incidents or disasters and return to normal operations, including essential business functions, within a reasonably short time frame. Serious incidents and disasters may include natural events such as fires and floods, human-caused events such as operator error, viruses, or malicious activity, equipment failure, and business issues such as the failure of a key supplier or even a stock market crash. Business continuance plans typically address:

  • Resiliency – Designing business functions and related infrastructure to be unaffected by disruptions through the use of spare capacity and redundancy.
  • Recovery – Making preparations to recover or restore both critical and less critical business functions that fail.
  • Contingency – Last resort plans for a generalized capability to cope with unforeseen incidents which could not be managed with the resiliency and recovery plans.

I’m not going to address business continuance in full since it is a HUGE subject. I gave the introduction because DR is a subset of business continuance: it addresses keeping the vital technology infrastructure that supports the business processes running in the event of serious incidents and disasters. The key point here is that a disaster recovery plan is based on a business continuance plan. In other words, a disaster recovery plan should prioritize the order in which systems get recovered based on the criticality of the business processes that they support.

So, let’s take a look at a general method to develop a disaster recovery plan. In a future post I will discuss the technologies that can be used to implement the disaster recovery plan.

The first step in creating a disaster recovery plan is to identify the business processes in your organization and prioritize them based on which processes need to be restored first and which can be restored last. Guidelines for how quickly these business processes need to be restored also need to be developed.

Once the business processes have been identified and ranked, the servers and infrastructure that support each process need to be identified. As an example, for an online retailer a critical business process would be online sales. Online sales may be supported by a set of web servers, application servers, databases, load balancers, and Internet connectivity, so identifying these specific servers and devices is critical.

Once all of the systems that support the identified business processes have been identified, plans to recover or restore these systems must be developed. The plans have to include identifying all of the resources that these servers and devices rely on, such as storage systems, Active Directory, email, and DHCP, to name a few. This process is known as application dependency mapping. Once dependency mapping is completed, you can determine the order in which systems need to be recovered and brought back online.
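To make the link between a dependency map and a recovery order concrete, here is a minimal Python sketch. It assumes the dependency map is available as a simple dictionary, and the system names and dependencies are hypothetical, loosely based on the online sales example above. It uses the standard library's graphlib module (Python 3.9 or later) to perform a topological sort, which is one common way to derive such an order, not the only one.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map for the online sales example.
# Each key is a system; its set lists the systems it relies on.
dependencies = {
    "web_servers": {"app_servers", "load_balancer", "dhcp"},
    "app_servers": {"database", "active_directory"},
    "database": {"storage"},
    "load_balancer": {"internet_connectivity"},
    "active_directory": {"storage"},
    "storage": set(),
    "dhcp": set(),
    "internet_connectivity": set(),
}


def recovery_order(deps):
    """Return systems ordered so each one comes after everything it relies on.

    Raises graphlib.CycleError if the map contains a circular dependency.
    """
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(dependencies)
for step, system in enumerate(order, start=1):
    print(step, system)
```

A circular dependency in the map makes the sort fail, which is useful in its own right: it flags systems that cannot be brought up one after the other without manual intervention.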

When the recovery order has been determined, it can be combined with the guidelines that were developed to identify how quickly the related business processes need to be restored. This will allow recovery time objectives (RTOs) and recovery point objectives (RPOs) to be developed for each of the servers and systems. RTOs define how quickly a system needs to be brought back online, and can vary from nearly instantly to several weeks, or never for systems supporting noncritical business processes. RPOs define how much data can acceptably be lost from each system, and typically range from nearly zero up to 24 hours’ worth of data. The closer an RPO or RTO is to zero, the more expensive the solutions tend to be.
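As a rough illustration of how process-level guidelines can turn into per-system objectives, the Python sketch below gives each system the tightest RTO and RPO of any business process it supports. The process names, system names, and hour values are hypothetical, and taking the minimum is an assumption about how shared systems should be treated, not a rule stated in this series.

```python
# Hypothetical restoration guidelines per business process, in hours.
process_targets = {
    "online_sales": {"rto": 1, "rpo": 0.25},
    "payroll": {"rto": 72, "rpo": 24},
}

# Hypothetical mapping of business processes to the systems that support them.
process_systems = {
    "online_sales": ["web_servers", "database", "storage"],
    "payroll": ["hr_app", "database", "storage"],
}


def system_objectives(targets, systems):
    """Give each system the tightest RTO and RPO of the processes it supports."""
    objectives = {}
    for process, names in systems.items():
        target = targets[process]
        for name in names:
            # Start from a copy of this process's targets, then tighten.
            current = objectives.setdefault(name, dict(target))
            current["rto"] = min(current["rto"], target["rto"])
            current["rpo"] = min(current["rpo"], target["rpo"])
    return objectives


objectives = system_objectives(process_targets, process_systems)
for name, value in sorted(objectives.items()):
    print(f"{name}: RTO {value['rto']} h, RPO {value['rpo']} h")
```

In this sketch the shared database and storage inherit the online sales targets, because the most demanding process they support sets the bar. Shared infrastructure is often where tight objectives, and their costs, end up concentrated.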

When all of the data has been compiled, including the systems and servers that need to be recovered, the application dependency map, and the RPOs and RTOs for all of the servers and systems, a DR plan can be created. The plan needs to address:

  • Where will the servers and systems be recovered?
  • What technologies, products, and methods will be used to enable recovery of the systems and servers?
  • How, and how often, will the recovery plan be tested and revised?

From this, budgetary costs can be determined. The first time organizations work through a budget, they are generally shocked by the jaw-dropping cost they come up with for DR. This causes many to shy away from developing adequate DR plans. To avoid this trap, work closely with your business units. Show them the costs associated with the RPOs and RTOs for the systems that support their business processes. This will typically lead to reprioritizing the order in which business processes need to be restored, and to less aggressive RPOs and/or RTOs for the associated servers and systems. It takes a lot of compromise to come up with a DR plan that meets business needs at an acceptable cost.

Two additional things that need to be addressed in a good disaster recovery plan are access to the servers and systems once they have been recovered, and how to “fail back” to a production environment once it is possible. Both of these will be addressed in upcoming posts.

Next up, Part 7: Disaster Recovery Locations

This is Part 6 of 10 in the From High Availability to Archive: Enhancing Disaster Recovery, Backup and Archive with the Cloud series. To read them all right now, download our free eBook.