Part 9: More DR Tools
In the last post I covered three of the DR tools used to help implement policies: consistency groups, tape backup and related variants, and storage-based replication. Now let’s address the remaining tools, moving down the list and picking up with OS level replication.
- Consistency groups
- Tape backup and related variants
- Storage replication
- OS level replication
- Application replication
- Hypervisor based tools
- The use of SaaS
OS Level Replication
OS level replication requires the installation of a third-party application that intercepts all of the disk writes from the operating system and asynchronously sends them to another server at the recovery site. This allows for nearly instantaneous RTOs and RPOs of seconds to minutes. The downside to these DR tools is that most of them require a running server at the recovery site for each server using the tool. This adds cost and management time to the solution. A benefit of OS-based replication is that these tools have some understanding of the operating systems they are running on. This allows them to do things like inject new hardware drivers into the remote image or ensure unique Windows SIDs, which makes it possible to replicate to hardware that does not fully match the original server.
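The intercept-and-forward behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular product works: the `AsyncReplicator` class and its method names are hypothetical, dictionaries stand in for the local and recovery-site disks, and a dictionary assignment stands in for the network send.

```python
import queue
import threading


class AsyncReplicator:
    """Toy model: a write lands locally first, then a background thread ships it."""

    def __init__(self, local_disk, remote_disk):
        self.local_disk = local_disk      # dict standing in for the production disk
        self.remote_disk = remote_disk    # dict standing in for the recovery-site disk
        self._pending = queue.Queue()     # intercepted writes, kept in write order
        worker = threading.Thread(target=self._ship, daemon=True)
        worker.start()

    def write(self, block, data):
        # The write completes as soon as the local disk has it ...
        self.local_disk[block] = data
        # ... and a copy is queued for the recovery site (the asynchronous part).
        self._pending.put((block, data))

    def _ship(self):
        while True:
            block, data = self._pending.get()
            self.remote_disk[block] = data  # stand-in for a network send
            self._pending.task_done()

    def drain(self):
        # Block until every queued write has reached the recovery site.
        self._pending.join()


local, remote = {}, {}
replicator = AsyncReplicator(local, remote)
replicator.write(0, b"boot sector")
replicator.write(1, b"payroll data")
replicator.drain()
```

Whatever is still sitting in the queue when production fails is the data that would be lost, which is why the RPO of this kind of tool tracks how far the recovery site lags behind.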
Application Based Replication
Some applications have replication capabilities of their own, such as Oracle log shipping. Application-based replication requires compatible versions of the software running at both the production and recovery sites. There may be licensing fees associated with this. The biggest benefit of application-based replication is that it is typically hardware agnostic. That is, you may be able to run production instances on x86 servers and replicate multiple instances to a single midrange server running a compatible version of the software. Because both sites must stay on compatible versions, patching and upgrading applications that do this type of replication may require additional planning.
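As a rough illustration of the log shipping idea, the sketch below has a primary record every change in an ordered log and a standby replay the records it has not yet seen. It is a generic toy, not Oracle's mechanism: the `Primary`, `Standby`, and `ship` names are hypothetical, and an in-memory list stands in for log files copied between sites.

```python
class Primary:
    """Production instance: applies changes and records each one in a log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered change records: (sequence, key, value)

    def set(self, key, value):
        self.data[key] = value
        self.log.append((len(self.log), key, value))


class Standby:
    """Recovery-site instance: rebuilds state by replaying shipped log records."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log records already replayed

    def replay(self, records):
        for seq, key, value in records:
            if seq != self.applied:
                raise ValueError(f"gap in shipped log at record {seq}")
            self.data[key] = value
            self.applied += 1


def ship(primary, standby):
    # Send only the records the standby has not replayed yet.
    standby.replay(primary.log[standby.applied:])


primary, standby = Primary(), Standby()
primary.set("balance", 100)
ship(primary, standby)
primary.set("balance", 80)
primary.set("owner", "ops")
ship(primary, standby)
```

Because the standby only needs to understand the log records, not the hardware underneath the primary, this style of replication can cross platforms as long as both sides agree on the record format, which is where the version compatibility requirement comes from.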
Hypervisor Based Tools
While virtualization in and of itself is not a DR tool, it can enable the use of virtualization aware tools for replication as well as enhancing the capabilities of other DR tools. There are many types of virtualization which can help with disaster recovery including:
- Server Virtualization
- Storage Virtualization
- Network Virtualization
- Application Virtualization
In this post I am going to limit the discussion to server and storage virtualization.
Server virtualization allows multiple operating systems to run on one physical server at the same time. It does this by providing an abstraction layer between the underlying hardware and the running operating systems, also known as virtual machines or VMs. So, what does this abstraction layer do? The abstraction layer takes control of all of the underlying physical hardware including CPU, memory, storage, and networking, and manages how the VMs are given access to these resources. The abstraction layer has the added benefit of hiding what the actual physical resources are by presenting them to the VMs as standardized hardware. This means that no matter what your physical network interface card is, it will always be presented to the VMs as the same NIC type and model that has been configured by the Hypervisor (another name for the abstraction layer). The same is true for storage and memory.
This allows a VM to be portable between physical servers that have different configurations and/or hardware. Another trait of virtualizing servers is that most VMs are stored as a small number of files on the hypervisor. This in essence encapsulates the VM and allows you to replicate an entire VM by simply replicating these files. At one point in the early days of x86 server virtualization I built a Linux-based VM with an open source monitoring tool on it for a customer. After several weeks, someone did something that broke the application. To solve the problem, I zipped up the two files that held the image and configuration of that VM, which I had kept copies of, and e-mailed them a new server. This ease of replication and the ability to run on dissimilar hardware allows a lot more flexibility when implementing DR solutions for virtualized servers.
Storage virtualization is similar to server virtualization in that it puts a layer of abstraction between the underlying storage and the servers that use it. Storage virtualization also allows you to implement higher level storage services such as disk mirroring and snapshotting at the virtualization layer instead of on the individual storage devices. Basically, this allows you to clone or replicate storage between devices that would typically be incompatible, such as replicating data between EMC and IBM storage devices. This opens up a lot of possibilities when it comes to storage-based replication scenarios for DR. There are many additional benefits to storage virtualization, but they are out of scope for this discussion on disaster recovery.
Now that I’ve given a high level overview of virtualization, I can address hypervisor-based tools. The hypervisor tools that come to mind most often in DR discussions are replication tools. There are three main types of replication tools: snapshot tools, continuous data replication tools, and virtualization aware storage replication tools.
Snapshot replication tools take advantage of a hypervisor’s ability to create point-in-time snapshots of running virtual servers. When a hypervisor creates a snapshot of a virtual machine, it stops all disk writes to the data files that hold the VM and redirects new writes to a new file, often called a delta file. The old data files then represent a point-in-time snapshot of the VM, while the VM continues running without interruption because its writes go to the delta file. Once a snapshot is no longer needed, the original data files are merged with the delta file, all IO is redirected to the merged file, and the delta file is deleted. This is known as committing the snapshot.
Snapshot replication tools schedule snapshots of the VMs they are replicating, replicate the point-in-time image of each VM to the DR site, and commit the snapshot once the replication is done. This method requires additional disk space to hold the delta files for VMs, and increases the amount of disk IO and processing overhead during the snapshot and commit procedures. Since it takes time to commit a snapshot, there is a limit on how frequently snapshots can be taken, which in turn limits how often this type of tool can replicate a new snapshot. A general rule of thumb for most virtual machines is that a snapshot replication tool can replicate a server about every 15 minutes. Results may vary from VM to VM. This type of replication allows for very short RTOs, with RPOs determined by how frequently a snapshot replica of a VM can be made.
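The snapshot, delta file, and commit cycle can be modeled in a few lines of Python. This is a simplified sketch, not any hypervisor's actual on-disk format: the `VirtualDisk` class is hypothetical, and dictionaries of block numbers stand in for the VM's data files and the delta file.

```python
class VirtualDisk:
    """Toy VM disk with a single redirect-on-write snapshot."""

    def __init__(self):
        self.base = {}     # original data files: block number -> data
        self.delta = None  # delta file; None while no snapshot is open

    def write(self, block, data):
        # While a snapshot is open, new writes go to the delta file only.
        target = self.base if self.delta is None else self.delta
        target[block] = data

    def read(self, block):
        # The running VM sees the newest data: delta first, then base.
        if self.delta is not None and block in self.delta:
            return self.delta[block]
        return self.base.get(block)

    def take_snapshot(self):
        if self.delta is not None:
            raise RuntimeError("a snapshot is already open")
        self.delta = {}
        return self.base  # frozen point-in-time image until commit

    def commit_snapshot(self):
        # Merge the delta into the original files and drop the delta.
        self.base.update(self.delta)
        self.delta = None


disk = VirtualDisk()
disk.write(0, "config v1")
frozen = disk.take_snapshot()
disk.write(0, "config v2")   # the VM keeps running and writing
disk.write(1, "new log")
replica = dict(frozen)       # stand-in for copying the image to the DR site
disk.commit_snapshot()
```

The replica holds the disk as it was when the snapshot was taken, while the VM's own view already includes the later writes. The merge in `commit_snapshot` is the step that costs extra IO, and the delta dictionary is the extra disk space the method needs.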
Continuous data replication tools work a lot like OS replication tools except they run at the hypervisor level. This type of tool inserts itself in the disk IO stream between the hypervisor and the VMs. It intercepts all data writes and asynchronously sends the data to a hypervisor at the disaster recovery site. This allows for very short RPOs and RTOs. RPOs are typically less than one minute. There are a few benefits to this type of replication. Since it does not rely on snapshots, there is no need for additional disk space, and there is no increased disk IO. This type of replication tool is typically more expensive than snapshot replication.
Virtualization aware storage replication tools integrate with the storage systems used by the hypervisor to replicate the virtual machines to storage at the recovery site. This type of replication requires that your storage systems are compatible with the tool being used since the tool needs to be able to manipulate low-level features of the storage. This type of tool also requires that you have storage capable of replicating data to a remote location which may involve additional licensing costs on your storage platforms. In certain circumstances, additional configuration of how and where VMs are stored may be required to facilitate the use of consistency groups. Overall, this type of tool provides very short RPOs and RTOs.
These three types of hypervisor enabled replication tools have a few additional benefits. Most of these tools have the ability to do minor modifications to the VMs they replicate, such as changing their IP addresses to help facilitate the move to a new data center. They also have the ability to help automate the switch to a disaster recovery site by powering virtual servers on in a pre-specified order to comply with the disaster recovery plan. An additional benefit is that there are fewer end points to manage and keep updated since they are installed on the hypervisors instead of in each operating system instance like OS replication systems. Many of these tools also have the ability to access the contents of the VM images to allow the recovery of individual files. This gives them some utility for providing backup services for the VMs they are protecting. This is usually done by using a journaling system that allows files or entire VMs to be recovered at multiple points in time.
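Two of the failover conveniences mentioned above, powering VMs on in a planned order and changing their IP addresses, can be illustrated with Python's standard library. This is a sketch under assumptions: the VM names, dependencies, and subnets are made up, the function names are hypothetical, and the re-IP rule (keep the host offset, swap the subnet) is just one simple policy a tool might apply. It needs Python 3.9 or later for `graphlib`.

```python
import ipaddress
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def power_on_order(depends_on):
    """Return VM names so that every VM comes after the VMs it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def re_ip(address, prod_net, dr_net):
    """Keep a VM's host offset but move it into the recovery site's subnet."""
    prod = ipaddress.ip_network(prod_net)
    dr = ipaddress.ip_network(dr_net)
    addr = ipaddress.ip_address(address)
    if addr not in prod:
        raise ValueError(f"{address} is not in {prod_net}")
    if prod.prefixlen != dr.prefixlen:
        raise ValueError("this sketch assumes both subnets are the same size")
    offset = int(addr) - int(prod.network_address)
    return str(dr.network_address + offset)


# Hypothetical plan: the web tier needs the app tier, which needs the database.
plan = {"web01": {"app01"}, "app01": {"db01"}, "db01": set()}
boot_order = power_on_order(plan)
new_ip = re_ip("10.1.0.25", "10.1.0.0/24", "172.16.5.0/24")
```

Here `boot_order` starts the database first and the web server last, and `new_ip` places the VM at the same host position in the recovery subnet. Writing the order down as dependencies rather than a fixed list means the plan stays valid when VMs are added.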
Software as a Service
With the rise in cloud computing, many organizations have begun adopting applications that are delivered via Software as a Service (SaaS). This means that the application is hosted in the cloud and delivered via a subscription. The application and its data are held in the cloud and are accessible via the internet. In the case of SaaS offerings, the availability of the application is the responsibility of the application provider. That said, you cannot assume that an application needs no DR plan just because it is delivered as SaaS. You must do due diligence to make sure that the SaaS provider’s DR plan is sufficient to meet your needs. You also need to look into the backup and archiving capabilities of the SaaS provider to make sure your data is safe and you are meeting any compliance regulations you need to adhere to.
Next up, Part 10: Applying Disaster Recovery Tools to Your Plan. Until next time, keep your data protected.
This is part 9 of 10 in the From High Availability to Archive: Enhancing Disaster Recovery, Backup and Archive with the Cloud series. To read them all right now download our free eBook.