Why recursive VSS is good for backing up virtualized Windows Servers

Microsoft Volume Shadow Copy Service (VSS) can back up Windows-based virtual servers while ensuring data is application-consistent.

In a world where server virtualization is a “when” and not an “if,” it’s important to understand how backup strategies might change when moving from OS-centric (guest-level) backups to host-based (hypervisor) protection. More importantly, not all host-based backup methods are the same, particularly when it comes to transactional apps such as Microsoft SQL Server or Exchange.

According to Enterprise Strategy Group (ESG) Research, 46% of all IT environments are still running guest-based backups for their virtualized servers. And even with all the host-based and array-based approaches available, guest-based backup is still the primary means of protection for 20% of them. Why?

I haven’t met too many folks who inherently want to deploy and manage backup agents within each virtual machine (VM), but they often feel forced to do so. For most, the reason appears to be application-specific issues. For instance, a database like SQL Server needs to be application-consistent during the backup, and it also needs to be notified when the backup is complete so it can do its log truncation and other database management tasks.

Most backup applications leverage the hypervisor’s ability to notify guest-based applications of an impending backup so they can put themselves into a ready-to-be-backed-up state. We’ll use SQL Server as the example here, but this applies to many applications.

VSS primer

For many Windows Server-based applications, Microsoft provides some core “plumbing” for enabling backups: the Volume Shadow Copy Service (VSS). Along with the VSS foundation within the Windows OS, there are three active components:

  1. VSS Requester includes components typically found in a traditional backup agent (often from a third party) that initiate the backup process.
  2. VSS Writer components in the application (e.g., SQL Server, Exchange or even the Windows file system) ensure the workload is ready to be backed up by performing tasks such as flushing memory-based transactions or other backup preparation.
  3. VSS Provider components in the storage layer (OS software based or hardware based) capture a snapshot of the data set to be protected.

Essentially, a VSS-based backup works in eight relatively straightforward steps:

  1. A backup agent’s VSS Requester queries VSS for workloads that are capable of being backed up.
  2. VSS enumerates any workloads that have registered their VSS Writer.
  3. The backup agent’s VSS Requester requests that the workload be prepared for backup.
  4. The workload’s VSS Writer does what’s specifically required for that workload to be backed up.
  5. After preparation, the workload’s VSS Writer notifies VSS and its VSS Provider that its data is ready.
  6. The VSS Provider snaps the data set and notifies the VSS Requester that it has the data.
  7. The VSS Requester references the (usually temporary) snapshot within the VSS Provider and sends the appropriate data to the backup server.
  8. Upon completion of the backup and acknowledgement by the backup server, VSS can notify the VSS Provider that it’s free to release the snapped data and the VSS Writer that its data is secure. The workload can then do its post-backup maintenance tasks.
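
To make the sequence concrete, here is a minimal Python sketch of the eight steps as a toy model. It does not call the real VSS API; the class and function names (Writer, Provider, VssService, run_backup) are hypothetical and exist only for illustration. Each log entry is prefixed with the step number it corresponds to.

```python
from dataclasses import dataclass, field


@dataclass
class Writer:
    """Application-side component (hypothetical model, not the real VSS API)."""
    name: str
    state: str = "running"

    def prepare(self, log):
        # Step 4: flush in-memory transactions so on-disk data is consistent.
        self.state = "ready"
        log.append(f"4 writer {self.name}: prepared")

    def backup_complete(self, log):
        # Step 8: post-backup maintenance such as log truncation.
        self.state = "running"
        log.append(f"8 writer {self.name}: post-backup maintenance")


@dataclass
class Provider:
    """Storage-side component that creates and releases snapshots."""
    snapshots: list = field(default_factory=list)

    def snap(self, writer, log):
        # Step 6: only snapshot data the writer has declared ready.
        if writer.state != "ready":
            raise RuntimeError(f"{writer.name} is not ready")
        snapshot = f"snapshot-of-{writer.name}"
        self.snapshots.append(snapshot)
        log.append(f"6 provider: created {snapshot}")
        return snapshot

    def release(self, snapshot, log):
        # Step 8: the temporary snapshot is no longer needed.
        self.snapshots.remove(snapshot)
        log.append(f"8 provider: released {snapshot}")


@dataclass
class VssService:
    """Coordinator standing in for the VSS foundation in the OS."""
    provider: Provider
    writers: dict = field(default_factory=dict)

    def register(self, writer):
        self.writers[writer.name] = writer


def run_backup(vss, workload, backup_server):
    """Play the requester role for one workload and return the event log."""
    log = ["1 requester: asks which workloads can be backed up"]
    log.append(f"2 vss: enumerates {sorted(vss.writers)}")
    writer = vss.writers[workload]
    log.append(f"3 requester: asks for {workload} to be prepared")
    writer.prepare(log)
    log.append(f"5 writer {workload}: data is ready")
    snapshot = vss.provider.snap(writer, log)
    backup_server.append(snapshot)
    log.append(f"7 requester: sent {snapshot} to the backup server")
    vss.provider.release(snapshot, log)
    writer.backup_complete(log)
    return log


if __name__ == "__main__":
    vss = VssService(Provider())
    vss.register(Writer("SQLServer"))
    for line in run_backup(vss, "SQLServer", backup_server=[]):
        print(line)
```

The point of the model is the ordering: the provider refuses to snap until the writer reports ready, and the writer only does its post-backup maintenance after the data has reached the backup server.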

VSS for VMs

That’s how it works with a physical server. The challenge is where those components are and how they interact when protecting virtualized machines with host-based backup. For that to work, two tiers of data conversations must occur: first between the backup server and the hypervisor (host), and then between the guest OSes and the host and/or backup server.

Microsoft’s Hyper-V hypervisor has its own VSS Writer for its “workload” of virtualized servers. The Hyper-V Integration Components (IC) that are automatically installed in each Windows guest OS include a VSS Requester.

So, with that in mind, let’s re-examine those eight steps:

From a host perspective:

1. and 2. The backup software’s agent runs on the Hyper-V host and recognizes Hyper-V as able to be protected because of the host’s VSS Writer.

3. The backup software makes a request to back up a particular VM.

4. The Hyper-V host’s VSS Writer does what it needs to do for its workload (a VM) to be backed up. Here, it gets the VM ready.

Here’s where it gets fun. The host’s VSS Writer gets a virtual machine ready to be backed up by telling the guest’s Hyper-V IC VSS Requester to act as a backup agent, so the whole process repeats inside the guest; hence the term “recursive VSS.”

Inside the guest:

1., 2. and 3. The Hyper-V IC VSS Requester discovers VSS-capable workloads for protection, such as SQL Server or Exchange, via their VSS Writers, and then instructs those workloads to be backed up.

4. The guest-based applications do what they need to do to prepare for backups (flush logs, clear caches and so on).

5. After the workloads report being ready for backup, the workloads’ VSS Writers notify the guest Windows OS VSS Provider that the data is ready.

6. The Windows OS VSS Provider snapshots the data volumes as instructed by the workloads.

7. The Hyper-V IC VSS Requester then notifies its requesting backup server (which is actually just the Hyper-V host) that the VM is in a protectable state, including an application-consistent, software-based snapshot.

Now that the guest internals are protected, its container (the logical VM) is ready to be protected. Remember, in a host-based backup model, the VM as a whole is the workload to be backed up, so like any other VSS-based workload, once it’s ready, the original backup process continues.

Now the host process continues:

5. The Hyper-V host’s VSS Writer notifies the host OS’s VSS and its underlying VSS Providers that the VM is ready to be snapped.

6. The VSS Provider snaps the volume the VM’s virtual hard disk (VHD) resides on.

7. The host-based backup agent that requested the backup is given access to the snap and feeds the VHD to the backup server.
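
The nesting described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Hyper-V's actual interface: the function names (host_backup, guest_backup) and the dictionary layout for a VM are hypothetical. What it shows is that the host's "prepare the VM" step is itself a complete backup pass inside the guest.

```python
def guest_backup(vm, log):
    """Guest tier: the integration components act as the requester."""
    log.append(f"guest {vm['name']}: requester discovers {vm['apps']}")
    for app in vm["apps"]:
        log.append(f"guest {vm['name']}: {app} writer prepares its data")
    log.append(f"guest {vm['name']}: provider snapshots the guest volumes")
    log.append(f"guest {vm['name']}: reports ready to the host")
    return True


def host_backup(vm, backup_server):
    """Host tier: preparing the VM runs a full backup pass in the guest."""
    log = [f"host: requester asks to back up VM {vm['name']}"]
    log.append("host: Hyper-V writer prepares the VM")
    # The recursion: the host writer's preparation is the guest's own pass.
    if not guest_backup(vm, log):
        raise RuntimeError("guest could not reach a consistent state")
    log.append("host: writer reports the VM is ready")
    log.append(f"host: provider snapshots the volume holding {vm['vhd']}")
    backup_server.append(vm["vhd"])
    log.append(f"host: requester sends {vm['vhd']} to the backup server")
    return log


if __name__ == "__main__":
    # Hypothetical VM description used only for this illustration.
    vm = {"name": "vm1", "apps": ["SQL Server", "Exchange"], "vhd": "vm1.vhd"}
    for line in host_backup(vm, backup_server=[]):
        print(line)
```

In the printed log, every guest line sits between the host writer starting its preparation and the host writer reporting ready, which is the interleaving the walkthrough describes.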

That’s how a recursive VSS operation works. This doesn’t mean all VSS-enabled backup solutions are the same, even just for Hyper-V. Differentiators among Hyper-V backup solutions include:

  • Manageability and scheduling.
  • Deduplication of common objects across VMs, such as Windows OS files.
  • Integration with higher-level management functions, whether it’s System Center or the Hyper-V console; this is especially important in a dynamically created private cloud scenario.
  • Ability to recover individual items from within a host-based backup (with or without an agent in the VM).
  • Cleaning up the extra “junk” within the VHD during the minor gap between the time the guest VSS Provider snapped its logical file system with application-consistency and when the host VSS Provider snapped the actual VHD file.
  • Closing the loop so the guest-based application knows it’s been successfully backed up and can proceed to its post-backup management tasks.
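
To see why that last differentiator matters, here is a small, hypothetical Python model (GuestDatabase and nightly_cycle are illustrative names, not part of any product). It assumes a guest application whose transaction log can only be truncated once the application is told the backup succeeded, and compares a backup product that closes the loop with one that does not.

```python
class GuestDatabase:
    """Toy stand-in for a transactional application inside a VM."""

    def __init__(self):
        self.log_records = 0

    def write(self, count):
        self.log_records += count

    def backup_complete(self):
        # Post-backup maintenance: protected log records can be truncated.
        self.log_records = 0


def nightly_cycle(db, nights, writes_per_day, notify_guest):
    """Return the transaction log size after each nightly host-based backup."""
    sizes = []
    for _ in range(nights):
        db.write(writes_per_day)
        # The host-level backup of the VHD happens here in either case.
        if notify_guest:
            db.backup_complete()
        sizes.append(db.log_records)
    return sizes


if __name__ == "__main__":
    print(nightly_cycle(GuestDatabase(), 3, 100, notify_guest=True))
    print(nightly_cycle(GuestDatabase(), 3, 100, notify_guest=False))
```

In both runs the VHD is backed up every night; the only difference is whether the guest application ever hears about it, and in the second run the log simply keeps growing.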

It should also be noted this process works because it’s VSS-enabled all the way from the host’s Hyper-V VSS Writer through the VSS components inside the guests and back again. The process is somewhat different with VMware and the vSphere 5 vStorage APIs for Data Protection (VADP). Most VMware backup solutions achieve the same guest-based workflow because they leverage the guests’ VSS mechanisms, but their host-based processes differ quite a bit.

If you choose to adopt a host-based protection strategy, be sure to understand the key aspects of the host/guest interchange, as well as important features such as enabling guest-application log truncation and cleanup of the VHDs during the recursive process. That could be the difference between a successful recovery and some unusable VHDs or VMDKs.

[Originally posted on TechTarget, where the author is a recurring columnist]