Now’s the time to virtualize backup deduplication

With so much great work being done in converged infrastructure and hyper-converged appliances, many users are getting to the point that the only boxes with blinky lights that they want to see on their floor are those converged infrastructure (CI)/hyper-converged appliance (HC) platforms, since everything else is running inside those bundles.

But one of the last “blinky” holdouts is the data protection setup being used to protect the CI/HC platforms. Even when the data protection software engine might run within a virtualized server, the protection storage (typically a storage array performing backup deduplication) is often still physical and adjacent to the rest of the infrastructure.

My recent hands-on experience with a new virtualized backup deduplication array has validated my belief that those platforms should also be virtualized. In the past, many argued that key systems, such as database platforms, needed dedicated hardware, either because of the latency added by going through the virtualization layer or because they demanded so many CPU, memory and I/O resources that the host would be essentially dedicated to a single VM anyway.

But the innovations that have thinned the hypervisor layer and optimized I/O negate the first argument, while the second argument, that the host would be dedicated anyway, actually favors the single-VM-on-host model. VMs are easier to protect and recover than physical servers, and they are much easier to upgrade:

* If you have a dedicated physical server that requires more resources, you must rebuild or migrate that OS, application and dataset.

* If your virtual machine requires more resources, you can add them from the host or move the VM to a new host that has more of what’s needed.

If those are the scenarios for a dedicated database server, why can’t they be the same for a dedicated backup deduplication platform? In fact, that argument should be even more compelling to those who use dedicated backup deduplication appliances, since the task of replacing the controller head of a dedupe platform has traditionally been arduous.

According to ESG research on data protection appliances, most organizations believe the tipping point between a virtual appliance and physical appliance to be around 32 TB, but if VMs continue to scale larger, that becomes less of a barrier. Moreover, some of the most exciting innovations in backup deduplication are coming from products with controllers that interconnect for both scale-out and scale-up. At that point, why not put a few virtual dedupe appliances in the same infrastructure, split across I/O boundaries, such as one dedupe-VM in each of the four nodes within a hyper-converged appliance?

The reasons for virtualizing backup deduplication go far beyond “because you can without penalty”:

* Distributed environments can run virtualized dedupe within each remote office, and then take advantage of highly efficient, compressed/deduplicated replication from the branch to a larger deduplication appliance at headquarters.

* Small and midsized organizations can finally get the economics of deduplication, without potentially complicating their otherwise consolidated environments. And they can efficiently replicate from their own SMB environment to another of their own sites or to a service provider offering cloud storage or disaster recovery as a service.

* Service providers can choose to spin up a virtual appliance per subscriber, instead of relying on multi-tenancy or making significant Capex investments up front. They can create virtual dedupe targets on demand, with complete isolation of data and management, and then add capacity licensing (and do migration-based maintenance) transparently.

* Dedupe vendors benefit from virtual dedupe, since it’s much easier for pre-sales team members and reseller partners to spin up a VM than it is to requisition a physical demo unit for a proof of concept.

* Backup software vendors can also benefit, if they choose to ship a virtual backup engine and perhaps partner with a virtual dedupe vendor. The customer/partner installs two VMs that are known to work together, and everyone benefits.

For the vast majority of environments, disk is the right choice to recover from — and from a cost-efficiency perspective, you really have to have backup deduplication. That said, the argument that you need a physical deduplication platform is being challenged as innovations in virtual products continue to exceed expectations.

 [Originally posted on TechTarget as a recurring columnist]
