Replication for Windows 2012 R2 Deduplication with Replacador

Windows Deduplication is a free feature in Windows Server 2012 and Server 2012 R2.  It works great and I recommend it.

I’ve also worked with other deduplication systems, including Data Domain, Avamar, ExaGrid, GreenBytes, OpenSolaris and Nexenta.

Deduplication works especially well for backup files. With a deduplication system, you can store many backups on one deduplication appliance, because deduplication only stores each unique chunk of data once.

Besides the obvious advantage of taking up much less disk space for each new backup, deduplication reduces the amount of data transmission you need in order to replicate the deduplicated backup across the network or Internet to an offsite location.

The reduction in data seems magical when you first encounter it. For most people, it makes it possible to replicate a backup in a reasonable amount of time over the Internet connection they already have.

The one problem with using Windows Deduplication instead of a dedicated deduplication appliance is that it does not have replication built in.

We decided to do something about this, so we wrote Replacador as a replication system for Windows Deduplication.

Replacador looks at two Windows Deduplication volumes and keeps them synchronized. First, the source volume is optimized to turn all in-policy files into deduplication chunks and reparse points. Next, all the new or changed chunks on the source volume are copied to the destination volume, along with the reparse points. Then anything that has been deleted from the source volume is deleted from the destination volume.

Finally, garbage collection is run on both source and destination volumes to keep them synchronized.
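
To make the copy and delete steps concrete, here is a rough Python sketch of how a one-way sync plan could be worked out. It is only an illustration, not Replacador's actual code. The function name plan_sync, the dictionaries, and the item names are all made up for the example, and each volume is modeled as a simple map from an item name (a chunk or a reparse point) to a content hash.

```python
def plan_sync(source, destination):
    """Return (to_copy, to_delete) for a one-way sync from source to destination.

    Both arguments map an item name to a content hash (hypothetical model).
    """
    # Copy anything that is missing on the destination or whose hash differs.
    to_copy = sorted(
        name for name, digest in source.items()
        if destination.get(name) != digest
    )
    # Delete anything on the destination that no longer exists on the source.
    to_delete = sorted(name for name in destination if name not in source)
    return to_copy, to_delete


# Made-up inventories for illustration only.
source = {"chunk-a": "h1", "chunk-b": "h2-new", "file1.bak": "r1"}
destination = {"chunk-a": "h1", "chunk-b": "h2-old", "chunk-z": "h9"}

to_copy, to_delete = plan_sync(source, destination)
print("copy:", to_copy)      # copy: ['chunk-b', 'file1.bak']
print("delete:", to_delete)  # delete: ['chunk-z']
```

Only the changed chunk and the new reparse point cross the wire, which is the whole point of replicating after deduplication rather than before.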

The source volume has to be a locally attached volume on the source system. The destination volume can be a second volume on the source system, such as an external drive. Usually, though, it is a volume on another server that is on the network or the Internet. Both servers must be running the same version of Windows Server 2012. We suggest running R2 on both, because the deduplication process is much faster on R2.

The volumes do not have to be the same size, but you will probably want them to be.  The second volume has to have enough room for all the chunks, the reparse points, and some extra room for garbage collection to run.

The reason we support replicating to an external drive is to make it easy to ‘seed’ a new remote deduplication volume when you first start deduplicating.  You can replicate to an external drive on the source system, carry it to the remote system, and replicate from the external drive to the destination drive one time.  This can save days or even weeks of data transmission, in some cases.

This also makes it possible to replicate back from the destination volume to an external drive, so you can quickly restore to a source system after a catastrophic loss of the first volume due to tornado, fire, flood, etc.

An external disk drive can also be used as a source volume for deduplication. Some external drives support RAID data protection. A good example of this is the Western Digital Duo series. The 8 TB Duo costs less than $350 and provides 4 TB of protected storage. There is also a 12 TB Duo with 6 TB of available storage.

There is a beta test version of Replacador available. You will also need an authorization code to use it.

Here is the PDF of the documentation.

Replacador Configuration and Use

To get an activation code, browse to c:\LaserVault\Replacador and click the Replacador Configuration application.

Click the Authorization button on the lower left.

Replacador generates a unique serial number for your server. Copy the serial number, paste it into an email, and send it to ReplacadorCode@laservault.com.

We will send you a code good for 30 days.

When you receive the code, paste it into the Authorization Key and click OK.

Next, set up and run Replacador.

The hidden value of deduplication

The term deduplication is generally used to refer to block-level hash-based processing of multiple data files to shrink them to the smallest possible representation on disk.

A cryptographic hash, such as the 20-byte SHA-1, is calculated for each block of data. Each unique block is then compressed and stored with a hash index, and a pointer to the index takes the place of the raw data.

The original data file is replaced by a set of ‘reparse points’ that are the indexes of the ‘chunk storage’ that contains the compressed, hashed blocks.

When you want to read the file again, Windows transparently reassembles the original blocks using the reparse points and the chunk storage.
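
The hash, store, and reassemble cycle described above can be sketched in a few lines of Python. This is a toy model, not how Windows Deduplication is implemented. The ChunkStore class, the fixed 64 KB block size, and the use of zlib for compression are all assumptions chosen to keep the example short, and a production system may well chunk and compress differently.

```python
import hashlib
import zlib

CHUNK_SIZE = 64 * 1024  # assumption: fixed-size blocks, for simplicity


class ChunkStore:
    """Toy chunk storage: SHA-1 digest -> compressed block."""

    def __init__(self):
        self.chunks = {}

    def put_file(self, data):
        """Store a file and return its list of pointers (SHA-1 digests)."""
        pointers = []
        for offset in range(0, len(data), CHUNK_SIZE):
            block = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha1(block).digest()  # 20 bytes
            if digest not in self.chunks:
                # Only a block we have never seen takes up new space.
                self.chunks[digest] = zlib.compress(block)
            pointers.append(digest)
        return pointers

    def read_file(self, pointers):
        """Reassemble the original bytes from the stored chunks."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in pointers)


# Two made-up "backups" that differ only in their last block.
day1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
day2 = day1[:-CHUNK_SIZE] + bytes([99]) * CHUNK_SIZE

store = ChunkStore()
p1 = store.put_file(day1)
print("chunks after day 1:", len(store.chunks))  # chunks after day 1: 10
p2 = store.put_file(day2)
print("chunks after day 2:", len(store.chunks))  # chunks after day 2: 11
print("day 2 restores intact:", store.read_file(p2) == day2)
```

The second backup adds only one new chunk to the store, even though it is the same size as the first.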

Deduplication can be used with live data, but where it really shines is in the storage of backup files.

For example, if you take an uncompressed, raw backup file of 600 GB and deduplicate it, it might take 200 GB on disk, with most of the savings on the first day coming from the compression of the data.

When you take the next day’s backup of 600 GB, deduplication will replace most of the data with pointers to existing dedupe chunks, and the total new storage used might be 15 GB.

So a deduplicated volume that is 8 TB in size might hold 200 TB of raw backups, or about 330 backups of 600 GB each, at a 25 to 1 deduplication rate.
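
The arithmetic behind that estimate is easy to check. The short Python sketch below uses the figures from the text and assumes decimal units (1 TB = 1000 GB). The function name backups_that_fit is just for illustration.

```python
def backups_that_fit(volume_tb, dedup_ratio, backup_gb):
    """Number of full raw backups a deduplicated volume can represent."""
    raw_capacity_gb = volume_tb * dedup_ratio * 1000  # assumes 1 TB = 1000 GB
    return raw_capacity_gb // backup_gb


print(8 * 25)                        # 200 (TB of raw backups)
print(backups_that_fit(8, 25, 600))  # 333
```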

Deduplication makes it feasible to keep many backup cycles online at your fingertips.

This is one of two reasons why companies have spent billions of dollars on deduplication appliances.

The second reason is the hidden benefit of deduplication.

Deduplication reduces the size of each new backup cycle’s data footprint to a fraction of the amount taken by a complete backup.

If a 600 GB backup is reduced to 30 GB of reparse points and new chunk storage, it suddenly becomes reasonable to replicate that data across the internet to a second deduplication volume. Or even a third, or a fourth.
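
To see why the smaller footprint matters, compare rough transfer times. The Python sketch below assumes a 20 Mbps upload link, which is a made-up figure for illustration, and it ignores protocol overhead and any compression on the wire.

```python
def transfer_hours(size_gb, link_mbps):
    """Hours needed to send size_gb over a link of link_mbps megabits/second."""
    megabits = size_gb * 1000 * 8  # GB -> megabits, decimal units
    return megabits / link_mbps / 3600


UPLINK_MBPS = 20  # assumption: not a figure from the article

print(f"full 600 GB backup: {transfer_hours(600, UPLINK_MBPS):.1f} hours")  # 66.7
print(f"30 GB of changes:   {transfer_hours(30, UPLINK_MBPS):.1f} hours")   # 3.3
```

On that assumed link, the full backup takes almost three days to send, while the deduplicated changes fit easily inside an overnight window.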

Replication makes it possible to move just the changes to the offsite disaster recovery copy.

Windows Deduplication does not include replication, but there is a third-party solution called Replacador that runs the Windows Deduplication job and then replicates the changes across the Internet, a local area network, or to an external drive.

With the addition of replication, Windows Deduplication can offer the SMB a backup deduplication solution at a much lower price than a dedicated appliance.