Replicating a Veeam Backup Repository with Windows Deduplication

I have already discussed the advantages of using Windows Deduplication for Veeam Backup Repositories and introduced the idea of using Replacador to replicate a Windows deduplication volume.

Now we will set up Replacador on a Windows deduplication volume that holds a Veeam Backup Repository and then run the replication process.

First browse to C:\LaserVault\Replacador and execute the ReplacadorConfig application.

We will use this to define a replication task for our source and destination volumes.

In this first example, we will replicate the volume to another volume on an external drive on the same server.  The same process also runs across a network or the Internet.  We will give an example of that later.

In the Replication Configuration Screen, press the Add Definition button and define a replication task.

In this case, the task name is vmbackup and the machine name is ‘.’ (a period means the current machine).

The Volume Path is D:\. Normally a deduplication volume will be a drive letter on the current machine.

replacador4

Don’t click OK yet.

Each replication needs at least one Destination. You can actually replicate to multiple destinations at once.

Click the Add Destination button and a new form opens to define the destination.

In this case we are replicating to another volume on the same server, so the Machine Name is ‘.’ for the local server again.

In the case of a different network location, this could be the network name of the target machine (the machine part of its UNC path) or its IP address.

Enter a local username on the target machine and its password.  In this case we are using administrator, but you could use the system account or whatever is appropriate.  The user needs to have sufficient authority to run Windows Deduplication garbage collection on the target machine.

The volume path is the local pathname on the target machine to the deduplication volume that will be a clone of the source deduplication volume.

The UNC path is the UNC version of this target deduplication volume, in the form \\machinename\volume

In the case where the target is on this local machine, just use the volume path again.
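
To make the UNC form concrete, here is a minimal Python sketch that assembles one from a machine name and a share name. The names backup02 and dedupvol are hypothetical, and the helper is only an illustration of the path format, not part of Replacador.

```python
def unc_path(machine: str, share: str) -> str:
    """Build a UNC path: two backslashes, the machine, one backslash, the share."""
    # Strip stray backslashes so the caller cannot double them up by accident.
    machine = machine.strip("\\")
    share = share.strip("\\")
    if not machine or not share:
        raise ValueError("machine and share must be non-empty")
    return "\\\\" + machine + "\\" + share


# Hypothetical target machine and share name, for illustration only.
print(unc_path("backup02", "dedupvol"))  # prints \\backup02\dedupvol
```

Note the single backslash between the machine and the volume; only the leading separator is doubled.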

replacador5

Now click OK, then OK, then OK, and your replication is configured.

The Replacador Manual explains how to run the ReplacadorTransfer program from the command line or through the Task Scheduler, but since we are just testing, we will simply click to execute the ReplacadorTransfer application.  Since there is only one replication task defined and the default action of the application is to replicate, it will do exactly what we want.

When you first start ReplacadorTransfer, it looks like nothing is happening.  In fact, it starts a deduplication job on the source volume to make sure that everything is ready for replication.  If you have already deduplicated the volume, this part of the task will take just a minute or two.

Once the source volume is deduplicated, a command window will open and display the replication progress.

replacador7

You can get a better idea of what is going on by looking at the Task Manager performance screen.

replacador8

Replication is really just a specialized copy task that takes very few CPU or memory resources.  The limiting factor is the speed of reading, transmitting, and writing the data on the target volume.

The whole point of replication is to reduce the data traffic to the minimum needed to move the changes from the source volume to the destination volume.  The first replication will be a large one, which is why it is sometimes a good idea to replicate to an external drive to seed the actual target server volume.  After that, each replication should transfer only a small fraction of the original deduplication volume content, even for a new full backup.

When Replacador is done, you will have an exact copy of the deduplicated volume on the target server.  Each time you replicate in the future, only the changed chunks and reparse points on the deduplicated volume will be sent to the target volume.  The original files will not be reflated at any time in the process.

 

Veeam Backup Repository Settings for Windows Deduplication 2012 R2

To use Windows Deduplication with Veeam Enterprise Plus, you will most likely want to use a real Windows Server and create a deduplication-enabled volume for your Veeam backups.  Your Veeam backups will be stored in a Veeam Backup Repository, which is a folder holding all the files.

Windows deduplication ingestion is a CPU- and memory-intensive procedure, and it is probably best not to run it in a VM.

For the same reason, it is best to run Windows deduplication on a server that is not being used in production.

In another blog post, we show you how to roll your own deduplication appliance.

You can have multiple Veeam backup repositories on the same deduplicated volume.  For example, if you have three different Hyper-V servers, each with its own collection of VMs, you could have three repositories on one deduplicated volume.

You can also have multiple volumes with deduplication enabled on the same Windows server.  You might want to do this because the different Hyper-V or VMware hosts have too many VMs for the volume you are using, or because the VMs are very different from each other and will deduplicate better when grouped on separate volumes, for example with all the Linux MySQL VMs on one volume.

In this example, we are working with two different servers.  Server H is the host, which is running Windows Server 2012 R2 with Hyper-V and is hosting multiple VMs.  It has Veeam Enterprise Plus installed on it for backup.

Server R is the Repository server.  This is where Veeam will remotely install the Veeam Backup Repository agent and NFS.

You will be doing all your typing and viewing on Server H, while Veeam will install its software across the network on Server R.

Within Veeam Enterprise Plus, click on Backup Repositories, then right click and add new repository.

Click on Microsoft Windows Server, then Next.

2015-03-13 15_19_55-Edit Backup Repository

Put in the IP address or network name of the server.

At this point Veeam will ask you for the Username and Password to use on the repository server.

veeam3

Browse to or create the folder name for the backup repository.

veeam4

Set the Storage Compatibility Settings for deduplication.

2015-03-13 15_20_08-Storage Compatibility Settings

These are the best settings for a backup repository that will be on a volume with Windows Deduplication enabled.

The benchmark shown on this blog was run with these settings.

Veeam will ask to install its own NFS. Accept this with the settings shown.

veeam6

Veeam will then install the repository software and NFS on the repository server R.

veeam7

Now go back to the backup job you have set up, or create a new job, and point it to your new repository.

Run the backup job. When it is complete, go to the repository server and run Windows deduplication, or let Replacador do this for you.

The first deduplication will not run as fast as your second and following deduplications.
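
If you would rather script the manual deduplication step, the following Python sketch builds and runs the PowerShell Start-DedupJob cmdlet. It assumes it is run on the repository server with the Data Deduplication feature installed and that D: is the deduplicated volume; the wrapper functions themselves are hypothetical helpers, not part of Veeam or Replacador.

```python
import subprocess

VALID_PRIORITIES = ("Low", "Normal", "High")


def dedup_command(volume: str, priority: str = "Normal") -> list[str]:
    """Return the command line that starts an optimization job and waits for it."""
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"priority must be one of {VALID_PRIORITIES}")
    ps = (
        f"Start-DedupJob -Volume '{volume}' -Type Optimization "
        f"-Priority {priority} -Wait"
    )
    return ["powershell.exe", "-NoProfile", "-Command", ps]


def run_dedup(volume: str, priority: str = "Normal") -> None:
    """Run the job; raises CalledProcessError if PowerShell reports failure."""
    subprocess.run(dedup_command(volume, priority), check=True)


# Example (Windows repository server only, assumed volume letter):
# run_dedup("D:", priority="High")
```

Building the command separately from running it makes it easy to inspect the exact PowerShell line before scheduling it.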

Veeam Backup with Windows Deduplication Benchmark

Some people wonder why anyone would use Windows Deduplication on a Veeam Backup Repository.   Doesn’t Veeam have its own deduplication and replication?

The people at Veeam actually recommend using Windows Deduplication and have a great writeup about it that you can download here.

Veeam has great deduplication and replication, but its deduplication works within a single backup.  Veeam deduplicates the backup of one VM against another VM in the same backup.

Windows Deduplication deduplicates blocks of data across many backups.  For example, when I deduplicated a full Veeam backup of multiple VMs a second time, 800 GB of backup deduplicated down to 10 GB.  That means that using Windows Deduplication on the Veeam Repository reduces my full replication over the Internet from 800 GB to 10 GB, on the second and following full backups.

Of course most people using Veeam are going to do full backups periodically, but incremental backups one or more times each day.

My first incremental backup with Veeam takes about 20 GB of repository storage.  Deduplicating that with Windows Deduplication takes it down to about 2 GB of disk usage.  This means I can protect my 800 GB of VMs using 2 GB of actual disk storage and replicate it in just a few minutes.

My Veeam repository testbed is actually on a Dell business desktop for price and performance reasons I explain elsewhere on this blog.

Dell 7020, i7 4790 3.4 GHz, 16 GB DDR3-1600, Small Form Factor (SFF)  (price about $850 12/2013)

Probox 4x USB 3.0 4 drive enclosure ($99)

with 4 WD RED NAS 4TB drives (about $600 3/2015)

Windows Server 2012 R2 (your price will vary up to $800)

The Veeam backup time, including sending it across a 1 gigabit network, was about an hour and a half for the first full backup.

Windows deduplication  of 836,809,182,740 bytes
Elapsed time is 8963 seconds
93,362,622 bytes per second
336,105,439,904 bytes per hour
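
The per-second and per-hour figures above follow directly from the byte count and the elapsed time. As a quick check, here is a small Python sketch of that arithmetic; the function name is just an illustration.

```python
def throughput(total_bytes: int, seconds: float) -> tuple[float, float]:
    """Return (bytes per second, bytes per hour) for a finished job."""
    per_second = total_bytes / seconds
    return per_second, per_second * 3600


# Figures from the first full deduplication run reported above.
per_s, per_h = throughput(836_809_182_740, 8963)
print(f"{per_s:,.0f} bytes per second")
print(f"{per_h:,.0f} bytes per hour")
```

The same function can be reused for the later runs by substituting their byte counts and elapsed times.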

The first time you run windows deduplication on a backup file, much of the time is used in the compression of the chunks.  Therefore the deduplication time is likely to be longer than your daily deduplication of second and following backups.

Windows reports the dedupe status on the volume containing the Veeam Backup Repository.

Volume : D:
Capacity : 7.27 TB
FreeSpace : 6.98 TB
UsedSpace : 304.06 GB
UnoptimizedSize : 785.56 GB
SavedSpace : 481.5 GB
SavingsRate : 61 %
OptimizedFilesCount : 2
OptimizedFilesSize : 779.34 GB
OptimizedFilesSavingsRate : 61 %
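
The numbers in this report hang together: UnoptimizedSize is UsedSpace plus SavedSpace, and the savings rate is SavedSpace as a share of UnoptimizedSize. A short Python sketch of that relationship, using the values above (how Windows rounds the percentage is an assumption here; truncating matches the report):

```python
def savings_rate(saved_gb: float, used_gb: float) -> float:
    """Percentage of the unoptimized size that deduplication saved."""
    unoptimized_gb = used_gb + saved_gb
    return 100 * saved_gb / unoptimized_gb


# Values from the status report above (in GB).
rate = savings_rate(saved_gb=481.5, used_gb=304.06)
print(f"UnoptimizedSize: {304.06 + 481.5:.2f} GB")
print(f"SavingsRate: {int(rate)} %")
```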

My second Veeam Backup is a forward incremental backup.  This is what Veeam suggests you use when storing backups on a windows deduplicated volume.

The entire backup ran in 6 minutes and 47 seconds.

26,859,267,777 bytes
261 seconds
102,909,071 bytes per second

Windows deduplication is ingesting at 370,472,655,600 bytes per hour.  Microsoft says it will only go 100 GB an hour.  I did juice it up a bit by running the process in high priority.

But wait, there’s more! (As they say on TV).  The next full backup will be faster.

Meanwhile, let’s look at the volume usage.

Volume : D:
Capacity : 7.27 TB
FreeSpace : 6.97 TB
UsedSpace : 305.54 GB
UnoptimizedSize : 810.63 GB
SavedSpace : 505.09 GB
SavingsRate : 62 %
OptimizedFilesCount : 3
OptimizedFilesSize : 804.35 GB

That’s nice – 26 billion bytes of backup in less than 2 GB of disk space.

I ran the incremental again and the size transferred was a lot smaller.

Volume: D:

Job processed space (bytes): 3,837,792,131
Job elapsed time (seconds): 81
Job throughput (MB/second): 45.18

The throughput doesn’t look as good, but it takes time to just start and stop the program, and the whole run was less than a minute and a half.
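
Note that the job report counts megabytes in binary units, so it is not directly comparable to the bytes-per-second figures quoted earlier. A small Python sketch of the conversion, assuming 1 MB here means 2^20 bytes (which lands very close to the reported 45.18):

```python
def mib_per_second(total_bytes: int, seconds: float) -> float:
    """Throughput in binary megabytes (2**20 bytes) per second."""
    return total_bytes / seconds / 2**20


def decimal_mb_per_second(total_bytes: int, seconds: float) -> float:
    """Throughput in decimal megabytes (10**6 bytes) per second."""
    return total_bytes / seconds / 10**6


# Figures from the second incremental run above.
print(f"{mib_per_second(3_837_792_131, 81):.2f} MiB/s")
print(f"{decimal_mb_per_second(3_837_792_131, 81):.2f} MB/s")
```

In decimal units this run comes to roughly 47 million bytes per second, about half the rate of the larger runs.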

Volume : D:

Capacity : 7.27 TB
FreeSpace : 6.97 TB
UsedSpace : 306.17 GB
UnoptimizedSize : 814.22 GB
SavedSpace : 508.06 GB
SavingsRate : 62 %

My real concern in doing all this is how long it will take to replicate Veeam backups over the Internet.  I don’t want to be schlepping tapes all over the place.

I have the impression that a lot of Veeam users are only doing full backups once a month or even less.  But I am an old school kind of guy and I really want to be able to do a full backup once a week.  With Veeam deduplication that would still be a lot of data, I think.  But what about with Windows deduplication?

This time I chose ‘Active Full’ backup on Veeam.

Once again the backup to the repository took about an hour and a half.

But look at the windows deduplication processing:

837,381,046,512 bytes processed

5169 seconds
162,000,589 bytes per second
583,202,121,773 bytes per hour
837,381,046,512 bytes of backup used 10 GB of new disk space

Volume : D:

Capacity : 7.27 TB
FreeSpace : 6.96 TB
UsedSpace : 316.07 GB
UnoptimizedSize : 1.56 TB
SavedSpace : 1.25 TB

583 billion bytes an hour! 

While the overall deduplication rate of the volume is not that high yet, the new full backup used up 10 GB for almost 800 GB of new data.  If this rate holds, I should be able to store about 500 full Veeam backups on this volume.
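
The 10 GB figure comes from comparing UsedSpace before and after the active full backup. The Python sketch below repeats that subtraction and then divides the remaining free space by it. The result is only a rough ceiling under the assumption that every future full backup costs the same; the estimate of about 500 above leaves headroom for incrementals and for backups that change more.

```python
# UsedSpace (GB) from the status reports before and after the active full.
used_before_full_gb = 306.17
used_after_full_gb = 316.07
new_space_gb = used_after_full_gb - used_before_full_gb

# FreeSpace after the full backup; Windows reports TB in binary units.
free_gb = 6.96 * 1024

# Rough ceiling, assuming each further full backup costs the same space.
ceiling = free_gb / new_space_gb

print(f"New disk space used by the full backup: {new_space_gb:.2f} GB")
print(f"Rough ceiling on further full backups: {int(ceiling)}")
```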

Of course, my original plan was to store incremental forwards with a full backup once a week.  I will probably still do that, but I could also just do full backups for a long long time.

(Shameless product plug) What is going to make this nice for me is using Replacador to replicate the Windows deduplication volume offsite and to the cloud.  Windows deduplication cuts the backup size down so much that I can replicate a day’s backups in under ten minutes and a full backup in an hour or so.
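
To estimate replication times for your own link, a back-of-the-envelope calculation is enough. The Python sketch below assumes a hypothetical 25 Mbit/s uplink and ignores protocol overhead and Replacador's own processing time, so treat the results as rough lower bounds rather than measurements.

```python
def transfer_minutes(size_bytes: float, link_mbit_per_s: float) -> float:
    """Minutes needed to push size_bytes over a link of the given speed."""
    seconds = size_bytes * 8 / (link_mbit_per_s * 1_000_000)
    return seconds / 60


GIB = 2**30
UPLINK_MBIT = 25  # hypothetical uplink speed, not a figure from the benchmark

# New disk space used by one incremental and one full backup in the benchmark.
print(f"Incremental (1.48 GB): {transfer_minutes(1.48 * GIB, UPLINK_MBIT):.1f} min")
print(f"Full backup (10 GB):   {transfer_minutes(10 * GIB, UPLINK_MBIT):.1f} min")
```

Swap in your own uplink speed to see whether a weekly full backup fits your replication window.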

How does this scale?  If you are doing anything up to about 10 TB of Veeam full backup once a week, with incrementals the rest of the time, you could process it with this system.  You would want to put in 8 TB hard drives, I expect.  I’m running these drives as RAID 10 through Windows file services on USB 3.0.

Of course a “real” server would have faster hard drives, and if you use faster memory and a sufficiently fast processor you might go even quicker than this.   We will be testing windows deduplication on a new Dell 530 with 2133 memory soon and hope to bring you even better numbers.

According to our testing, the single core speed and the memory speed are the most important factors in the windows deduplication ingestion.

Windows deduplication can add a lot of value to the Veeam backup process.  It allows you to store more backups in less space, and replicate them in far less time, than with Veeam alone.