Veeam Backup with Windows Deduplication Benchmark

Some people wonder why anyone would use Windows Deduplication on a Veeam Backup Repository.   Doesn’t Veeam have its own deduplication and replication?

The people at Veeam actually recommend using Windows Deduplication and have a great writeup about it that you can download here.

Veeam has great deduplication and replication of its own, but its deduplication works within a single backup: Veeam deduplicates the backup of one VM against another VM in the same backup job.

Windows Deduplication deduplicates blocks of data across many backups.  For example, when I deduplicated a full Veeam backup of multiple VMs a second time, 800 GB of backup deduplicated down to 10 GB.  That means that using Windows Deduplication on the Veeam Repository reduces my full replication over the Internet from 800 GB to 10 GB, on the second and following full backups.

Of course, most people using Veeam are going to do full backups only periodically, with incremental backups one or more times each day.

My first incremental backup with Veeam takes about 20 GB of repository storage.  Deduplicating that with Windows Deduplication takes it down to about 2 GB of disk usage.  This means I can protect my 800 GB of VMs using 2 GB of actual disk storage and replicate it in just a few minutes.
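
If you want to try this yourself, here is a minimal sketch of how I get a repository volume ready for deduplication (assuming the repository lives on D: and you are on Windows Server 2012 R2; the file-age setting is there so brand-new backup files are eligible for optimization right away):

# Install the Data Deduplication role feature
Install-WindowsFeature -Name FS-Data-Deduplication

# Enable deduplication on the repository volume
Enable-DedupVolume -Volume D:

# Optimize files immediately instead of waiting the default number of days
Set-DedupVolume -Volume D: -MinimumFileAgeDays 0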

My Veeam repository testbed is actually on a Dell business desktop for price and performance reasons I explain elsewhere on this blog.

Dell 7020, i7-4790 3.4 GHz, 16 GB DDR3-1600, Small Form Factor (SFF) (price about $850, 12/2013)

Probox 4-bay USB 3.0 drive enclosure ($99)

with 4 WD Red 4 TB NAS drives (about $600, 3/2015)

Windows Server 2012 R2 (your price will vary up to $800)

The Veeam backup time, including sending it across a 1 gigabit network, was about an hour and a half for the first full backup.

Windows Deduplication of 836,809,182,740 bytes
Elapsed time: 8,963 seconds
93,362,622 bytes per second
336,105,439,904 bytes per hour

The first time you run Windows Deduplication on a backup file, much of the time is spent compressing the chunks, so that first run is likely to take longer than the daily deduplication of the second and following backups.
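
For reference, this is roughly how I kick off and time a manual optimization run from PowerShell (a sketch; the exact parameters I have settled on are listed later in this post):

# Start an optimization job on the repository volume and wait for it to finish, timing the whole run
Measure-Command { Start-DedupJob -Volume D: -Type Optimization -Wait }

# While a job is running, you can check its progress from another window
Get-DedupJob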

Windows reports the dedupe status on the volume containing the Veeam Backup Repository.

Volume : D:
Capacity : 7.27 TB
FreeSpace : 6.98 TB
UsedSpace : 304.06 GB
UnoptimizedSize : 785.56 GB
SavedSpace : 481.5 GB
SavingsRate : 61 %
OptimizedFilesCount : 2
OptimizedFilesSize : 779.34 GB
OptimizedFilesSavingsRate : 61 %
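
That status report comes from PowerShell; a command along these lines should produce it (assuming the repository is on D: as in my setup):

# Show the deduplication statistics for the repository volume
Get-DedupVolume -Volume D: | Format-List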

My second Veeam backup is a forward incremental.  This is what Veeam suggests you use when storing backups on a Windows deduplicated volume.

The entire backup ran in 6 minutes and 47 seconds.

26,859,267,777 bytes
261 seconds
102,909,071 bytes per second

Windows Deduplication is ingesting at 370,472,655,600 bytes per hour.  Microsoft says it will only go 100 GB an hour.  I did juice it up a bit by running the process at high priority.

But wait, there’s more! (As they say on TV).  The next full backup will be faster.

Meanwhile, let’s look at the volume usage.

Volume : D:
Capacity : 7.27 TB
FreeSpace : 6.97 TB
UsedSpace : 305.54 GB
UnoptimizedSize : 810.63 GB
SavedSpace : 505.09 GB
SavingsRate : 62 %
OptimizedFilesCount : 3
OptimizedFilesSize : 804.35 GB

That’s nice – 26 billion bytes of backup in less than 2 GB of disk space.

I ran the incremental again and the size transferred was a lot smaller.

Volume: D:

Job processed space (bytes): 3,837,792,131
Job elapsed time (seconds): 81
Job throughput (MB/second): 45.18

The throughput doesn’t look as good, but some of that time is just starting and stopping the job, and the whole run was less than a minute and a half.

Volume : D:

Capacity : 7.27 TB
FreeSpace : 6.97 TB
UsedSpace : 306.17 GB
UnoptimizedSize : 814.22 GB
SavedSpace : 508.06 GB
SavingsRate : 62 %

My real concern in doing all this is how long it will take to replicate Veeam backups over the Internet.  I don’t want to be schlepping tapes all over the place.

I have the impression that a lot of Veeam users are only doing full backups once a month or even less.  But I am an old school kind of guy and I really want to be able to do a full backup once a week.  With Veeam deduplication that would still be a lot of data, I think.  But what about with Windows deduplication?

This time I chose ‘Active Full’ backup on Veeam.

Once again the backup to the repository took about an hour and a half.

But look at the Windows Deduplication processing:

837,381,046,512 bytes processed

5169 seconds
162,000,589 bytes per second
583,202,121,773 bytes per hour
837,381,046,512 bytes of backup used 10 GB of new disk space

Volume : D:

Capacity : 7.27 TB
FreeSpace : 6.96 TB
UsedSpace : 316.07 GB
UnoptimizedSize : 1.56 TB
SavedSpace : 1.25 TB

583 billion bytes an hour! 

While the overall deduplication rate of the volume is not that high yet, the new full backup used up 10 GB for almost 800 GB of new data.  If this rate holds, I should be able to store about 500 full Veeam backups on this volume.

Of course, my original plan was to store incremental forwards with a full backup once a week.  I will probably still do that, but I could also just do full backups for a long long time.

(Shameless product plug) What is going to make this nice for me is using Replacador to replicate the Windows Deduplication volume offsite and to the cloud.  The Windows Deduplication cuts the backup size down so much that I can replicate a day’s backups in under ten minutes and a full backup in an hour or so.

How does this scale?  If you are doing anything up to about 10 TB of Veeam full backup once a week, with incrementals the rest of the time, you could process it with this system.  You would want to put in 8 TB hard drives, I expect.  I’m running these drives in RAID 10 through Windows file services over USB 3.0.

Of course a “real” server would have faster hard drives, and if you use faster memory and a sufficiently fast processor you might go even quicker than this.   We will be testing Windows Deduplication on a new Dell 530 with 2133 speed memory soon and hope to bring you even better numbers.

According to our testing, single-core speed and memory speed are the most important factors in Windows Deduplication ingest performance.

Windows deduplication can add a lot of value to the Veeam backup process.  It allows you to store more backups in less space, and replicate them in far less time, than with Veeam alone.


How to make Windows Deduplication go faster

While investigating Windows 2012 R2 Deduplication for the benefit of my customers, I have been testing the Windows Deduplication ingest process (Start-DedupJob) on a number of servers and desktops.

Windows Deduplication is post-process deduplication, which means you copy the raw file onto a server volume that has deduplication enabled, and then run a deduplication job to compare the contents of the file with all the files that are already on the volume.

Two different files may contain some identical content, and deduplication compares their blocks of data looking for matches. It then stores the unique data as a series of ‘chunks’ and replaces the file data with ‘reparse points’, which are indexes into the chunks.

With full backup files, this can reduce the new disk space taken by the next backup by 96% in many cases.
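
If you want to poke at the chunk store after a run, there is a cmdlet for that too; this is just a quick look, and the exact properties you get back will vary with your volume and OS version:

# Show chunk store statistics (chunk count, container count, and so on) for the volume
Get-DedupMetadata -Volume D: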

The deduplication job can be run manually, or by using the task scheduler.
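
For the scheduled route, a sketch like this creates a nightly optimization window; the schedule name, start time, and duration are just placeholders you would tune to your own backup window:

# Create a nightly optimization job that starts at 22:00 and can run for up to 8 hours
New-DedupSchedule -Name "NightlyOptimization" -Type Optimization -Start "22:00" -DurationHours 8 -Priority High -Memory 80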

The first thing that concerned me about Windows Deduplication was Microsoft’s suggestion that the maximum speed we could expect was 100 gigabytes an hour. That is 107 billion bytes in real-world numbers, or about 30 million bytes a second. Fortunately, I could never get the process to run this slowly, even on older servers.

For my testing, I went through lots of different tweaks of the command line trying to get every last bit of performance out of the deduplication process.

As I tested different processors, drives, and memory combinations, I found different things that seemed to be the bottleneck for the process.

When I first tested deduplication, even after I figured out the fastest combination of deduplication job parameters, I could see in the Task Manager performance view that the disk drives were not heavily used and none of the CPU cores were pegged near full usage.

My first thought was that the head movement on the drives during random access was slowing the process. So I switched to SSDs and saw a small performance boost, but the CPU was still not busy.

I scratched my head and said, let’s try a server with faster memory. The first system had 667 speed memory, so I moved to a newer server with a newer processor and 1066 speed memory. The process sped up quite a bit, but the CPU core was still not saturated, and the SSD wasn’t busy either.

I switched to a consumer desktop of recent vintage, a Dell 3647 i5 with 1600 speed memory.  I installed Windows Server 2012 R2 on it so it would support deduplication.

Windows deduplication sped up a lot, and for the first time, a single core was saturated. Windows Deduplication seems to be doing most of its processing on a single core.

Since random access didn’t seem to be a big factor, I switched back to hard drives from SSD so I could process larger amounts of test data.  The deduplication process seems to combine its random accesses and serialize them.

Next I got a Dell XPS desktop with an i7 at 4.0 GHz, also with 1600 speed memory.

This made deduplication even faster.

At this point I configured things as what I call a RackNStack server, using an Akitio rackmount 4-drive enclosure as RAID 10, connected to a desktop sitting on top of it in the rack through (gasp) USB 3.0.

I switched to a Dell Small Form Factor business-class 7020 desktop and am continuing testing.

Along in here somewhere I got the idea to go to Control Panel / Power Options and set the server to High performance. This instantly improved performance by about 30%.   This works on both desktops and servers.   Try it on your other Windows servers and see what it does for you.  Windows is supposed to ramp up your CPU speed automatically under load, but that doesn’t work well with deduplication.
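
If you would rather flip that switch from a command line than from Control Panel, this should do the same thing (SCHEME_MIN is the built-in alias for the High performance plan):

# Switch the active power plan to High performance
powercfg /setactive SCHEME_MIN

# Verify the active plan
powercfg /getactivescheme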

I also created what I call the Instant Dedupe Appliance: a Dell 7020 with a Western Digital 8 TB or 12 TB Duo drive connected with USB 3.0. The Western Digital Duo has two drives that can be used as a RAID 1 mirror, so you get 4 TB or 6 TB of usable deduplication space.  Some of that will be a landing zone for the raw data file before you deduplicate it.

Of course, you are welcome to run deduplication on a ‘Real’ server if you prefer.

The parameters that have worked the best for me are:

Start-DedupJob -Volume F: -InputOutputThrottleLevel None -Priority High -Preempt -Type Optimization -Memory 80

Replace the volume letter with the volume you are deduplicating. The -Priority High parameter seems to do nothing at all. For testing I went to Task Manager and manually increased the priority to high.
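
If you would rather do that bump from PowerShell than from Task Manager, something like this works on my systems; I’m assuming the deduplication job runs in the fsdmhost.exe host process, which is what I see while a job is active:

# Raise the priority of the running deduplication job host process (assumed to be fsdmhost.exe)
Get-Process -Name fsdmhost | ForEach-Object { $_.PriorityClass = 'High' }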

-Memory 80 means use 80% of memory for the deduplication process. This is okay on a server that is dedicated to storing and deduplicating backup files.

In deduplicating backup files, you will find that the first day’s deduplication runs the slowest.  This is most likely because much of this processing is actually the compression of the data in the deduplication chunks before storing them. In the following backup cycles, most of the data is likely to be identical, so relatively fewer unique deduplication chunks are being compressed and stored.

Even though you tell Windows Deduplication to use 80% of memory, it won’t at first, unless your server has a tiny amount of memory like 4GB.  Your second and following deduplication will use more memory.

Our deduplication test set of actual customer backup files is a little over a trillion bytes. Using an Akitio MB4 UB3 rackmount RAID enclosure with 4 Western Digital Red 4TB NAS drives, the first day’s deduplication ran at 92.7 million bytes a second, or 334 billion bytes an hour.

The second day’s deduplication ran at over 110 million bytes a second, and 400 billion bytes an hour.

Running deduplication with the WD Duo drive is a little faster than the Akitio, but it’s also half the useful storage.

Be sure to upgrade to Windows Server 2012 R2 if you are on Windows Server 2012, since the deduplication is up to twice as fast.

We combine Windows Deduplication with Replacador to do dedupe-aware replication of the deduplicated volume over the Internet or to an external drive.

The brand name deduplication appliances will be faster and sexier than using Windows Deduplication. They may have some features that you really want, particularly if you are using deduplication for other things besides backup files.

For deduplicating backups, Windows Deduplication is great, and it generally costs about a third as much as the leading entry level deduplication appliance.

You might even consider getting two deduplication appliances for each location, and clustering them.