How to make Windows Deduplication go faster

While investigating Windows Server 2012 R2 Deduplication for the benefit of my customers, I have been testing the Windows Deduplication ingest process (Start-DedupJob) on a number of servers and desktops.

Windows Deduplication is post-process deduplication, which means you copy the raw file onto a server volume that has deduplication enabled, and then run a deduplication job to compare the contents of the file with all the files that are already on the volume.
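
If you want to follow along, here is a minimal PowerShell sketch of that workflow, assuming the Data Deduplication feature is already installed and F: is the volume you copied the files to (substitute your own drive letter):

# Enable deduplication on the volume, then run the post-process job against it.
Import-Module Deduplication
Enable-DedupVolume -Volume "F:"
Start-DedupJob -Volume "F:" -Type Optimization
Get-DedupJob                                  # shows the job and its progress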

Two different files may contain some identical content, and deduplication compares the blocks of data looking for matching data. It then stores this data as a series of 'chunks' and replaces the file data with 'reparse points', which are indexes into the chunks.

With full backup files, this can reduce the new disk space taken by the next backup by 96% in many cases.
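
You can check the savings for yourself after a job finishes; this is a quick sketch, again assuming F: is the deduplicated volume:

# Show how much space deduplication has reclaimed and how many files were optimized.
Get-DedupStatus -Volume "F:" | Format-List
Get-DedupVolume -Volume "F:"                  # includes the volume's savings rate and settings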

The deduplication job can be run manually, or by using the task scheduler.
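
Deduplication has its own built-in schedules (they show up under Task Scheduler), and you can add one from PowerShell. A minimal sketch; the schedule name, days, and times below are just examples:

# Create a weeknight optimization window and list the schedules that exist.
New-DedupSchedule -Name "NightlyOptimization" -Type Optimization `
    -Days Monday,Tuesday,Wednesday,Thursday,Friday `
    -Start (Get-Date "23:00") -DurationHours 6
Get-DedupSchedule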

The first thing that concerned me about Windows Deduplication was Microsoft's suggestion that the maximum speed we could expect was 100 gigabytes an hour. That is about 107 billion bytes an hour in real-world numbers, or roughly 30 million bytes a second. Fortunately, I could never get the process to run this slowly, even on older servers.
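
For reference, here is the arithmetic behind that figure:

# 100 GB/hour expressed in bytes per second (1GB in PowerShell is 2^30 bytes).
$bytesPerHour   = 100 * 1GB                    # 107,374,182,400 bytes
$bytesPerSecond = $bytesPerHour / 3600
"{0:N0} bytes per second" -f $bytesPerSecond   # roughly 29.8 million bytes a second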

For my testing, I went through lots of different tweaks of the command line trying to get every last bit of performance out of the deduplication process.

As I tested different processor, drive, and memory combinations, I found different things that seemed to be the bottleneck for the process.

When I first tested deduplication, even after I figured out the fastest combination of deduplication job parameters, I could see in Task Manager's performance view that the disk drives were not heavily used, and none of the CPU cores were pegged near full usage.
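
Besides Task Manager, you can watch a running job's progress from PowerShell; a small sketch:

# Poll the running deduplication jobs once a minute until they finish.
while (Get-DedupJob) {
    Get-DedupJob | Format-Table Type, State, Progress, Volume -AutoSize
    Start-Sleep -Seconds 60
}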

My first thought was that the head movement on the drives during random access was slowing the process. So I switched to SSDs and saw a small performance boost, but the CPU was still not busy.

I scratched my head and decided to try a server with faster memory. The first system had 667-speed memory, so I moved to a newer server with a newer processor and 1066-speed memory, and the process sped up quite a bit. But the CPU core was still not saturated, and the SSD wasn't busy either.

I switched to a consumer desktop of recent vintage, a Dell 3647 with an i5 and 1600-speed memory. I installed Windows Server 2012 R2 on it so it would support deduplication.

Windows Deduplication sped up a lot, and for the first time a single core was saturated. Windows Deduplication seems to do most of its processing on a single core.
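
If you want to confirm the single-core behavior without staring at Task Manager, you can sample per-core CPU usage from PowerShell while a job is running:

# Take a few per-core CPU samples; one core near 100% while the rest sit idle matches what I saw.
Get-Counter -Counter '\Processor(*)\% Processor Time' -SampleInterval 5 -MaxSamples 3 |
    Select-Object -ExpandProperty CounterSamples |
    Format-Table InstanceName, CookedValue -AutoSize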

Since random access didn't seem to be a big factor, I switched back to hard drives from SSD so I could process larger amounts of test data. The deduplication process seems to be combining its random accesses and serializing them.

Next I got a Dell XPS desktop with an i7 at 4.0 GHz, also with 1600-speed memory.

This made deduplication even faster.

At this point I configured things as what I call a RackNStack server, using an Akitio rackmount 4-drive RAID array set up as RAID 10, connected to a desktop sitting on top of it in the rack through (gasp) USB 3.0.

I then switched to a Dell Small Form Factor business-class 7020 desktop and am continuing my testing.

Along in here somewhere I got the idea to go to Control Panel / Power Options and set the server to High performance. This instantly improved performance by about 30%. This works on both desktops and servers, so try it on your other Windows servers and see what it does for you. Windows is supposed to automatically ramp up the CPU clock under load, but that doesn't work well with deduplication.
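
You can make the same change from the command line instead of digging through Control Panel:

# SCHEME_MIN is the built-in alias for the High performance power plan.
powercfg /setactive SCHEME_MIN
powercfg /getactivescheme     # confirm the active plan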

I also created what I call the Instant Dedupe Appliance: a Dell 7020 with a Western Digital 8TB or 12TB Duo drive connected with USB 3.0. The Western Digital Duo has two drives that can be used as a RAID 1 mirror, so you get 4TB or 6TB of usable deduplication space. Some of that will be a landing zone for the raw data file before you deduplicate it.

Of course, you are welcome to run deduplication on a ‘Real’ server if you prefer.

The parameters that have worked the best for me are:

Start-DedupJob -Volume F: -InputOutputThrottleLevel None -Priority High -Preempt -Type Optimization -Memory 80

Replace the volume letter with the volume you are deduplicating. The -Priority High parameter seems to do nothing at all. For testing, I went to Task Manager and manually increased the process priority to High.

-Memory 80 means use 80% of the system's memory for the deduplication process. This is okay on a server that is dedicated to storing and deduplicating backup files.
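
If you want to script the priority bump instead of using Task Manager, something like the sketch below works. On my systems the optimization job appears to run inside fsdmhost.exe (the File Server Data Management Host); check the Details tab in Task Manager to confirm the process name before relying on this:

# Raise the priority of the deduplication host process (assumed to be fsdmhost.exe).
Get-Process -Name fsdmhost -ErrorAction SilentlyContinue |
    ForEach-Object { $_.PriorityClass = 'High' }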

In deduplicating backup files, you will find that the first day’s deduplication runs the slowest.  This is most likely because much of this processing is actually the compression of the data in the deduplication chunks before storing them. In the following backup cycles, most of the data is likely to be identical, so relatively fewer unique deduplication chunks are being compressed and stored.

Even though you tell Windows Deduplication to use 80% of memory, it won't at first, unless your server has a tiny amount of memory like 4GB. Your second and subsequent deduplication runs will use more memory.

Our deduplication test set of actual customer backup files is a little over a trillion bytes. Using an Akitio MB4 UB3 rackmount RAID enclosure with 4 Western Digital Red 4TB NAS drives, the first day’s deduplication ran at 92.7 million bytes a second, or 334 billion bytes an hour.

The second day’s deduplication ran at over 110 million bytes a second, and 400 billion bytes an hour.

Running deduplication with the WD Duo drive is a little faster than with the Akitio, but it also has half the usable storage.

Be sure to upgrade to Windows Server 2012 R2 if you are on Windows Server 2012, since the deduplication is up to twice as fast.

We combine Windows Deduplication with Replacador to do dedupe-aware replication of the deduplicated volume over the Internet or to an external drive.

The brand name deduplication appliances will be faster and sexier than using Windows Deduplication. They may have some features that you really want, particularly if you are using deduplication for other things besides backup files.

For deduplicating backups, Windows Deduplication is great, and it generally costs about a third as much as the leading entry level deduplication appliance.

You might even consider getting two deduplication appliances for each location, and clustering them.
