Roll Your Own Deduplication Appliance with Windows Server 2012 R2

We have been doing a lot of testing and implementation of Windows Deduplication, and in the process we have come up with a basic roll-your-own dedupe appliance using Windows Server 2012 R2.

After testing Windows Deduplication on various hardware, we have come to use a simple business-class desktop with an external RAID array as our basic deduplication workhorse. Deduplication wants fast single-core speed and fast memory, and by far the cheapest way to meet those requirements is a desktop system.

A good place to begin is with a Dell 7020 or 9020 Small Form Factor computer with an Intel i7 processor. As of March 2015, a system with 16GB of 1600MHz memory is about $800. For storage, a USB 3.0 Western Digital Duo drive in either the 8TB ($340) or 12TB ($650) size can be a good choice. The Duo drive is actually two drives in an external enclosure. You can set the drives up with internal RAID 1 mirroring, so you get 4TB or 6TB of usable space.

Plug these together and install Windows Server 2012 R2 on the Dell and you have a deduplication system. Some people like HP instead of Dell. Some brave hearts swear by SuperMicro.

If you need more storage or you want to put your deduplication appliance in a rack, you can use a rackmount RAID enclosure like the USB 3.0 Akitio MD4 U3B ($350) with four 3.5-inch drives. Set it up as RAID 10 for both speed and protection, using 4TB, 6TB, or 8TB drives. Put it in the rack, then flip the 7020 sideways and put it on top.
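For readers sizing other drive combinations, here is a minimal Python sketch of the usable-capacity arithmetic behind the RAID 1 and RAID 10 figures above. The function name `usable_tb` is ours, purely for illustration, and the sketch ignores file system overhead and the difference between decimal and binary terabytes, so treat the results as round numbers.

```python
def usable_tb(level: str, drive_count: int, drive_tb: float) -> float:
    """Usable capacity in TB for the two RAID layouts mentioned in this post."""
    if level == "RAID1":
        # Assumption: RAID 1 here means a simple two-drive mirror, as in the Duo.
        if drive_count != 2:
            raise ValueError("this sketch treats RAID 1 as a two-drive mirror")
        return drive_tb
    if level == "RAID10":
        # Striped mirrors: half of the drives hold copies.
        if drive_count < 4 or drive_count % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drive_count / 2 * drive_tb
    raise ValueError(f"unsupported level: {level}")


for level, count, size in [("RAID1", 2, 4), ("RAID1", 2, 6),
                           ("RAID10", 4, 4), ("RAID10", 4, 6), ("RAID10", 4, 8)]:
    print(f"{count} x {size}TB in {level}: {usable_tb(level, count, size):g}TB usable")
```

With four drives in RAID 10, the 4TB, 6TB, and 8TB options come out to 8TB, 12TB, and 16TB usable.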

At this time, four 4TB WD Red NAS drives are about $600. So for about $1800 and the price of Windows Server 2012 R2 (which can be anywhere up to about $700 retail) you have a deduplication appliance with 8TB of usable space.

Because Windows Deduplication is post-process, you will need to set aside some of that storage for the raw files before they are deduplicated. We use these systems for deduplicating and replicating backups to our DR site, so we need room for at least one day’s full backup plus a 50% ‘fudge factor’ (this is a professional term of art from the 1960s; they may call it something else now). Our daily full backup is a bit over 1 TB, so 8 TB minus 1.5 TB leaves 6.5 TB of deduplication space. At a 25 to 1 deduplication ratio, that is over 150 full backups.
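If you want to plug in your own numbers, the sizing arithmetic above can be sketched in a few lines of Python. The function and parameter names are ours for illustration. The 25 to 1 ratio is simply the figure we quoted, and the sketch rounds the daily full backup down to an even 1 TB; your ratio will depend on your data.

```python
def dedupe_capacity(usable_tb, daily_full_tb, fudge=0.5, ratio=25.0):
    """Return (staging_tb, store_tb, full_backups_retained).

    fudge: extra landing space as a fraction of one full backup (assumed 50%).
    ratio: assumed deduplication ratio, e.g. 25.0 for 25 to 1.
    """
    staging_tb = daily_full_tb * (1 + fudge)  # raw files waiting to be deduplicated
    store_tb = usable_tb - staging_tb         # what is left for deduplicated data
    if store_tb <= 0:
        raise ValueError("no room left for deduplicated data")
    full_backups = store_tb * ratio / daily_full_tb
    return staging_tb, store_tb, full_backups


staging, store, backups = dedupe_capacity(usable_tb=8, daily_full_tb=1.0)
print(f"staging {staging} TB, dedupe store {store} TB, about {int(backups)} full backups")
# staging 1.5 TB, dedupe store 6.5 TB, about 162 full backups
```

Since our real daily full is a bit over 1 TB, the actual count lands somewhat below 162, which is where "over 150" comes from.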

This can be a useful low-end SMB system, a Proof of Concept (POC) system, or a departmental system for an enterprise.

Many of our customers want ‘real’ servers and will install hardware that costs 2 to 4 times this much. That is okay too.

The WD Duo system with 4TB of usable space costs $350 for the 8TB WD Duo and $800 for a Dell 7020, plus the cost of Windows Server 2012 R2.

Real deduplication for $1150 plus Windows.

By the way, I’ve clocked deduplication on my Duo system (8TB raw, 4TB usable) at over 400 billion bytes an hour on full backups after the first day (the first day is mostly compression). The Akitio-based system is a little slower, but still a respectable speed for what we are doing.

This is a roll-your-own price. When I sell similar systems they cost a lot more, because all my hippies quit and now I have to pay my employees.

You can’t do everything with these that you can with the big-name deduplication appliances. They won’t scale as high: Windows deduplication doesn’t work on a physical disk volume over 64TB in size, for example.

The big guys are claiming ever more dizzying ingest rates for their systems as well. Microsoft claims their R2 version of deduplication tops out at about 40 MB a second, but we generally see speeds two to three times that fast. With the systems we test, 350 to 450 billion bytes an hour is typical. We expect this to continue to increase as processors and memory get faster.
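Because we quote our rates in bytes per hour and Microsoft quotes theirs in MB per second, here is a small Python sketch of the conversion. It assumes 1 MB means 1,000,000 bytes; the function name and the 40 MB/s constant (Microsoft's figure as quoted above) are just labels we chose.

```python
MICROSOFT_MB_PER_S = 40  # the ceiling Microsoft quotes, per the paragraph above


def bytes_per_hour_to_mb_per_s(bytes_per_hour: float) -> float:
    """Convert an hourly byte count to decimal megabytes per second."""
    return bytes_per_hour / 3600 / 1_000_000


for rate in (350e9, 450e9):
    mb_s = bytes_per_hour_to_mb_per_s(rate)
    print(f"{rate / 1e9:.0f} billion bytes/hour = {mb_s:.1f} MB/s "
          f"({mb_s / MICROSOFT_MB_PER_S:.1f}x the quoted figure)")
# 350 billion bytes/hour = 97.2 MB/s (2.4x the quoted figure)
# 450 billion bytes/hour = 125.0 MB/s (3.1x the quoted figure)
```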

If your full backups are up to about 3 TB a day, one of these systems may work for you. If you are doing periodic full backups and incremental backups the rest of the time, that number could be higher.
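One way to sanity-check that 3 TB figure against your own environment is to work out how many hours of deduplication a day's backup needs at the rates we see. This Python sketch is only that arithmetic; the function name is ours, 1 TB is taken as 10^12 bytes, and it assumes the whole backup is processed at a steady rate, which real jobs will not quite manage.

```python
BYTES_PER_TB = 10**12  # decimal terabytes, an assumption of this sketch


def dedupe_hours(backup_tb: float, bytes_per_hour: float) -> float:
    """Hours needed to deduplicate a backup at a steady hourly rate."""
    return backup_tb * BYTES_PER_TB / bytes_per_hour


for rate in (350e9, 400e9, 450e9):
    print(f"3 TB at {rate / 1e9:.0f} billion bytes/hour: "
          f"{dedupe_hours(3, rate):.1f} hours")
# 3 TB at 350 billion bytes/hour: 8.6 hours
# 3 TB at 400 billion bytes/hour: 7.5 hours
# 3 TB at 450 billion bytes/hour: 6.7 hours
```

If the result does not fit comfortably between the end of one backup and the start of the next, that is a sign to split the load across more than one appliance.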

With the low cost of these systems, you can divide up the work and have two, four, or even eight deduplication appliances for different backups.

The other thing Microsoft Deduplication is missing is replication, but we have solved that problem. We will be releasing our own replication system, Replacador, very soon.  
