At $WORK I have some very expensive Simplivity boxes. When you cut through all the marketing nonsense, each node is a combination of VMware, an HPE Intel server, an SSD storage array, inline block deduplication and data replication. There is some pixie dust sprinkled on top (which doesn't work well at our site) but the components I've listed here work well.
The deduplication is rather important - it gives us a compression ratio of 38:1.
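To give a feel for what a ratio like that means, here is a minimal sketch of how you might estimate a block-level dedup ratio for a chunk of data. It assumes fixed-size 4 KiB blocks hashed with SHA-256; both are my choices for illustration, and I'm not claiming this is how Simplivity does it internally. The function name `dedup_ratio` is made up for this example.

```python
import hashlib
import io


def dedup_ratio(stream, block_size=4096):
    """Return (total_blocks, unique_blocks, ratio) for a binary stream.

    Assumption: fixed-size blocks and SHA-256 fingerprints. Real dedup
    engines may use different block sizes or variable-length chunking.
    """
    seen = set()
    total = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        total += 1
        # Only the fingerprint is kept, not the block itself
        seen.add(hashlib.sha256(block).digest())
    unique = len(seen)
    ratio = total / unique if unique else 0.0
    return total, unique, ratio


if __name__ == "__main__":
    # Synthetic data: 38 identical 4 KiB blocks, so only one needs storing
    data = io.BytesIO(b"A" * 4096 * 38)
    total, unique, ratio = dedup_ratio(data)
    print(f"{total} blocks, {unique} unique, ratio {ratio:.0f}:1")
```

Point it at a disk image opened with `open(path, "rb")` instead of the synthetic data and you get a rough idea of how much duplicate data is there to be reclaimed, at least at that block size.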
However these boxes are a bit full. Rather than add more Simplivity nodes, I'm planning on building a Proxmox cluster and moving some of our legacy and dev systems there. I've been running a POC for a couple of months and overall I'm very impressed with Proxmox.
So dedup is nice on Simplivity and works well - but can you do the same thing on Linux?
A bit of research turned up some interesting results.
BTRFS doesn't yet support inline deduplication for production usage, but it does allow for offline dedup using tools such as dduper, btrfs-dedupe and bees. None of them turned up in my apt repositories:
animal symcbean # apt-get install dduper
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package dduper
animal symcbean # apt-get install btrfs-dedupe
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package btrfs-dedupe
animal symcbean # apt-get install bees
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package bees
There is a project called lessfs which provides inline deduplication and is implemented as a FUSE filesystem. But there are things here which make me a bit uneasy. It's hosted on Sourceforge (so are some of my projects! It used to be a popular place to publish open source). 2009-2013 saw regular updates, then they just seem to have stopped. Similarly, activity on the help and support pages on Sourceforge seems to have stopped in 2013. The project website returns a 403 error. But it seems people are still using it. Could this actually be a finished piece of software that just works?
animal symcbean # apt-get install lessfs
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package lessfs
Also running as a FUSE filesystem is SDFS by OpenDeDup (I'm a bit confused about the product/branding too). This directly connects to cloud backend storage as well as block devices.
animal symcbean # apt-get install sdfs
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package sdfs
The other open source solution I have found is VDO. This runs as a kernel module rather than FUSE. But I'm struggling to find any references to it on any Linux other than Red Hat/Fedora, which is another thing I'm trying to move away from.
animal symcbean # apt-get install vdo kmod-kvdo
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package vdo
E: Unable to locate package kmod-kvdo
ZFS seems to be flavour of the month for large scale Linux-based virtualization, but it likes a lot of memory for deduplication, is complex to configure and a LOT more complex on top of iSCSI. Although the infrastructure is not huge, it's big enough that we should separate the storage.
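To put a rough number on the memory appetite, here is a back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 320 bytes of RAM per unique block in the ZFS dedup table; treat that figure, the block sizes and the function name `ddt_ram_bytes` as assumptions for illustration rather than anything measured on my kit.

```python
KiB = 1024
GiB = 1024 ** 3
TiB = 1024 ** 4


def ddt_ram_bytes(unique_data_bytes, avg_block_size=64 * KiB, bytes_per_entry=320):
    """Estimate RAM needed to hold the dedup table in memory.

    Assumption: roughly 320 bytes per unique block, a widely repeated
    rule of thumb. The real figure depends on the ZFS version and pool.
    """
    # Ceiling division: a partial block still needs a table entry
    unique_blocks = -(-unique_data_bytes // avg_block_size)
    return unique_blocks * bytes_per_entry


if __name__ == "__main__":
    for block_size in (8 * KiB, 64 * KiB, 128 * KiB):
        ram = ddt_ram_bytes(1 * TiB, avg_block_size=block_size)
        print(f"1 TiB unique data, {block_size // KiB} KiB blocks: "
              f"{ram / GiB:.1f} GiB of RAM")
```

Under those assumptions, the smaller the blocks (and VM images tend to want small blocks), the faster the table grows, which is why the memory requirement gets mentioned so often.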
For much the same reasons that I am avoiding Docker and Kubernetes, I don't want to make my software stack too sophisticated. Using a SAN/NAS appliance for storage makes my life a lot simpler.
Currently I'm leaning towards using 
Synology for storage. In addition to the Simplivity boxes, we have some 
HP MSAs. These are really nice bits of hardware and not ridiculously 
expensive - but they do cost enough that they need to be under warranty 
and that means you need to deal with HPE's support centre. Clearly these
 guys (in India?) are sub-contracted and have targets to reduce warranty
 claims. Got a 4-hour response time on your contract? Expect your 
hardware to get fixed in four hours? Think again. At my previous gig, it
 took 3 weeks to get a replacement power supply out of them. On the last
 two big repair exercises at my current work, we were promised that 
there would be no downtime / "completely transparent". Both resulted in 
major crashes that took a long time to recover from.  I could go on all 
day with stories about their support.
But the only thing worse than their support is their software. 
Synology
 are the opposite in just about every way. Their software/user interface
 is a joy to use. But while their hardware is cheap, it is perhaps a 
little too cheap. It is cheap enough that you don't need to worry about 
expensive warranties and support contracts.
But using an appliance means more constraints than just the availability of the software. 