At $WORK I have some very expensive Simplivity
boxes. When you cut through all the marketing nonsense, each node is a
combination of VMware, HPE Intel server, SSD storage array, inline
block deduplication and data replication. There is some pixie dust sprinkled
on top (which doesn't work well at our site), but the components I've
listed here work well.
The deduplication is rather important - it gives us a compression ratio of 38:1.
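To make the ratio concrete, here is a minimal sketch of fixed-size block deduplication in Python. The 4 KiB block size and SHA-256 fingerprints are illustrative assumptions, not how Simplivity actually does it. Each block is hashed, only one copy of each distinct block is kept, and the ratio is logical blocks written divided by unique blocks stored.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def dedup_ratio(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Return logical blocks / unique blocks for fixed-size block dedup."""
    seen = set()  # fingerprints of the blocks we would actually store
    logical = 0   # blocks the client thinks it wrote
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        seen.add(hashlib.sha256(block).digest())
        logical += 1
    return logical / len(seen) if seen else 1.0


if __name__ == "__main__":
    # 38 identical "VM images", each made of 10 distinct blocks
    image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))
    print(dedup_ratio(image * 38))  # 38.0
```

Lots of near-identical VM images is exactly the case where block-level dedup shines, which is how ratios like 38:1 become possible.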
However, these boxes are a bit full. Rather than add more Simplivity nodes, I'm planning on building a Proxmox cluster
and moving some of our legacy and dev systems there. I've been running
a POC for a couple of months and overall I'm very impressed with
Proxmox.
So dedup is nice on Simplivity and works well - but can you do the same thing on Linux?
A bit of research turned up some interesting results.
BTRFS doesn't yet support inline deduplication for production use, but it does allow offline dedup, and there are a few third-party tools for it (dduper, btrfs-dedupe, bees).
animal symcbean # apt-get install dduper
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package dduper
animal symcbean # apt-get install btrfs-dedupe
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package btrfs-dedupe
animal symcbean # apt-get install bees
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package bees
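The difference between inline and offline dedup is when the duplicate check happens. Here is a toy in-memory model in Python; it is not how BTRFS or any of these tools work internally, and the `dedupe_pass()` method is a hypothetical stand-in for a background dedup job. The inline store fingerprints each block before writing it, so a duplicate is never stored. The offline store writes everything and a later pass collapses duplicates into shared references.

```python
import hashlib


class InlineStore:
    """Deduplicates at write time: a duplicate block is never stored twice."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block data
        self.files = {}   # filename -> list of fingerprints

    def write(self, name, blocks):
        refs = []
        for block in blocks:
            fp = hashlib.sha256(block).digest()
            self.blocks.setdefault(fp, block)  # stored only if new
            refs.append(fp)
        self.files[name] = refs


class OfflineStore:
    """Writes every block as-is; a later pass removes the duplicates."""

    def __init__(self):
        self.blocks = []  # physical blocks, duplicates included
        self.files = {}   # filename -> list of indices into self.blocks

    def write(self, name, blocks):
        start = len(self.blocks)
        self.blocks.extend(blocks)
        self.files[name] = list(range(start, len(self.blocks)))

    def dedupe_pass(self):
        """Hypothetical background job: keep one copy of each distinct block."""
        first_index = {}  # fingerprint -> index in the compacted list
        compacted = []
        remap = {}        # old index -> new index
        for old, block in enumerate(self.blocks):
            fp = hashlib.sha256(block).digest()
            if fp not in first_index:
                first_index[fp] = len(compacted)
                compacted.append(block)
            remap[old] = first_index[fp]
        self.blocks = compacted
        self.files = {n: [remap[i] for i in idx] for n, idx in self.files.items()}


if __name__ == "__main__":
    data = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
    inline, offline = InlineStore(), OfflineStore()
    for store in (inline, offline):
        store.write("vm1.img", data)
        store.write("vm2.img", data)
    print(len(inline.blocks), len(offline.blocks))  # 2 6
    offline.dedupe_pass()
    print(len(offline.blocks))  # 2
```

Both end up holding the same unique blocks; the offline approach just needs the full, undeduplicated space until the pass has run.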
There is a project called lessfs
which provides inline deduplication and is implemented as a FUSE filesystem.
But there are things here which make me a bit uneasy. It's hosted on
SourceForge (so are some of my projects! It used to be a popular place
to publish open source). 2009-2013 saw regular updates, then they just
seem to have stopped. Similarly, activity on the help and support pages
on SourceForge seems to have stopped in 2013. The project website returns a 403 error. But it seems people are still using it. Could this actually be a finished piece of software that just works?
animal symcbean # apt-get install lessfs
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package lessfs
Also running as a FUSE filesystem is SDFS by OpenDeDup (I'm a bit confused about the product/branding too). It can connect directly to cloud storage backends as well as to block devices.
animal symcbean # apt-get install sdfs
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package sdfs
The other open source solution I have found is VDO.
This runs as a kernel module rather than under FUSE. But I'm struggling to
find any references to it on any Linux other than Red Hat/Fedora, which is another thing I'm trying to move away from.
animal symcbean # apt-get install vdo kmod-kvdo
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package vdo
E: Unable to locate package kmod-kvdo
ZFS
seems to be the flavour of the month for large scale Linux-based
virtualization, but it likes a lot of memory for deduplication, is
complex to configure, and is a LOT more complex on top of iSCSI. Although
our infrastructure is not huge, it's big enough that we should separate
the storage.
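As a rough illustration of why ZFS dedup is so memory-hungry: the dedup table (DDT) holds an entry for every unique block, and a commonly quoted rule of thumb is around 320 bytes of RAM per entry. The sketch below just does that arithmetic in Python. The 320-byte figure, the 128 KiB average block size and the 10 TiB of unique data are assumptions for illustration, not measurements from our environment.

```python
def ddt_ram_bytes(unique_data_bytes: int, avg_block_bytes: int = 128 * 1024,
                  bytes_per_entry: int = 320) -> int:
    """Estimate RAM for a ZFS dedup table: one entry per unique block.

    bytes_per_entry=320 is a commonly quoted rule of thumb, not an exact figure.
    """
    unique_blocks = -(-unique_data_bytes // avg_block_bytes)  # ceiling division
    return unique_blocks * bytes_per_entry


if __name__ == "__main__":
    TIB = 1024 ** 4
    GIB = 1024 ** 3
    print(ddt_ram_bytes(10 * TIB) / GIB)             # 25.0
    print(ddt_ram_bytes(10 * TIB, 8 * 1024) / GIB)   # 400.0
```

The second line shows why block size matters so much: with an 8 KiB average block the same 10 TiB would need about 400 GiB for the table under this rule of thumb.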
For the same reasons I am avoiding
Docker and Kubernetes, I don't want to make my software stack too
sophisticated. Using a SAN/NAS appliance for storage makes my life a
lot simpler.
Currently I'm leaning towards using
Synology for storage. In addition to the Simplivity boxes, we have some
HP MSAs. These are really nice bits of hardware and not ridiculously
expensive - but they do cost enough that they need to be under warranty
and that means you need to deal with HPE's support centre. Clearly these
guys (in India?) are sub-contracted and have targets to reduce warranty
claims. Got a 4-hour response time on your contract? Expect your
hardware to get fixed in four hours? Think again. At my previous gig, it
took 3 weeks to get a replacement power supply out of them. On the last
two big repair exercises at my current work, we were promised that
there would be no downtime ("completely transparent"). Both resulted in
major crashes that took a long time to recover from. I could go on all
day with stories about their support.
But the only thing worse than their support is their software.
Synology
are the opposite in just about every way. Their software/user interface
is a joy to use. But while their hardware is cheap, it is perhaps a
little too cheap. It is cheap enough that you don't need to worry about
expensive warranties and support contracts.
But using an appliance brings more constraints than just which software is available.