Friday 22 May 2020

Open source deduplication

At $WORK I have some very expensive Simplivity boxes. When you cut through all the marketing nonsense, each node is a combination of VMware, an HPE Intel server, an SSD storage array, inline block deduplication and data replication. There is some pixie dust sprinkled on top (which doesn't work well at our site) but the components I've listed here work well.

The deduplication is rather important - it gives us a compression ratio of 38:1.
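
For anyone wanting a rough idea of how much block-level dedup might save on their own data, here is a minimal sketch in shell. It assumes fixed 4k blocks and GNU coreutils (split, sha256sum); the function name dedup_ratio is made up for illustration, and this is certainly not how Simplivity arrives at its own figure.

```shell
# dedup_ratio FILE [BLOCK_SIZE]
# Hypothetical helper: splits FILE into fixed-size blocks, hashes each block
# and reports how many of the blocks are unique.
dedup_ratio() {
    file=$1
    bs=${2:-4096}
    tmp=$(mktemp -d) || return 1
    # One temporary file per block (needs scratch space the size of FILE)
    split -b "$bs" -a 6 "$file" "$tmp/blk."
    total=$(find "$tmp" -type f | wc -l)
    # Identical blocks produce identical hashes, so count distinct hashes
    unique=$(find "$tmp" -type f -exec sha256sum {} + | awk '{ print $1 }' | sort -u | wc -l)
    rm -rf "$tmp"
    LC_ALL=C awk -v t="$total" -v u="$unique" 'BEGIN {
        if (u == 0) print "0 blocks"
        else printf "%d blocks, %d unique, ratio %.1f:1\n", t, u, t / u
    }'
}
```

Pointing this at a VM disk image gives a ballpark only: real products differ in block size, alignment and whether they also compress.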

However these boxes are a bit full. Rather than add more Simplivity nodes, I'm planning on building a Proxmox cluster and moving some of our legacy and dev systems there. I've been running a POC for a couple of months and overall I'm very impressed with Proxmox.

So dedup is nice on Simplivity and works well - but can you do the same thing on Linux?

A bit of research turned up some interesting results.

BTRFS doesn't yet support inline deduplication for production usage, but it does allow for offline dedup. None of the tools I looked for are in my distribution's repositories:

animal symcbean # apt-get install dduper
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package dduper
animal symcbean # apt-get install btrfs-dedupe
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package btrfs-dedupe
animal symcbean # apt-get install bees
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package bees


There is a project called lessfs which provides inline deduplication and is implemented as a FUSE filesystem. But there are things here which make me a bit uneasy. It's hosted on Sourceforge (so are some of my projects! it used to be a popular place to publish open-source). 2009-2013 saw regular updates, then they just seem to have stopped. Similarly, activity on the help and support pages in Sourceforge seems to have stopped in 2013. The project website returns a 403 error. But it seems people are still using it. Could this actually be a finished piece of software that just works?

animal symcbean # apt-get install lessfs
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package lessfs


Also running as a FUSE filesystem is SDFS by OpenDeDup (I'm a bit confused about the product/branding too). This directly connects to cloud backend storage as well as block devices.

animal symcbean # apt-get install sdfs
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package sdfs


The other open source solution I have found is VDO. This runs as a kernel module rather than FUSE. But I'm struggling to find any references to it on any Linux other than RedHat/Fedora. Another thing I'm trying to move away from.

animal symcbean # apt-get install vdo kmod-kvdo
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package vdo
E: Unable to locate package kmod-kvdo



ZFS seems to be flavour of the month for large scale Linux based virtualization, but it likes a lot of memory for deduplication, is complex to configure and a LOT more complex on top of iSCSI. Although the infrastructure is not huge, it's big enough that we should separate the storage.

For similar reasons that I am avoiding Docker and Kubernetes, I don't want to make my software stack too sophisticated. Using a SAN/NAS appliance for storage makes my life a lot simpler.

Currently I'm leaning towards using Synology for storage. In addition to the Simplivity boxes, we have some HP MSAs. These are really nice bits of hardware and not ridiculously expensive - but they do cost enough that they need to be under warranty and that means you need to deal with HPE's support centre. Clearly these guys (in India?) are sub-contracted and have targets to reduce warranty claims. Got a 4-hour response time on your contract? Expect your hardware to get fixed in four hours? Think again. At my previous gig, it took 3 weeks to get a replacement power supply out of them. On the last two big repair exercises at my current work, we were promised that there would be no downtime / "completely transparent". Both resulted in major crashes that took a long time to recover from.  I could go on all day with stories about their support.

But the only thing worse than their support is their software.

Synology are the opposite in just about every way. Their software/user interface is a joy to use. But while their hardware is cheap, it is perhaps a little too cheap. It is cheap enough that you don't need to worry about expensive warranties and support contracts.

But using an appliance means more constraints than just the availability of the software. 

 

Update April 2022 

Recently I've switched to PBS for backing up my Proxmox VMs and Containers. This de-duplicates the backups (unlike Simplivity, here the primary image is included in the de-duplication set). Strongly recommended.

Friday 24 April 2020

COVID19 - Provisioning remote access with Linux, the details

Had a few requests asking about how all this was put together so....

Starting with a minimal Ubuntu 18.04 server install...

apt-get install openbox lightdm
apt-get install plank
apt-get install zenity pcmanfm
apt-get install lxterminal
apt-get install rdesktop
apt-get install tightvnc
apt-get install novnc


add a file in /etc/lightdm/lightdm.conf.d containing:

[SeatDefaults]

greeter-hide-users=true
greeter-show-manual-login=true


configure user 'base'
Log in as user "base", right click and open a terminal.
Run `plank` then ctrl-c
(this creates the openbox and plank .config). Since this should be a jump box, users' access to the local machine should be minimized - the default setup gives the user access to a terminal session on the local machine. Edit the openbox menu.xml file to disable this - but also set the shell to /sbin/nologin to prevent access to the local system.

I was experimenting with user home directories on different paths (so I could have some mounted noexec, some with exec) but when I did this, the users not in /home were not able to login; pam-google-authenticator reported 'Failed to compute location of secret file for "$USER"'. Checking the .so file, the path does not appear to be hard-coded - I suspect it may have been different apparmor rules in play. The solution I chose was to ensure that home directories were within /home - by mounting the extra filesystem (with noexec) there.

While you could use a conventional XDG launcher, this exposes a lot of functionality on the jump box. Using plank and the openbox menu (along with noexec & nologin) as the only means of starting programs reduces the attack surface massively.
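
A quick way to check the nologin part of that is to list any accounts under /home which still have a real shell. A minimal sketch, assuming the standard passwd file layout; list_shell_users is a hypothetical name, not part of my build:

```shell
# list_shell_users [PASSWD_FILE]
# Prints the names of accounts whose home directory is under /home and whose
# login shell (field 7) is something other than nologin or false.
list_shell_users() {
    awk -F: '$6 ~ /^\/home\// && $7 !~ /\/(nologin|false)$/ { print $1 }' "${1:-/etc/passwd}"
}
```

On a correctly locked down jump box this should print nothing other than any admin accounts you deliberately left with a shell.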

One issue with the build I have in place at the moment is that pcmanfm will store user passwords if asked. I have a tidy up script running from cron which removes any files in the user's home directory which are not also present in /home/base but it's still something of a concern. Firefox is started in incognito mode (using the settings in the plank launcher).
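
The tidy up idea can be sketched roughly as below. This is an illustration rather than the script I actually run: the function name tidy_home is invented, it assumes GNU or BSD find, and it will not cope with file names containing newlines.

```shell
# tidy_home BASE_DIR USER_DIR
# Removes every file or symlink under USER_DIR which has no counterpart at
# the same relative path under BASE_DIR, then prunes the empty directories.
tidy_home() {
    base=$1
    home=$2
    # Pass 1: files and symlinks, as paths relative to the user's home
    ( cd "$home" && find . \( -type f -o -type l \) -print ) |
    while IFS= read -r rel; do
        [ -e "$base/$rel" ] || rm -f -- "$home/$rel"
    done
    # Pass 2: directories, deepest first, so emptied ones can be removed
    ( cd "$home" && find . -mindepth 1 -depth -type d -print ) |
    while IFS= read -r rel; do
        [ -d "$base/$rel" ] || rmdir -- "$home/$rel" 2>/dev/null
    done
    return 0
}
```

From cron this would be called once per user, e.g. tidy_home /home/base /home/someuser, ideally when the user has no active session.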

COVID 19 - Conference calling

$WORK is currently using Microsoft Teams. If you've not used it before, it's pretty much standard Microsoft bloatware - features over functionality. But to give them their due, after proclaiming for a very long time that it would run on all platforms, Microsoft have released a native Linux client.

Microsoft already claimed it would run in a browser but my experience was that this was only the case if the browser was MSIE/MSEdge/Chrome running on MS-Windows - certainly not Firefox or Chromium on Linux. I didn't try Apple or Android platforms but several of my colleagues seem to be using it on Apple Macs without issues. If you Google for instructions you'll (eventually) find a description of how someone made this work on Chromium/Linux with a lot of tweaks - this didn't work for me. Sorry - I would have provided a link but I didn't make a note of it and it's hard to find.

I am currently running MS Teams 1.3.00.5153 on Linux Mint 18.1.

Apart from the following issues, it mostly works...
  • does not send video (receive is OK)
  • cannot show my desktop (same issue as above?)
  • sometimes it stops communicating with my microphone
  • steals focus every time a new chat arrives (by far the most annoying bug)
  • does not add an XDG start menu entry
  • does not shutdown nicely at logout
But now that most browsers natively support bi-directional audio/video capability, there's no need to run a thick client for video conferencing. No need for proprietary protocols. A quick google, and I found Jitsi (Youtube video)

It's FOSS software, the client runs in a browser, and there are optional clients (I'm guessing HTML apps) for iOS and Android.

I can't say how compatible/stable this will be - but OMG! what a neat looking bit of software. It has built in recording and POTS integration. But what I really love about it is the hand icon.


Saturday 4 April 2020

Security tools are awful

In my experience, most bolt on security products actually undermine your security at great expense rather than enhance it. One exception to this is a good password manager. Recently I've been trying to find one for my workplace. Unfortunately I have nothing like the budget needed for CyberArk - in my last job, I looked after my employer's CyberArk installation and really loved it (despite the fact that most of it only ran on MS-Windows). If you have money to burn - read no further - go buy CyberArk and don't skimp on getting it configured correctly.

My starting point was open source team password managers - there's lots to choose from: Syspass, Teampass, Passbolt, Passit, Psono, bitwarden....the list goes on and on.

The first issue I came across is the way they handle the master encryption key. If you are running this on your own infrastructure then that might not matter too much. But few people do still run their own infrastructure, and of those that do, the passwords for your infrastructure are the last thing anyone would want to store on their own infrastructure! Almost all are really, really bad at this. A surprising number of projects try to pass off pen tests against the application as security audits - probably because 1) pen tests are now relatively cheap and 2) they know their emperor has no clothes.

The second issue is the lack of a usable API. I don't just want to store passwords, I want to store other secrets. I don't want to have to copy and paste every time my infrastructure needs a secret. I want to be able to rotate passwords. I don't even mind that your application does not do this - if I can make sense of the API I can easily implement this myself.

Most of them have APIs - but are lacking in documentation. Passbolt is offered as a commercial product / service as well as open source and proudly provides documentation on the end points - but is somewhat lacking in detail about access authentication tokens. I was therefore quite hopeful that they would be able to point me in the right direction, but after contacting their support, they were not able to provide a single example of a client or explain how their authentication worked!

I was excited when I discovered that Passit ran as a single page application - surely that must mean it's a REST API? But when I tried using it I saw no data traffic in web developer - WTF? I can only guess that it's using websockets to communicate.

The third issue is devops syndrome. Yes, you can install their open source product, but only after you build out the same set of orchestration and build tools that they use. Just run this simple command.....after you have installed node.js, docker, kubernetes, ansible, jenkins.....  


Wednesday 18 March 2020

COVID19 - Provisioning remote access with Linux


When I started in my current role, they were using a conventional Cisco IPSEC based vpn. While with a few config tweaks it worked, it was far from ideal for security or user experience. The big security issue is that it creates a big hole in your firewall – from a device bridged to the internet! A further concern was that authentication was via a password. While I could have put in a RADIUS server with a MFA authentication source, this still required users to either:
  • take their work computer (and all the data stored on their local disk) off site
  • install and configure some very esoteric software on their own hardware

Fixing all these problems would take a massive amount of effort to provide a very limited service with continuing security problems.

If everything they need is on their computer in work – then I just need to find a way of providing access to their computer at work remotely. So here are the ingredients for the recipe I used:

All the above, with the exception of the free certificate, are open-source and available from official Ubuntu repos (this software is also available for other Linux and BSD systems). In addition I wrote custom scripts to
  • provision users (with QR codes for Google auth)
  • run wakeonlan and rdesktop
  • collect activity stats
Now all a user needs to get connected is a mobile device running an authenticator application and an internet connected browser.
Once they get their head around the fact that they don't need to be sitting in front of the computer they are using, the users are very happy with the experience. We have fewer reports of issues than we do from the legacy VPN users. The 2-factor authentication provides much better security.
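
To give an idea of the wakeonlan and rdesktop step, here is a minimal sketch. It is not my production script: connect_desktop is a made-up name, the MAC address and host are passed in by hand rather than looked up per user, and it assumes a netcat that supports -z and -w to probe the RDP port (3389).

```shell
# connect_desktop MAC HOST [TRIES]
# Sends a wake-on-LAN packet, waits for the RDP port to answer, then starts
# a full screen rdesktop session to HOST.
connect_desktop() {
    mac=$1
    host=$2
    tries=${3:-30}
    wakeonlan "$mac" >/dev/null || return 1
    n=0
    # Poll until the workstation has booted far enough to accept RDP
    until nc -z -w 2 "$host" 3389 2>/dev/null; do
        n=$((n + 1))
        if [ "$n" -ge "$tries" ]; then
            echo "gave up waiting for $host" >&2
            return 1
        fi
        sleep 2
    done
    rdesktop -f "$host"
}
```

Something like this can be hooked into the openbox menu or a plank launcher so the user never sees a shell.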

The only difficult bit was stripping out the full “desktop experience” from Openbox. I don't want my users shutting down the machine or mapping drives! Initially I tried xfreerdp as the RDP client but had a lot of issues with keyboard mapping. As hinted at above, the machine is heavily locked down – users have no shell on the local machine. This was easy to implement but impacted on the behaviour of some terminal emulators (required for onward ssh access). Openbox and systemd don't play nice together – so running “last” reports all users have “gone away”. This seems to be yet another systemd issue. However I get more useful usage monitoring from the script to collect activity stats (this finds openbox processes and interrogates /proc to find the user, display and other information). It would be trivial to add in screen captures here – but decided to leave this out for now. It's also possible for additional users to join a VNC session, but this is currently blocked on the firewall until I think up a way of handling it which does not reduce the overall security.
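
The activity stats collection works along these lines. This is a simplified sketch, not the script itself: openbox_sessions is an invented name, it assumes a Linux /proc with comm and environ files plus GNU stat, and it needs to run as root to read other users' environments.

```shell
# openbox_sessions [PROC_ROOT]
# Prints "PID USER DISPLAY" for every process whose command name is openbox.
openbox_sessions() {
    proc=${1:-/proc}
    for dir in "$proc"/[0-9]*; do
        [ -r "$dir/comm" ] || continue
        [ "$(cat "$dir/comm" 2>/dev/null)" = "openbox" ] || continue
        # The owner of the /proc/PID directory is the user running the process
        user=$(stat -c %U "$dir" 2>/dev/null) || continue
        display=""
        if [ -r "$dir/environ" ]; then
            # environ is NUL separated; pick out the DISPLAY variable
            display=$(tr '\000' '\n' < "$dir/environ" | sed -n 's/^DISPLAY=//p' | head -n 1)
        fi
        printf '%s %s %s\n' "${dir##*/}" "$user" "${display:-unknown}"
    done
}
```

Logging the output with a timestamp every few minutes from cron gives a usable record of who was connected and when.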

The version of noVNC installed from repo is rather old, and the current client (i.e. the html and javascript parts) has a lot of improvements - I downloaded these files from github and copied them over the repo install.

I chose tigervnc as, although all the vncservers support multi-head usage on Linux, the package version of this seemed closest to my usage model.

Currently this is running on a 2 core virtual machine. The initial 2Gb of RAM was all but used up with 17 users online and this has since been changed to 8Gb. The 2CPUs is overkill – with 20 users working online, the load was around 0.3 and bandwidth was averaging 200kbps with a peak of 500kbps.
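
The per-user figures follow from simple division of the numbers above. As a worked example (per_user is just a throwaway helper name):

```shell
# per_user TOTAL USERS
# Divides a measured total by the number of concurrent users.
per_user() {
    LC_ALL=C awk -v total="$1" -v users="$2" 'BEGIN { printf "%g\n", total / users }'
}

per_user 0.3 20    # CPU load per user
per_user 500 20    # peak bandwidth per user, in kbps
```

That gives 0.015 of a CPU and 25kbps per user.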

Out of curiosity I looked up what Microsoft say you need for an RDS server. A comparison with what I am currently running is shown below:



                Microsoft recommend    My server uses
Base OS         2Gb                    250Mb
RAM Per user    64Mb                   100Mb
CPU Per user    0.06                   0.015
B/W Per user    64kbps                 25kbps

So in terms of the hardware resources there's not a clear winner – however having worked in an environment which used Microsoft RDS extensively, supporting the Linux system is a lot cheaper in terms of manpower. And that's before considering the costs of licensing the Microsoft solution and implementing 2FA.

Some more details in a later post.