Argh!: June 2022

As I've written elsewhere, when I started my current gig in 2019, the infrastructure and IT Operations turned out to be very different than what I had been told when I handed in my notice to my previous employer. The description I keep going back to is that it was a fractal horror story.

The majority of the systems were Centos, along with some Redhat boxes (pre-dating the "Enterprise" moniker) and a few Whitebox Linux machines. I'd ever heard of Whitebox before I arrived here. The Whitebox Linux distribution was a free version of Redhat Linux (again without the "Enterprise"). These dated from the last millenium.

Upgrading the systems in place was going to be a lot more work than replacing them. In the majority of cases they were so old that the repositories were no longer online. Further, the key server components were all built from tarballs. And every host had a different version of the base operating system, and different builds/file layouts. As a result there was benefit to sticking with Redhat/Centos. By reverting to repo based software installations (wherever possible) we would be deploying updates from a single, trusted channel.

So would it better to replace this with something else? As I was deliberating this, IBM bought over Redhat. And while there was nothing to suggest that the future for Centos might be any different this was not a good time to be commiting to Centos as the strategic platform going forward. Further, I've seen the impact of trying to run Linux with the SELinux targeted policy in an Enterprise environment.

I did consider the possibility of migrating to a docker (or similar) infrastructure. But the disadvantages and risks from this massively outweighed any benefits.

My initial thought was to go for a purely rolling release model. The systems in place had not been upgraded because management
- was terrified of breaking stuff
- did not have enough people/right skills to implement upgrade cycles
While the level of risk of breaking things with a rolling release compared to a staged release was probably the same, the rolling release model would mean that pain would be more spread out. However there are no large scale rolling release distros geared towards enterprise environment. Is rolling-release in the enterprise simply an oxymoron?

While I had previously run a datacentre primarily on Suse, that was a long time ago. Yast is still a fantastic toolkit but Suse seems to have become less relevant in an enterprise server role. Debian stable would have been a good fit - with the advantage of a huge range of software available from official repositories, however I didn't want to move to a platform with even less frequent upgrade cycles that RHEL.

This left Ubuntu as the next obvious choice.

In addition to upgrading the hosts, I also wanted to move the infrastructure from a 1990's dial up ISP to something more akin to a modern, integrated environment; grouping functionality by its technical role with structured dependencies instead of building the same wheels over and over again. A priority was building a proper DMZ to sit between the applications and the internet. This meant I could:

avoid exposing the ancient machines directly on the internet
centralize/automate certificate managementupgrade all the sites to HTTP/2 without having to replace the platforms
implement WAF-like security controls
implement useful analytics

So after some initial testing, I built out a cluster of reverse proxies using Ubuntu and nginx.

The impact was huge.

Changing to HTTP/2 resulted in page load speeds roughly doubling on every service.

Letting the lightweight proxy handle the long haul communications freed up processes (and therefore memory and CPU) on the origin servers resulting in a massive increase in capacity. Previously the moitoring would light up like a christmas tree as load ramped up every morning, swap files filled up and response times plummeted. This almost completely eliminated the issues.

I expect I would have seen these same benefits regardless which Operating System/distribution I had chosen, but Ubuntu has proved to be fast, reliable and very low effort.

As the modernization program has progressed, every investment in re-platforming/upgrading has paid back multi-fold. I've only run into two issues which couldn't be solved on the path I'd planned.

The first was Solr. The version in the Ubuntu repos (inherited from Debian) is old and very badly organized. For this the tarball package proved to be a mch better choice.

The second was FreeIPA. As part of the modernization we needed to replace the old OpenLDAP installation. Moving to FreeIPA provided an integrated solution which could easily support additional features (notably sudo). However the version available from repo at that time was not very current / stable. After trying various options I went with Alma Linux on the hosts for the FreeIPA service.

I guess I'm digressing from where I started to talking about architecture.

While these changes have demonstrated their value, the choices here were, in many cases, the exact opposite of the obvious solution:

- Solved a work overload by taking on more work
- Solved an inability to patch by maximizing the frequency at patches were available/applied
- Simplified the management of the infrastructure by introducing more components/complexity

Argh!

Tuesday, 14 June 2022

Ditching Redhat

About Me