Argh!: FAQ - How do I build a LAMP cluster

Once again, someone just asked this question on a forum - this time on LinkedIn, but it crops out often in other places like serverfault.

There are so many variables that you are unlikely to get the right answer without supplying a huge amount of detail. Facebook run a fault-tolerant, scalable architecture using Linux, Apache, PHP and MySQL (and lots of other things). Do you think their infrastructure and applications could be answered in a post? Do you think their architecture is appropriate for a backoffice supplies management database?

At the other end of the spectrum, even with just 2 machines there are a lot of different ways to connect them up.

The first step in the battle is knowing what questions you should be asking - at least to yourself.

How do you replicate content across multiple webservers?

HTTP is stateless - so as long as webservers have access to the same data, you have redundancy. For TLS it helps performance to have a shared store for session keys (although there are ways to store session keys on the client).
PHP code shouldn't change very frequently but for complex applications, managed deployments are a requirement (or significant investment and skills in how to avoid having to coordinate the deployment). Sometimes rsync is enough. Sometimes you need a multi-site SAN. Sometimes you need a custom content management system written from scratch.

Do you host the system yourself?

If so, then you have a lot of flexibility over how you configure the cluster and communication between nodes - make sure you have more than one IP address though. But do you have the skills to design and manage the cluster? Moving the service into the cloud solves a lot of problems (or rather makes them someone else's problems) but creates new ones for you.

How frequently is data updated?

...and must each node be exactly in sync? If not, how much lag can you tolerate?

Do you use PHP sessions?

These are (frequent) data updates. They need to be shared across the cluster.

How much can you spend?

Employment a good consultant is going to cost you money. But it should be money well spent. If you need to have your appendix removed, would you look for the lowest bid? While it seems that it is possible to outsource some development work like this, I'm not convinced it's the right way to plan an IT architecture.

How much downtime can you afford?

Both scheduled and un-scheduled (i.e. planned and accidental).

In my experience, once you move past a single machine hosting your site, there is surprisingly little correlation between investment and availability. But planning for how the service will degrade when components start failing is always good practice. Knowing that there is scope for at least scheduled maintenance windows does expand the horizon of possibilities.

Do you need to split the cluster across data centres for redundancy?

While it would be nice to design an architecture which can scale from a couple of machines to multiple data-centres, the likely outcome of this would be a very expensive, and hard to manage solution running on a pair of servers. Even if we were all running the same publish-only application, the right architecture changes according to the scale of the operation.

How scalable do you want the system to be?

Continuing the previous point - maybe you should be planning further ahead than just the next rung on the ladder.

Where are your users/customers/datacentre?

Geography matters when it comes to performance.

What is the current volume of writes on the database?

There are 2 well defined solution for MySQL replication - synchronous and asynchronous. The latter is simple, available out of the box but only works for 2 nodes accepting writes from the outside world. Due the replication being implemented in a single execution thread there are potential issues with capacity where there are a large proportion of writes.

What network infrastructure is between the server(s) and clients?

It may aready be capable of supporting some sort of load balancing. But don't discount Round Robin DNS - there is a HUGE amount of FUD about RRDNS on the internet. It's mostly completely wrong. But there benefits to using other approaches - while RRDNS solves a problem for clients connecting via HTTP, its not a good way to manage redundancy between your PHP and MySQL.

What is your content caching policy?

Caching is essential for web usability. But the default ETags setup for Apache is far from optimal for a loose cluster of webservers. There are different approaches to managing caching effectively and (if immutable) these may impact your architecture.

What regulatory frameworks apply?

Even where there are no explicitly obligations (such as PCI-DSS) as soon as you start accepting and process data from users you have a duty of care to them. The type of data you are collecting and what you do with it has an important bearing on the security and therefore the architecture of the service.

The wrong questions

How do I build a LAMP cluster

Obviously - the point of this post is to explain why that is the wrong question to ask.

The number of users

This has very little to do with the infrastructure required. The number of concurrent sessions is only marginally better. When you start talking about the number of HTTP requests, the split between static and dynamic content, and the rate of data updates then your getting closer to the information required to inform a choice of architecture.

While I can't tell you what the infrastructure you need will look like, here are a few general guidelines:

Failover should only be used as a last resort

Failover fails. And it fails at the worst possible time.

Adding components increases failures

Complexity=unreliability. Adding components to support scalability / redundancy should be be thought through very carefully.

Scaling horizontally is much cheaper than scaling vertically

Scaling horizontally = adding more servers

Scaling veritically = replacing a small cheap server with a big expensive one

Write-heavy, Online Transaction Processing database hardware is often an exception to this rule.

Argh!

Monday, 13 June 2016

FAQ - How do I build a LAMP cluster