Monday, 13 June 2016

Local File Inclusion - why everything you ever read about file uploads is probably wrong

Allowing people to load content onto your server potentially allows them to deploy code which will be executed by your server. Your server can easily be pwned, leading to your work being destroyed or stolen and the website being used for spamming, phishing and DoS attacks... it's a bad thing.

Where I've implemented file uploads, or have been asked how someone else should do it, I usually give this advice:
  1. Ensure that uploaded content is held outwith the document root or, failing that, in a directory configured with "php_flag engine off"
  2. Only use filenames with safe characters - [a-z], [A-Z], [0-9], .-_
    (OWASP, after detailing why you should not use blacklisting as a preventative measure, offer a blacklist of characters :) )
  3. Only allow content with a whitelisted extension to be uploaded
  4. Check the mimetype of uploads - using PHP's mime_content_type(), not the value supplied by the user
  5. Preferably convert the content to a different format (then back to the original format if required)
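As a rough illustration, checks 2 and 3 might be sketched in bash (version 4 or later) as below. The function name and the extension whitelist are hypothetical. Like the list above, the function only looks at the last extension.

```shell
#!/usr/bin/env bash
# Hypothetical whitelist of permitted (lower-case) extensions.
ALLOWED_EXTENSIONS=(png jpg jpeg gif)

# is_safe_upload_name NAME
# Succeeds (returns 0) if NAME uses only [A-Za-z0-9._-], does not start with
# a dot, and its final extension is in ALLOWED_EXTENSIONS.
is_safe_upload_name() {
    local name=$1 ext allowed
    local re='^[A-Za-z0-9_-][A-Za-z0-9._-]*$'
    # Check 2: safe characters only (this also rejects "/" and spaces).
    [[ $name =~ $re ]] || return 1
    # A name with no dot has no extension to check.
    [[ $name == *.* ]] || return 1
    # Check 3: compare the LAST extension, case-insensitively, with the whitelist.
    ext=${name##*.}
    ext=${ext,,}
    for allowed in "${ALLOWED_EXTENSIONS[@]}"; do
        [[ $ext == "$allowed" ]] && return 0
    done
    return 1
}

is_safe_upload_name photo.png && echo "photo.png accepted"
is_safe_upload_name shell.php || echo "shell.php rejected"
```

Note that image.php.png passes both checks. Every character is safe, and the last extension, png, is whitelisted.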
This looks like security-in-depth, i.e. applying more protection than is actually required. I've always been somewhat skeptical of this as a justification for effort and expense (often in contexts where security in breadth is sadly lacking). Certainly, ticking every box in the list above is not always practical. But relying on one's own knowledge and understanding is a dangerous conceit.

I am indebted to wireghoul for pointing out to me that there is a rather subtle attack which can be exploited for LFI on Apache systems.

Hopefully you know that the PHP interpreter will run the PHP code embedded in any file presented to it. To give a simple example:
  1. amend your webserver config to send png files to the PHP handler, e.g. by creating a .htaccess file containing

     AddHandler application/x-httpd-php .png

  2. write your PHP code, and then append it to a valid png file....

        echo '<?php mail("root@localhost", "hello", "from the inside");' >>image.png

  3. Point your web browser at the URL for image.png. The picture renders in your browser just as you would expect... but the PHP code executes too. No surprises so far. After all, if someone can reconfigure your webserver, they don't need to bother with LFI - they've already pwned the system.
But it is also possible to get the webserver to execute PHP code embedded in an image file without changing the handler!

This comes about due to the way mod_mime infers mimetypes from extensions.

mod_mime will treat any part of the filename beginning with a . as an extension. From the mod_mime documentation, note this paragraph:

   Care should be taken when a file with multiple extensions gets associated with both a media-type and a handler. This will usually result in the request being handled by the module associated with the handler. For example, if the .imap extension is mapped to the handler imap-file (from mod_imagemap) and the .html extension is mapped to the media-type text/html, then the file world.imap.html will be associated with both the imap-file handler and text/html media-type. When it is processed, the imap-file handler will be used, and so it will be treated as a mod_imagemap imagemap file.

The mod_mime documentation authors go on to explain that this is the default behaviour for handlers, but that it is possible to ensure that only the last extension is used for choosing the handler with a FilesMatch block around the SetHandler directive.
So if we name our uploaded file image.php.png then we should be able to get our PHP code executing on a server.
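The bash sketch below is a simplified model of that behaviour. It approximates mod_mime and makes several assumptions, so it is not the real algorithm. It treats every dot-separated part of the name after the first as an extension and ignores which extensions are actually configured. The function names are hypothetical. would_reach_php_handler asks whether a .php mapping made with AddHandler could apply anywhere in the name. The FilesMatch approach described above would only consider the final extension.

```shell
#!/usr/bin/env bash
# mod_mime_extensions NAME
# Simplified model: print each dot-separated component after the first,
# one per line, so "image.php.png" gives "php" then "png".
mod_mime_extensions() {
    local name=${1##*/}            # drop any directory part
    local -a parts
    [[ $name == *.* ]] || return 0 # no dot, no extensions
    IFS=. read -r -a parts <<< "${name#*.}"
    printf '%s\n' "${parts[@]}"
}

# would_reach_php_handler NAME
# Succeeds if ANY extension is "php" (compared case-insensitively), i.e. a
# handler mapped to .php with AddHandler could be selected for this file.
would_reach_php_handler() {
    local ext
    while IFS= read -r ext; do
        [[ ${ext,,} == php ]] && return 0
    done < <(mod_mime_extensions "$1")
    return 1
}

for f in image.png image.php.png; do
    if would_reach_php_handler "$f"; then
        echo "$f: php extension present, PHP handler may apply"
    else
        echo "$f: no php extension"
    fi
done
```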

Indeed, guess which method PHP recommend for configuring your server (as implemented by Red Hat, SUSE, Ubuntu and others)?

Let's try it out!

In the earlier example, we appended the PHP code to the end of the file. When I tried this again with the double extension, I got a lot of unusual characters in my browser: the Content-Type was PHP's default text/html, not image/png. So instead, the PHP code needs to prefix the image and set the Content-Type header:

    # PHP first (it sets the PNG content type), then the unmodified image bytes
    ( echo -n '<?php header("Content-Type: image/png");
      mail("root@localhost", "hello", "from the inside");
      ?>' ; cat original_image.png ) > image.php.png
Point your browser at the URL and, once again, the PHP code executes and the image renders, but this time using the default config on most webservers.

The double-extension trick is known, but not by many.

It's not just me. I had a look around the web at advice being given about PHP file uploads - and a lot of it is wrong.

All the articles I've looked at claim that it only works where the extensions *other* than '.php' are unknown to the webserver. As demonstrated here, this is NOT true.

Validating the mimetype won't protect against this kind of attack. Although it would detect the file I created in the example, it won't work if the PHP code is embedded somewhere other than the start of the file. If the code is contained in an unreachable block or in EXIF data, it will still execute, the file will be a valid image (based on its extension), and a mime check (which reads the first few bytes of the file) will say it's an image. It will only render as an image when served by the webserver if output buffering is enabled by default. Without buffering, the image bytes before the code have already been sent, so the code can no longer set the Content-Type header. This will not often be a problem for an attacker.
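A deliberately crude bash stand-in shows the weakness of a check that only reads the first few bytes. It is not mime_content_type(), which relies on a far more thorough magic database. The hypothetical function below only compares the 8-byte PNG signature. The "image" is just that signature followed by filler bytes, not a complete PNG.

```shell
#!/usr/bin/env bash
# starts_with_png_signature FILE
# Succeeds if FILE begins with the 8-byte PNG signature 89 50 4E 47 0D 0A 1A 0A.
starts_with_png_signature() {
    local sig
    sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
    [[ $sig == 89504e470d0a1a0a ]]
}

tmp=$(mktemp -d)
# Stand-in "image": the PNG signature plus some filler (not a valid PNG).
printf '\211PNG\r\n\032\nfiller' > "$tmp/base.png"

# PHP prefixed to the image: the first bytes are "<?php", not the signature.
{ printf '%s' '<?php header("Content-Type: image/png"); ?>'; cat "$tmp/base.png"; } > "$tmp/prefixed.php.png"
# PHP appended after the image data: the first bytes are still the signature.
{ cat "$tmp/base.png"; printf '%s' '<?php /* payload */ ?>'; } > "$tmp/appended.php.png"

for f in prefixed.php.png appended.php.png; do
    if starts_with_png_signature "$tmp/$f"; then
        echo "$f: looks like a PNG"
    else
        echo "$f: not a PNG"
    fi
done
rm -r "$tmp"
```

The prefixed file is rejected, as with the example above. The file with appended code gets through.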

FAQ - How do I build a LAMP cluster

Once again, someone just asked this question on a forum - this time on LinkedIn, but it crops up often in other places like Server Fault.

There are so many variables that you are unlikely to get the right answer without supplying a huge amount of detail. Facebook run a fault-tolerant, scalable architecture using Linux, Apache, PHP and MySQL (and lots of other things). Do you think their infrastructure and applications could be described in a single post? Do you think their architecture is appropriate for a back-office supplies management database?

At the other end of the spectrum, even with just 2 machines there are a lot of different ways to connect them up.

The first step in the battle is knowing what questions you should be asking - at least to yourself.

How do you replicate content across multiple webservers?

HTTP is stateless - so as long as webservers have access to the same data, you have redundancy. For TLS it helps performance to have a shared store for session keys (although there are ways to store session keys on the client).
PHP code shouldn't change very frequently, but for complex applications managed deployments are a requirement (or else a significant investment in the skills needed to avoid having to coordinate deployments). Sometimes rsync is enough. Sometimes you need a multi-site SAN. Sometimes you need a custom content management system written from scratch.
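Where rsync is enough, a push to each webserver might look something like this bash sketch. The host names, paths and function name are hypothetical. -a preserves permissions and modification times. --delete removes files that are no longer in the release. --delay-updates moves updated files into place at the end of each transfer, which narrows (but does not remove) the window in which a server holds a mix of old and new files. The servers are still updated one after another, so this is not a coordinated deployment.

```shell
#!/usr/bin/env bash
# Hypothetical webserver names and paths - replace with your own.
WEB_SERVERS=(web1.example.com web2.example.com)
RELEASE_DIR=./release/
TARGET_DIR=/var/www/app/

# deploy_release: copy RELEASE_DIR to TARGET_DIR on each webserver in turn,
# stopping at the first failure. Set DRY_RUN=1 to print the commands instead.
deploy_release() {
    local host
    local -a cmd
    for host in "${WEB_SERVERS[@]}"; do
        cmd=(rsync -az --delete --delay-updates "$RELEASE_DIR" "$host:$TARGET_DIR")
        if [[ ${DRY_RUN:-0} == 1 ]]; then
            printf '%s\n' "${cmd[*]}"
        else
            "${cmd[@]}" || { echo "deploy to $host failed" >&2; return 1; }
        fi
    done
}

DRY_RUN=1 deploy_release
```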

Do you host the system yourself? 

If so, then you have a lot of flexibility over how you configure the cluster and communication between nodes - make sure you have more than one IP address though. But do you have the skills to design and manage the cluster? Moving the service into the cloud solves a lot of problems (or rather makes them someone else's problems) but creates new ones for you.

How frequently is data updated?

...and must each node be exactly in sync? If not, how much lag can you tolerate?

Do you use PHP sessions?

These are (frequent) data updates. They need to be shared across the cluster.

How much can you spend?

Employing a good consultant is going to cost you money, but it should be money well spent. If you needed to have your appendix removed, would you look for the lowest bid? While it seems possible to outsource some development work this way, I'm not convinced it's the right way to plan an IT architecture.

How much downtime can you afford?

Both scheduled and unscheduled (i.e. planned and accidental).
In my experience, once you move past a single machine hosting your site, there is surprisingly little correlation between investment and availability. But planning for how the service will degrade when components start failing is always good practice. Knowing that there is scope for at least scheduled maintenance windows does expand the horizon of possibilities.

Do you need to split the cluster across data centres for redundancy?

While it would be nice to design an architecture which can scale from a couple of machines to multiple data-centres, the likely outcome would be a very expensive, hard-to-manage solution running on a pair of servers. Even if we were all running the same publish-only application, the right architecture changes according to the scale of the operation.

How scalable do you want the system to be?

Continuing the previous point - maybe you should be planning further ahead than just the next rung on the ladder.

Where are your users/customers/datacentre?

Geography matters when it comes to performance.

What is the current volume of writes on the database?

There are two well-defined solutions for MySQL replication - synchronous and asynchronous. The latter is simple and available out of the box, but only works for 2 nodes accepting writes from the outside world. Because replication is implemented in a single execution thread, there are potential capacity issues where a large proportion of the traffic is writes.
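With asynchronous replication, the earlier question of how much lag you can tolerate becomes something you have to monitor. Below is a minimal bash sketch. It assumes the classic SHOW SLAVE STATUS\G output and its Seconds_Behind_Master field. More recent MySQL versions rename these to SHOW REPLICA STATUS and Seconds_Behind_Source. The function name and the 30-second threshold are hypothetical. The function reads the status text on stdin, so you can try it without a server. It treats NULL (replication not running) as a failure.

```shell
#!/usr/bin/env bash
# check_replication_lag MAX_SECONDS < status_text
# Reads "SHOW SLAVE STATUS\G" output on stdin and succeeds only if
# Seconds_Behind_Master is a number no greater than MAX_SECONDS.
check_replication_lag() {
    local max=$1 lag
    lag=$(awk -F': ' '$1 ~ /Seconds_Behind_Master$/ { print $2 }')
    if [[ ! $lag =~ ^[0-9]+$ ]]; then
        echo "replication not running (lag=${lag:-missing})" >&2
        return 2
    fi
    echo "lag=${lag}s"
    (( lag <= max ))
}

# Against a live replica this might be used as (credentials omitted):
#   mysql -e 'SHOW SLAVE STATUS\G' | check_replication_lag 30
printf '%s\n' '        Slave_IO_Running: Yes' \
              '   Seconds_Behind_Master: 4' | check_replication_lag 30
```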

What network infrastructure is between the server(s) and clients?

It may already be capable of supporting some sort of load balancing. But don't discount Round Robin DNS - there is a HUGE amount of FUD about RRDNS on the internet, and it's mostly completely wrong. There are, however, benefits to using other approaches: while RRDNS solves a problem for clients connecting via HTTP, it's not a good way to manage redundancy between your PHP and MySQL.

What is your content caching policy?

Caching is essential for web usability. But the default ETag setup for Apache is far from optimal for a loose cluster of webservers. There are different approaches to managing caching effectively and (if immutable) these may impact your architecture.

What regulatory frameworks apply?

Even where there are no explicit obligations (such as PCI-DSS), as soon as you start accepting and processing data from users you have a duty of care to them. The type of data you are collecting and what you do with it has an important bearing on the security and therefore the architecture of the service.

The wrong questions

How do I build a LAMP cluster

Obviously - the point of this post is to explain why that is the wrong question to ask.

The number of users

This has very little to do with the infrastructure required. The number of concurrent sessions is only marginally better. When you start talking about the number of HTTP requests, the split between static and dynamic content, and the rate of data updates, then you're getting closer to the information required to inform a choice of architecture.
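Those figures can usually be estimated from existing access logs. Below is a rough bash/awk sketch. It assumes Apache's Common or Combined Log Format, with the timestamp in field 4 and the request path in field 7. The list of "static" extensions and the function name are hypothetical. The sketch counts requests, splits static from dynamic, and reports the busiest single second.

```shell
#!/usr/bin/env bash
# request_profile < access_log
# Paths ending in one of the (hypothetical) static extensions below count as
# static; everything else, e.g. .php, counts as dynamic.
request_profile() {
    awk '
    {
        total++
        path = $7
        sub(/[?].*/, "", path)                 # ignore any query string
        if (path ~ /\.(css|js|png|jpe?g|gif|ico|svg|woff2?)$/) nstatic++
        per_second[substr($4, 2)]++            # strip the leading "["
    }
    END {
        for (s in per_second) if (per_second[s] > peak) peak = per_second[s]
        printf "total=%d static=%d dynamic=%d peak_per_second=%d\n",
               total, nstatic, total - nstatic, peak
    }'
}

# prints: total=3 static=1 dynamic=2 peak_per_second=2
request_profile <<'EOF'
10.0.0.1 - - [13/Jun/2016:10:00:01 +0100] "GET /index.php HTTP/1.1" 200 512
10.0.0.2 - - [13/Jun/2016:10:00:01 +0100] "GET /logo.png HTTP/1.1" 200 2048
10.0.0.3 - - [13/Jun/2016:10:00:02 +0100] "GET /search.php?q=lamp HTTP/1.1" 200 900
EOF
```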