Wednesday, 27 February 2013

Compiling PHP

Having played around with computers for more years than I care to remember, I used to be very familiar with the process of unpacking tarballs and compiling code from source. But to be honest, it's not something I do very often these days: most Linux distributions come with an extensive array of pre-compiled binaries, the package management software keeps me up to date with security patches, my C skills are a bit rusty, and life's just too short!

But recently I've been looking at LAMP performance in some detail, and I was surprised to find that the PHP on my workhorse desktop (PCLinuxOS 2012) had been compiled with no optimization and the resulting binary stripped. I should note that when I installed it there was neither a 64-bit nor an AMD-specific port of the distribution, so the OS build was geared towards compatibility rather than performance.

So I had a play around to see whether there were any benefits to be had at compile time.

PHP is a scripting language, and a lot of its functionality is implemented in extension libraries. Dynamically linking these at runtime does carry a performance overhead, although with FastCGI and mod_php this shouldn't be too great, since the worker processes are forked from a parent that has already loaded the extensions rather than loading them afresh. For most people, the ability to choose which extensions are loaded at runtime (and thus trim the memory footprint) outweighs the small processing overhead of runtime linking. Unfortunately, my test methodology didn't allow me to measure the actual impact of static versus dynamic linking. In the absence of a site to test and of more capable HTTP load-testing tools (ab would not cut the mustard), I was using the CLI SAPI, where loading the extensions on every invocation causes a big performance drop that would not happen on a properly configured web server.

To compare the different optimization levels, I compiled PHP 5.3.22 using gcc 4.5.2 at O0, O2 and O3, then timed 'make test' for each build. My test machine was a dual-core AMD Athlon.
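The build-and-time procedure described above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the exact commands used for these results: the script name bench-php.sh, the source directory ./php-5.3.22, a bare ./configure (add your usual options), GNU time installed as /usr/bin/time, three timed runs per level, and the CLI binary living at sapi/cli/php in the build tree are all assumptions. NO_INTERACTION=1 is set so that PHP's test runner does not stop to ask whether to send a report.

```shell
#!/bin/sh
# bench-php.sh (hypothetical name): a minimal sketch, not the original method.
# Assumptions: PHP 5.3.22 source unpacked in ./php-5.3.22, gcc as the
# default compiler, GNU time at /usr/bin/time, RUNS timed runs per level.

RUNS=3   # assumption: number of timed 'make test' runs per build

# Map an optimization label to the CFLAGS used in the comparison.
# The O0 build passes no -O flag at all, relying on gcc's default of -O0.
cflags_for() {
    case "$1" in
        O0) echo "-march=native -pipe" ;;
        *)  echo "-march=native -$1 -pipe" ;;
    esac
}

# Rebuild PHP for one label, keep unstripped and stripped copies of the
# CLI binary for size comparison, then time 'make test' RUNS times.
build_and_time() {
    CFLAGS=$(cflags_for "$1")
    CXXFLAGS="$CFLAGS"
    export CFLAGS CXXFLAGS

    # Fails harmlessly on the first, not yet configured, pass.
    make distclean >/dev/null 2>&1
    ./configure && make || return 1   # add your usual configure options

    cp sapi/cli/php "../php-cli-$1"
    strip -o "../php-cli-$1-stripped" sapi/cli/php

    i=1
    while [ "$i" -le "$RUNS" ]; do
        # GNU time format: %U = user seconds, %S = system seconds,
        # %M = maximum resident set size in KB; appended to a per-level file.
        NO_INTERACTION=1 /usr/bin/time -a -o "../times-$1.txt" \
            -f "%U %S %M" make test >/dev/null
        i=$((i + 1))
    done
}

# Only start the (long) builds when invoked as: sh bench-php.sh run
if [ "$1" = "run" ]; then
    cd php-5.3.22 || exit 1
    for level in O0 O2 O3; do
        build_and_time "$level" || exit 1
    done
fi
```

Each line of times-O0.txt, times-O2.txt and times-O3.txt then holds one run's user time, system time and max RSS, from which an average and standard deviation of usr + sys can be computed. Keeping the builds behind the "run" argument means the file can be sourced or syntax-checked without kicking off three full compiles.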

Build flags (CXXFLAGS="${CFLAGS}" in every case):
O0: CFLAGS="-march=native -pipe"
O2: CFLAGS="-march=native -O2 -pipe"
O3: CFLAGS="-march=native -O3 -pipe"

Property                          O0       O2       O3
Average (sys + usr), seconds      214.0    206.7    207.7
Std dev (sys + usr), seconds      6.2      0.5      1.0
Max RSS, KB                       569.8    569.8    570.0
Executable size, KB               6707.9   6880.8   7403.3
Stripped executable size, KB      6257.5   6437.5   6973.6
I've not shown the results here, but I saw no measurable difference in either the usr + sys times or the max RSS between the stripped and unstripped binaries.
Interestingly, the O3 build is very slightly, but measurably, slower than O2, while O2 is around 3.5% faster than O0 (206.7 s versus 214.0 s).
A gain of around 3.5% seems a little disappointing compared with figures reported for other programs, but I would expect to see greater gains for PHP code implementing (for example) encryption or image processing.
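For reference, the relative differences follow directly from the averages in the table, writing t for the average usr + sys time in seconds:

$$\frac{t_{O0} - t_{O2}}{t_{O0}} = \frac{214.0 - 206.7}{214.0} = \frac{7.3}{214.0} \approx 0.034$$

$$\frac{t_{O3} - t_{O2}}{t_{O2}} = \frac{207.7 - 206.7}{206.7} = \frac{1.0}{206.7} \approx 0.005$$

So the O2 build needs about 3.4% less CPU time than O0 (equivalently, 214.0 / 206.7 ≈ 1.035 times the throughput), and O3 needs about 0.5% more than O2.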