Tuesday 1 October 2013

Daily Mail Fail


What looked like an interesting link appeared in my inbox the other day, so I followed it to read the article. The link in question was to a page on the www . thisismoney . co . uk site - owned and operated by the Daily Mail and proud to describe itself as "Financial Website of the year".

I did not expect the Daily Mail to let the facts get in the way of a good story - and the article did little to improve my impression of them. However, I was surprised at how poor the site's performance was....and then I discovered how poor they really are at IT services.

I noticed that the content continued to load for some time after landing on the page.

Broadbandspeedchecker.co.uk clocks my download speed at 44.95 Mb/s - not bad - although the latency from Maidenhead seems high at 168 ms RTT. But the page from the Daily Mail took 47.42 seconds to reach the onload event, then continued downloading stuff for a further 42 seconds. Well over a minute to download a single page?

There was only 1.4Mb of data in total, but it was split across no fewer than 318 requests to 68 domains, including 12 404s from *.dailymail.co.uk. Erk!

But digging further, I found that the site does not just perform badly - it is probably also breaking the law.

In addition to (what appears to be) the usual 4 Google Analytics cookies, my browser also acquired session cookies from .thisismoney.co.uk, .rubiconproject.com, b3-uk.mookie1.com (x2), .crwdcntrl.net (x2) and.......129 cookies with future expiry dates.

FFS!

(a full list appears below)

For the benefit of any readers outside the European Union, member countries must all implement a set of LAWS (not rules or guidelines) regarding the use of any data stored on a computer, including cookies. In the UK these are set out in the Privacy and Electronic Communications (EC Directive) (Amendment) Regulations 2011, with which websites were required to comply in 2012.

Did the Daily Mail inform me that it was going to store these cookies?

No

Did the Daily Mail ask for my consent to store these cookies?

No

Did the Daily Mail provide any information about cookies on the page?

No

Did the Daily Mail provide a link to their privacy policy on the page?

Yes, in teeny-weeny text – the very last visible element on the page.

Did the Daily Mail offer me a chance to opt-out of accepting the cookies?

No

Is this a world record?

Maybe?



In the absence of any means to tell the Daily Mail via their website that I don't want their cookies, I thought I would use the method built into my browser (although the cookie law does require that I should not have to jump through these hoops for compliance). So I enabled the do-not-track feature in Firefox, deleted the cookies and cache, hit the reload button, and waited a further 44 seconds (my ISP has transparent caching).....


Can you guess what happened next?


All the cookies came back again.
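For anyone who wants a rough count without a browser, a PHP/curl sketch along these lines (the URL is a placeholder, and it only sees cookies set via HTTP headers, not the many set by javascript) sends a DNT header and counts the Set-Cookie responses:

<?php
// Hypothetical cookie audit: fetch a page with DNT set and count the
// Set-Cookie headers in the response. Cookies set by javascript won't
// show up here, so the total a real browser acquires will be higher.
$ch = curl_init('http://www.example.co.uk/some-article.html'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);           // include response headers
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('DNT: 1'));
$response = curl_exec($ch);
curl_close($ch);

preg_match_all('/^Set-Cookie:\s*([^=;\s]+)/mi', $response, $matches);
printf("%d Set-Cookie headers: %s\n", count($matches[1]), implode(', ', array_unique($matches[1])));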

The challenge

Do you know of a worse site than this for dumping cookies? Add a comment and a link to your analysis and I'll publish it.

Monday 16 September 2013

Zend Optimizer Plus - still not following the party line

Having previously failed to find a significant difference in benchmarks between PHP 5.3 and 5.5, I did at least establish that the optimizer produces code which runs slightly faster (about 2% with DokuWiki).

This time I tried ramping up the concurrency to see if PHP could deliver on its performance promises.

Running

ab -n 12000 -c $X http://localhost/src/doku.php

For $X in [10,50,100,200,300,400,500,600,700,800]  I got....

Below 500 concurrent connections, PHP 5.3.3 is slower - but only by about 5%. Above 500 concurrent requests, 5.5.1 is the slower one!

I'm just no good at this benchmarking business.

Both were running from nginx+php-fpm with the same (common) config. 5.5.1 had full ZOP+ optimization enabled.
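A quick-and-dirty PHP wrapper along these lines will drive that sort of sweep and pull the mean time per request out of ab's output (it assumes ab is on the PATH and that the URL above is correct):

<?php
// Sketch: run ab at each concurrency level and report the mean time per request.
$url = 'http://localhost/src/doku.php';
foreach (array(10, 50, 100, 200, 300, 400, 500, 600, 700, 800) as $c) {
    $out = shell_exec(sprintf('ab -n 12000 -c %d %s 2>&1', $c, escapeshellarg($url)));
    if (preg_match('/Time per request:\s+([0-9.]+) \[ms\] \(mean\)/', $out, $m)) {
        printf("concurrency %3d: %8.3f ms/request\n", $c, $m[1]);
    } else {
        printf("concurrency %3d: ab failed\n", $c);
    }
}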

Wednesday 4 September 2013

Zend Optimizer Plus - trying to do it right

Reading further (I didn't bookmark the link and can't find it now), alongside benchmarks showing ZOP+ to be around 20% faster for "real world" applications, there is also mention of a big reduction in memory usage.

I'm quite prepared to believe that better use of memory is possible - the runtime footprint of PHP code seems to be around 8 times the footprint of the script on disk - so there is plenty of scope for improvement. I was running my tests with 100 concurrent connections - nowhere near saturating my machine. I expect that running with a much higher load / less memory would translate into the performance improvements reported elsewhere - more testing required?
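A crude way to put a number on that claim is to compare a script's size on disk with the extra memory it occupies once included (the path below is a placeholder):

<?php
// Rough footprint check: disk size of a script vs the memory consumed
// by its compiled form and data after include.
$file   = '/path/to/some/script.php';   // placeholder path
$before = memory_get_usage();
include $file;
printf("disk: %d bytes, extra memory after include: %d bytes\n",
       filesize($file), memory_get_usage() - $before);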

Meanwhile I had another look at the optimizer. I repeated the setup from last time, running PHP 5.5.1 and ZOP+ with filemtime checking off, fetching a single Dokuwiki page. The control test with full optimization - as per the previous run - is giving slightly different results from last time. Since I'm running this on my home machine, and it's also running lots of other things like X, KDE, my browser and mail client, it's possible that the system isn't in exactly the state it was in when I ran the previous tests.

Full Optimization: 

opcache.optimization_level=0xffffffff

6.334 ms/req

Optimization disabled : 

opcache.optimization_level=0

6.452 ms/req

Repeating the test several times gave consistent results: about a 2% improvement in speed.

Not revolutionary, but every little helps - and in fairness Dokuwiki already has good performance optimization in the PHP code.

When time allows I'll go back and look at memory / CPU usage while running ZOP+ vs APC.

Tuesday 27 August 2013

Doing it wrong again - this time with Zend Optimizer Plus

The guys at PHP have now committed to shipping Zend Optimizer Plus with future releases of PHP, so I thought I'd have a play around with it.

tl;dr

While normally I run my PHP from Apache + mod_php, for the purposes of this exercise, it was easier for me to set up nginx / php-fpm using PHP 5.3.3/APC 3.1.2 and 5.5.1/Zend Opcache 7.0.2dev. All were compiled from source using default settings on a 32-bit PCLinuxOS 2012 distribution (kernel 2.6.38.8) on a dual AMD 4200 machine with 2Gb of memory.

Tests were run using ab on localhost; each test was run by first seeding the opcode cache (ab -n 2 -c 2) then taking a measurement (ab -n 1000 -c 100).

For both APC and ZOP+, tests were run with and without checking for modified timestamps on PHP source files.
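For clarity, "checking for modified timestamps" means APC's apc.stat setting and ZOP+'s opcache.validate_timestamps; a snippet like this, run under the same SAPI as the benchmark, confirms what a given configuration is actually using:

<?php
// Check the timestamp-validation settings for each cache.
// '0' (or false if the extension isn't loaded) means source files are never re-checked.
var_dump(ini_get('apc.stat'));                    // APC: stat() each file on request?
var_dump(ini_get('opcache.validate_timestamps')); // ZOP+: re-check file mtimes?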

Final scores

My results showed Zend Optimizer Plus to be no faster than APC. Although the numbers are within the margin of error, if anything ZOP+ was slower.

Config                        Application  Timestamps  Time per request (ms)
PHP 5.3.3 + APC               Dokuwiki     yes         6.274
PHP 5.3.3 + APC               Dokuwiki     no          6.271
PHP 5.5.1 + ZOP+              Dokuwiki     yes         6.643
PHP 5.5.1 + ZOP+              Dokuwiki     no          6.293
PHP 5.3.3 + APC + mysql       Wordpress    yes         36.012
PHP 5.3.3 + APC + mysql       Wordpress    no          35.868
PHP 5.5.1 + ZOP+ + mysqlnd    Wordpress    yes         35.905
PHP 5.5.1 + ZOP+ + mysqlnd    Wordpress    no          35.978
Static HTML from Dokuwiki     Dokuwiki     n/a         0.171
Static HTML from Wordpress    Wordpress    n/a         0.182

This seems to fly in the face of what is currently being reported elsewhere.

More stuff about my methodology

I wanted to create a reasonably realistic test, hence using 2 off-the-shelf content management systems. Wordpress uses a MySQL backend for its data, and there is a further difference between the APC and ZOP+ configurations: the former uses libmysqlclient while the latter was built with mysqlnd (which meant I had to rewrite the database class to use the mysqli_ functions in place of mysql_). The effects on performance are complex and tied to the level of concurrency, but at 100 concurrent HTTP requests I was expecting this to be minimal. Dokuwiki, on the other hand, uses file-based storage.
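To give a flavour of that rewrite (this is not the actual WordPress wpdb change - the credentials and query are made up - just the shape of the substitution):

<?php
// Illustrative only: the kind of one-for-one substitution needed when moving
// from libmysqlclient's mysql_* API to mysqli (which the mysqlnd build uses).
$link = mysqli_connect('localhost', 'wp_user', 'secret', 'wordpress');      // was mysql_connect() + mysql_select_db()
$res  = mysqli_query($link, 'SELECT option_value FROM wp_options LIMIT 1'); // was mysql_query($sql, $link) - note the argument order flips
$row  = mysqli_fetch_assoc($res);                                           // was mysql_fetch_assoc($res)
mysqli_close($link);                                                         // was mysql_close($link)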

Other reviews

The PHP wiki page about the change links to a spreadsheet showing what look like impressive stats. The stats are reported as requests per second for various configurations.


Some more reviews:

https://managewp.com/boost-wordpress-performance-zend-optimizer - doesn't show response times or requests/second, but does say that load and memory usage were lower, implying greater capacity using ZOP+ compared with APC.

http://halfelf.org/2013/trading-apc-for-zend/ reports a similar reduction in CPU and memory, but again no response times.

http://www.ricardclau.com/2013/03/apc-vs-zend-optimizer-benchmarks-with-symfony2/ reports results again in req/s, showing an improvement of around 10-15%.

http://massivescale.blogspot.co.uk/2013/06/php-55-zend-optimiser-opcache-vs-xcache.html compared ZOP+ and XCache, finding an approx 15-20% improvement in req/s and a similar reduction in response times with Joomla.

The optimizer bit(s)

PHP opcode caches have been around for a long time. ZOP+ brings something new: a code optimizer. Since it still generates pcode, it doesn't apply the CPU-specific tweaks that a native code compiler does. Despite the 32-bit integer used to set the optimizer flags, only 6 flags are recognized by the optimizer (and the last pass only cleans up the debris left by the first 5). The optimizations are mostly substitutions - replacing PHP's built-in constants with literals, post-increment with pre-increment, compile-time type-juggling of built-in constants and such like. There is no inlining of functions in loops. No branch order prediction. Having dug through the code, I was not expecting the optimizer to deliver revolutionary speed improvements.
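To put that in concrete terms, these are indicative examples of the sort of source whose compiled opcodes those substitutions touch (the optimizer rewrites the opcodes, not the source text itself):

<?php
// Indicative examples only - patterns the ZOP+ substitutions are aimed at:
echo PHP_EOL;                  // built-in constant replaced with its literal value
$big = (float) PHP_INT_MAX;    // type-juggling of a built-in constant, resolvable at compile time
for ($i = 0; $i < 10; $i++) {  // post-increment whose result is discarded...
    echo $i;                   // ...can be compiled as the cheaper pre-increment
}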

And yet, Dmitry's spreadsheet shows 'ZF Test (ZF 1.5)' going from 158 req/s to 217 req/s!

I presume this refers to the Zend Framework. While this is far from speedy, I find it astonishing that the performance of the code should improve so much with a few relatively simple tweaks to the opcodes - it rather suggests that there is huge scope for optimizing the code by hand. Although I also note that the performance of 'Scrom (ZF App)' only improves by around 8%.

What am I doing wrong?

The consistent difference (apart from the opcode cache) in my experiment was using different versions of PHP - to a certain extent I'm not really comparing like-for-like. I can only assume that if I ran APC against PHP 5.5.1 and/or ZOP+ with PHP 5.3.3 I would see a very different story. However if you are seeking optimal performance at low load levels (rather than optimal capacity) then there seems to be little incentive to apply this upgrade.

There are anecdotal reports of stability issues with APC on PHP 5.4+; there may be sound technical and economic reasons why APC is not being actively maintained and for ZOP+ to be a better strategic choice.

A clear choice?

I can live without APC's support for user-data caching. But the elephant in the room is the fact that ZOP+ does not reclaim memory: if your code base is larger than the cache size, or the cache fills up with old versions of code, it forces a full flush and re-initialization. This should not be a problem for sites with dedicated devops personnel managing releases to a small number of applications using a continuous deployment strategy. However for the rest of us there needs to be a significant performance advantage with ZOP+ to make this a price worth paying.
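One mitigation, if you can tolerate the recompilation stall at a time of your own choosing, is to flush the cache explicitly as part of each release - a sketch of a hypothetical post-deploy hook (it must run under the web SAPI, not the CLI, to touch the web server's cache):

<?php
// deploy_flush.php - hypothetical post-deploy hook: clear the opcode cache
// deliberately, so stale code versions don't accumulate until ZOP+ forces a
// full flush of its own at an unpredictable moment.
if (function_exists('opcache_reset')) {
    opcache_reset();            // clears the entire opcode cache in one go
    echo "opcode cache flushed\n";
} else {
    echo "opcache extension not loaded\n";
}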

Saturday 17 August 2013

Starting a new website - part 3

So the decision was made, I would stick with Dokuwiki and use PJAX for loading the pages.

A bit of coding and hey presto...www.scottishratclub.co.uk

(the live site is still running an incomplete version of the code - note to self - get the current version deployed)

In order to structure the Javascript changes nicely, keep everything tidy and fit in with Dokuwiki too, the functionality is built around a syntax plugin (for implementing widgets, including initializing PJAX, fixing the problems introduced by deferring the loading of the javascript, and accommodating a strict Content Security Policy). This then places some constraints on how further widgets are implemented, so it's really a framework (yeuch!). Anyway, the plugin is called Jokuwiki.
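For anyone who hasn't written one, this is roughly the shape a DokuWiki syntax plugin takes - a bare skeleton with a made-up 'example' tag, not the actual Jokuwiki source:

<?php
// lib/plugins/example/syntax.php - skeleton only; 'example' and the <example>
// tag are placeholders, not part of Jokuwiki.
if (!defined('DOKU_INC')) die();

class syntax_plugin_example extends DokuWiki_Syntax_Plugin {

    function getType()  { return 'substition'; }  // (sic - DokuWiki's own spelling)
    function getPType() { return 'block'; }
    function getSort()  { return 199; }

    // Claim a simple <example ...> tag in the wiki markup
    function connectTo($mode) {
        $this->Lexer->addSpecialPattern('<example[^>]*>', $mode, 'plugin_example');
    }

    // Turn the matched markup into data for the renderer
    function handle($match, $state, $pos, Doku_Handler $handler) {
        return array('raw' => $match);
    }

    // Emit a placeholder element; any javascript behaviour is attached later
    function render($format, Doku_Renderer $renderer, $data) {
        if ($format != 'xhtml') return false;
        $renderer->doc .= '<div class="example-widget"></div>';
        return true;
    }
}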

In order to use PJAX, the source page needs to be slightly modified (but it's JUST 5 LINES OF CODE!):


// Only emit the page chrome when this is not a PJAX request
// (the PJAX javascript sets an X-PJAX request header on its ajax fetches)
if (empty($_SERVER['HTTP_X_PJAX'])) {
....top part of page
} ?><div id="pjax-container">
....stuff loaded via pjax
</div><?php
if (empty($_SERVER['HTTP_X_PJAX'])) {
....bottom part of page
}

But just to make it really easy, I published a template too. Not the one I used on my website - but if anyone wants it....let me know.

The impact of PJAX on performance is rather large:


Of course it had to be deployed to the site. So I dusted down pushsite and fired it up with a recipe for deploying the site. About 50 files uploaded, then I stopped getting responses from the remote system. I ran it again.....a further 20. The socket was still connected but nothing was happening. Switching to passive mode didn't help. Adding throttling didn't help. I spent several hours battling with it and gave up. Same story the following day - so I logged a call with the service provider. The day after that, they suggested using a different FTP server.....same problem. They said they'd get back to me.

Since I had no ssh access, I couldn't unpack a tarball from the shell - and doing it via a PHP script invoked from the web would have meant spending just as much time fixing the permissions as uploading the stuff by hand. But after a bit of rummaging around in cpanel I found that there was a backup/restore option running over HTTP - so I downloaded a backup, unpacked it, overwrote the backed-up website with my new site, packed it up and restored it onto the server. Job done.

Thursday 30 May 2013

Keyboard STILL not detected. Press F1 to continue

I'm very particular about keyboards and mice. I find a huge difference between devices with different feels - for example, I've usually found Cherry keyboards to be a bit clicky. I've recently been supplied with a new work laptop which has a 'chiclet' keyboard - which I hate. At one point I acquired a second-hand Tandem PC which came with an original IBM AT keyboard. The latter was a phenomenal piece of cold-war engineering, clearly designed to survive at ground zero of a thermonuclear strike. The keys were like those on a manual typewriter. CLICK - clack - CLICK. You would have expected a bell to ring every time you pressed the return key. I quickly found something else and sold the IBM keyboard at a car boot sale to a lumberjack who needed a new axe.

Fast forward a number of years....I blogged before about getting a cheapo keyboard from Currys. This had been providing sterling service for the past 2 years until one day, completely out of the blue, strange things started happening after I logged in. Initially the mouse started selecting stuff at random, and clicking on links in my browser caused new windows to open. So I tried applying the holy mantra of IT - switch-it-off-and-back-on-again. Incorrect password. And again....hang on a minute. Time to apply some advanced systems admin skills. Summoning my computer mojo and focussing 30+ years of hardware and software skills into a finely honed, cutting edge of diagnostic meditation....no - the Caps Lock key is definitely off.

Numlock...on....off. Caps lock....on....off. Ctrl-Alt-F2, switch to a text console - good, that works. Now try logging in. ROOT. Who? Shift seems to be inverted. This is not right. I pick up the keyboard. Water pours out. Water? Smoke, yes, I'd expect that from computers, but not water. Did I mention I have kids? I don't know if it was my clumsiness in taking it apart, permanent damage from the water, or that I just couldn't get it dry enough, but the keyboard was a goner.

Off to Currys. However the only wired keyboards they have are rather nasty. The ones with good quality mechanical bits are all wireless (every wireless keyboard I've used has been very slow) or ridiculously pricey 'gaming' keyboards with heavy clicks. My search is further complicated by the fact that I'm looking for a three-quarter sized keyboard, to make better use of my desktop real-estate. Back home and on to Amazon, where I find this, which looks like it fits the bill in terms of size, key type and action. Having been so fussy about rejecting keyboards because I don't like the feel of them, it's a bit of a gamble to buy one online - but sometimes we all just go a bit crazy!

It arrived today. I was excited. The keys have a nice soft click to them. But it proved to be a bit difficult to get working. This anti-ghosting thing means that it just won't talk to the USB HCI stuff in Linux. I could access and type in the BIOS, but after that, nothing.

I plugged back in a spare PS2 keyboard and found this in my logs:


May 30 20:00:57 localhost klogd: usb 2-1: New USB device found, idVendor=060b, idProduct=2231
May 30 20:00:57 localhost klogd: usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
May 30 20:00:57 localhost klogd: usb 2-1: Product: USB Keyboard
May 30 20:00:57 localhost klogd: usb 2-1: Manufacturer: KB
May 30 20:00:57 localhost klogd: input: KB USB Keyboard as /devices/pci0000:00/0000:00:0b.0/usb2/2-1/2-1:1.0/input/input0
May 30 20:00:57 localhost klogd: generic-usb 0003:060B:2231.0001: input,hidraw0: USB HID v1.11 Keyboard [KB USB Keyboard] on usb-0000:00:0b.0-1/input0
May 30 20:00:57 localhost klogd: generic-usb: probe of 0003:060B:2231.0002 failed with error -22

lsusb said this:

Bus 002 Device 002: ID 060b:2231 Solid Year
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0         8
  idVendor           0x060b Solid Year
  idProduct          0x2231
  bcdDevice            2.21
  iManufacturer           1 KB
  iProduct                2 USB Keyboard
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           59
    bNumInterfaces          2
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0xa0
      (Bus Powered)
      Remote Wakeup
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Device
      bInterfaceSubClass      1 Boot Interface Subclass
      bInterfaceProtocol      1 Keyboard
      iInterface              0
      ** UNRECOGNIZED:  09 21 11 01 00 01 22 4b 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Device
      bInterfaceSubClass      0 No Subclass
      bInterfaceProtocol      0 None
      iInterface              0
      ** UNRECOGNIZED:  09 21 11 01 00 01 22 6c 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
Device Status:     0x0000
  (Bus Powered)

A bit of Googling and I find that the problem is caused by the anti-ghosting support in the keyboard (which allows you to press lots of keys simultaneously). Pressing Ctrl + Alt + Shift (all bottom left) + N disables this and the keyboard is usable with Linux (no mention of this in the manual which came with it, of course).

Rather than have to remember the weird key combination I've ordered a PS2 adapter which I've read should solve the problem.

(and along the way I find *MORE* bugs in Blogger's WYSIWYG editor. Sigh)




Wednesday 29 May 2013

Starting a new website - part 2

The other tool I looked at was OpenOutreach.

This is a bundle of Drupal and various widgets. Although the result is something which is well integrated, and with a huge amount of functionality, once again this comes at a horrendous performance cost. Even browsing through a couple of pages, cached serverside, with APC enabled (and sized to hold all the code) via http://localhost/, it was taking a visible 3 seconds to load the page. Spookily the openoutreach site (120 ms RTT) actually loaded faster on my machine than the local version - suggesting that maybe there was scope for a lot more tuning - particularly running a caching proxy in front (I was using a fairly stock Apache 2.2 pre-fork). And once again, cron jobs were a requirement.

It's not that I have anything against cron - and plenty of hosting packages come with cron support these days. It just raises the question of why the people who wrote what is intended to be an off-the-shelf package made so little attempt to minimise the dependencies. Why did they think cron was the easiest way to run a job? Do I have time to monitor the cron jobs and make sure they are working properly? But more importantly, if they've chosen a quick-fix solution here, what other corners have been cut?

The more I read about performance and usability, the more I keep coming back to the magic 1 second. While there continue to be measurable productivity improvements down to around 0.3 seconds, and Google, Amazon and others have published studies showing a measurable impact at even lower intervals, 1 second seems to stand out as a massive milestone in page loading times - it is possible to get significantly faster page loads than this, but it seems to be the breakpoint of diminishing returns.

Sorry Drupal / Openoutreach, maybe not a fail, but B minus.

At this point I could have started looking round the many, many other open-source PHP CMSs available. But I decided that I'd already spent too much time on this and went back to good old Dokuwiki. Dokuwiki does have issues - being file based, it's difficult to scale to very high volumes of content / users and tricky to scale beyond a single server (the simplest way to do this is to run additional nodes as read-only and replicate content updates from a single master node - since it's file based, this is easy to set up). However for my purposes scalability is not going to be an issue - the core system is very fast. The more recent versions have slowed down a bit at the front end due to the addition of jQuery - this and all the other javascript is loaded at the top of each page, leading to a rather ugly 200-350 millisecond pause in the screen rendering, even when loading from cache. However the template/theme for the application is contained in a plugin (there are lots of themes available) which is a relatively simple bit of PHP code.

Unfortunately Dokuwiki bundles all the content to be written into the <head> section together in tpl_metaheaders() - so although I was able to get the scripts at the bottom, it needed (small) changes to the core Dokuwiki source code.

With the core system (version Weatherwax), making these changes and moving the scripts to the bottom seemed to have no adverse effect on the functionality of the system. Unfortunately it does break a lot of existing plugins. It also exposed another issue with this approach: a common feature of many CMSs, particularly those with javascript-capable plugins, is that they tend to embed javascript within the HTML of the page. In addition to the problem that this usually expects a library to have already been loaded, it also makes it impossible to implement a Content Security Policy banning inline scripts (a very simple and effective way to protect against XSS attacks).

Programming, like crack cocaine, is a rather hard habit to break - so I immediately started thinking about ways to solve this. The inline code usually just does 2 things:

  • declares the data to be used by the rendering script
  • invokes the rendering code
The data of course doesn't need to be inside a script tag - it just needs to be somewhere the script can read it. As for invoking the rendering code - a simple DOM scan could be run after the javascript has been loaded and parsed, calling each widget's renderer. This could cause re-flows and re-paints - but by using block elements it need not be disruptive (certainly a lot less disruptive than stopping the initial rendering of the page to process the scripts).

So, first problem, where to put the data. 

It's trivial to add custom properties to the DOM (although there are some reports of issues in MSIE prior to version 8, I've seen successful implementations running in versions 6 and 7). To be strictly correct you should declare these in a custom DTD (as Facebook do with FBML), and not doing so can result in the browser rendering in quirks mode. But a bit more reading revealed another great feature in HTML5: you can add custom attributes to nodes without a custom DTD, as long as the name is prefixed with 'data-' and the remainder is all in lower case. This is backwardly compatible with older browsers and avoids having to write a custom DTD. Some browsers will still be forced into quirks mode - but since I've spent rather a lot of time structuring the CSS and HTML to be responsive, this should not be a big problem. Yeah!
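So, in place of an inline script, a plugin's renderer can emit something like this (the attribute name and widget parameters here are made up purely for illustration):

<?php
// Hypothetical example: the widget's parameters travel in a data-* attribute
// instead of an inline <script>, so a scanner run after the deferred
// javascript has loaded can find the element and render it.
$params = json_encode(array('widget' => 'slideshow', 'delay' => 5000));

// before: inline script - assumes the library is already loaded, and is
// blocked outright by a Content Security Policy that bans inline scripts
// echo '<script>renderSlideshow(' . $params . ');</script>';

// after: plain markup that the deferred code can pick up later
echo '<div class="jw-widget" data-jw="' . htmlspecialchars($params, ENT_QUOTES) . '"></div>';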

I was about to start writing the code to implement the scanner, when I stopped and thought about this. I was creating a javascript framework. In order for it to have any value, I'd have to re-write the plugins I wanted to use and ideally get other plugin writers to join my bandwagon. And there was also a risk that the core developers of Dokuwiki might introduce new features that broke when the scripts were at the bottom.

A quick email to the Dokuwiki developers list showed that the issue of scripts at the bottom had already been raised, considered and binned. If this was the road to go down, then I'd have to spend some time writing nice well behaved and integrated code - and that would also mean taking time to learn a lot more about Dokuwiki internals.

A further issue with this approach is that I'd still have to refactor any plugins I wanted to use to fit into this new framework - once again I was losing sight of the main objective here - plug and play!

Was there another way to solve the problem? Yes! Although it's not backwardly compatible, HTML5's pushState allows you to load content using ajax and set the location in the browser without transitioning the entire page. There's a really nice implementation of this in the pages on GitHub - which degrades nicely on older browsers. And the source code / documentation for PJAX is available (from GitHub, of course).

Now, all the custom modifications could go into the template rather than the core code (and it's only about 4 lines of code). The only caveat is that I don't have a browser which supports it on any computers at home so I'll need to do some upgrading before I can try it out for real.

I will try to go back to looking at scripts-at-the-bottom in Dokuwiki, but for now I've got a working solution that performs well and provides a base for some nice eye-candy effects.

Monday 6 May 2013

Starting a new website

I've been volunteered as the webmaster for a club my daughter is involved with. Currently they have a rather plain static html website and a free Invision Power Board forum. Increasingly the members are using Facebook as a means of communicating. This is far from ideal - the discussions and postings add a lot of value to the club and they are being given away gratis on Facebook. Further, given the imminent adoption of the 'Instagram Act', I felt it important to maintain additional control over users' copyright.

I'm a big fan of free software and data - but in this case, the members are expecting a bundle of additional products, services and discounts. While it's a non-profit organization with a very small turnover, there are significant costs involved. Hence I felt it was a good idea to provide a more realistic alternative to Facebook on the club site, including a members-only area.

The first problem was the hosting. The site was using the starter package from HeartInternet. Although it was PHP enabled, the amount of storage was very small - allowing users to upload their own content would have filled the quota very quickly - and there was no database support. Upgrading the account to support the expected usage would have been exorbitantly expensive. So the first task was to find a hosting company.

While I can find lots of very cheap hosting companies claiming to operate in the UK - nearly all of them have their datacentres in the US/Canada. It's not that I've got anything against the former colonies - but I already have a site hosted in Florida, and although packets go back and forth at around half the speed of light (I measured it - how sad) the latency is a killer. I don't know if there's a paucity of UK service providers or if they're just not as good at SEO as the ones across the Atlantic, but this proved rather hard. There's also a large number of dummy review sites out there. But it did confirm that the prices on offer from HeartInternet were rather expensive. www.hostpapa.co.uk claim to offer UK web hosting with unlimited bandwidth, unlimited storage along with the usual PHP/MySQL  combo for under £2 a month! It's only when you dig a little deeper you find out they have no UK Data Centres. I emailed them to ask if they could provide hosting in the UK - their reply: "for securit [sic] purpose we do not disclose the locations of our data centers".

Sorry hostpapa: FAIL

So, for the record, I did manage to track down jomongee, x9internet, 123reg (whom I already use for DNS) and Ronald MacDonald (I'm guessing no relation to the burger chain). Ronald is offering virtual servers at very low prices - although I'm a bit of a control freak, and am confident I could set up the server at least as well as most ISPs, I just don't have the time to do this - I'll just stick with a basic shared host. Notably, none of these companies plaster their front pages with bogus awards and certificates. I'll probably go with x9internet - who offer a reasonable hosting package, but what really swung it for me was the refreshingly honest way they appear to do business.

So that's the hosting side sorted out. I just need some software to run it all. Ideally the system would provide:

  • a CMS for publishing web pages - basic navigation, user authentication, public and member only pages
  • a forum where users could vote up/down posts and acquire points (like stack overflow)
  • WYSIWYG editor for forum
  • image uploads on forum posts
  • billing/payment for membership via Paypal or similar
Looking around the internet, I chose Pligg, Anahita, OpenOutreach and Oxwall to examine in more detail.

I decided to start with Oxwall because it was (IMHO) by far the prettiest. However....

1. Finding the requirements for the software (even what platform it runs on) from the website was very difficult. There's next to no technical documentation at all.

2. The requirements which are provided (in the zip file - no tarball) were a cause for concern
   - does not work with Suhosin
   - requires a cron job to be run *every* minute

3. The requirements were incomplete - the package relies heavily on mod_rewrite

4. The installation instructions don't work - assuming that the installer is supposed to amend the URL in "Go to http://www.mysite.com/install to run the install script" appropriately, I tried http://localhost/mypath/install and got a 404 page.

5. After running the installation script, the documentation goes on to tell the installer to run 'chmod 777' on all the directories created by the installation. !!!!

6. Allowing the Oxwall .htaccess files to do anything they want:

    Options -All -Multiviews
    AllowOverride All
    Order deny,allow
    Deny from all

...Some progress - I got access forbidden (403) instead of 404 from the install URL.

7. The .htaccess file in the root directory contains:

Options +FollowSymLinks
RewriteEngine On

AddEncoding gzip .gz
AddEncoding gzip .gzip
<FilesMatch "\.js\.(gz|gzip)$">
  ForceType text/javascript
</FilesMatch>
<FilesMatch "\.css\.(gz|gzip)$">
  ForceType text/css
</FilesMatch>

RewriteCond %{REQUEST_URI} !^/index\.php
RewriteCond %{REQUEST_URI} !/ow_updates/index\.php
RewriteCond %{REQUEST_URI} !/ow_updates/
RewriteCond %{REQUEST_URI} !/ow_cron/run\.php
RewriteCond %{REQUEST_URI} (/|\.php|\.html|\.htm|\.xml|\.feed|robots\.txt|\.raw|/[^.]*)$  [NC]
RewriteRule (.*) index.php


....i.e. it's not going to work unless it's installed in the root directory. There are also a lot of other things here which are, at best, strange.

Creating a new vhost and moving the files into its DocumentRoot still resulted in a 403 error.

8. After moving my current /var/www/html elsewhere and moving the Oxwall files back into the default webroot, I still got a 403 error at http://localhost/install

9. Pointing my browser at http://localhost/index.php, I finally got some output from Oxwall! It told me "Your hosting account doesn't meet the following requirements: zip PHP extension not installed"
- yes, another undocumented dependency.

10. Installed php-zip and got a configuration form (although the absence of any styling was a hint that the URL rewriting still wasn't working properly).

I know getting web paths sorted out is not easy - but I hate front controllers, and trying to create an arbitrary namespace using mod_rewrite is just asking for trouble. (BTW the Oxwall wiki runs on Dokuwiki - which I've written about before and is very cool).

While I could probably fix the problems and get a working site together (probably even fix the cron silliness), it's just not worth the effort. The fact that the software packaging is so sloppy means that there are probably lots more gremlins in the code - and I do not want the site pwned by the first script kiddie to come along.

It's a shame that someone has worked so hard to produce something which looks so nice and appears to have a lot of functionality in it, but makes so many basic errors.

Sorry Oxwall: FAIL 

Wednesday 17 April 2013

Whither MSIE

Looking at my stats here on blogger.com, MSIE traffic has dropped to just 5% - behind Chrome, Firefox and Safari. Although it's fallen a long way from the market share MSIE had even 5 years ago, this is still a lot more skewed than I see on the real sites I manage.

I guess I need to start showing those MS users a bit more love (they do need it).

Wednesday 27 February 2013

Compiling PHP

I've been playing around with computers for more years than I care to remember, so I used to be very familiar with the process of unpacking tarballs and compiling code from source. But to be honest it's not something I do very often these days: most Linux distributions come with an extensive array of pre-compiled binaries; the package management software keeps me up to date with security patches; my C skills are a bit rusty; and life's just too short!

But recently I've been looking at LAMP performance in some detail. I was surprised to find the PHP on my workhorse desktop (PCLinuxOS 2012) had been compiled with no optimization and the resulting binary stripped. I should note that at the time I installed it, there was neither a 64-bit nor an AMD specific port of the installation, hence the OS build was more about compatibility than performance.

So I had a play around to see if there were any benefits to be had at compile time.

PHP is a scripting language and a lot of its functionality is implemented in extension libraries. Dynamically linking these at runtime does have a performance overhead (although with fastCGI and mod_php, since the code forks rather than loads, this shouldn't be too great). For most people the ability to choose which extensions are loaded at runtime (and thus trim the memory footprint) outweighs the small processing overhead of runtime linking. Unfortunately my test methodology didn't allow me to measure the actual impact of static vs dynamic linking. In the absence of a site to test and complex tools for HTTP load testing - ab would not cut the mustard - I was using the CLI SAPI, where there would be a big performance drop which would not happen on a properly configured webserver.

To compare the different optimization levels I compiled PHP 5.3.22 using gcc 4.5.2 with -O0, -O2 and -O3, then timed 'make test'. My test machine was a dual-core AMD Athlon.

Property                       O0                        O2                            O3
CFLAGS (CXXFLAGS="${CFLAGS}")  -march=native -pipe       -march=native -O2 -pipe       -march=native -O3 -pipe
Average (sys + usr) seconds    214.0                     206.7                         207.7
Std dev (sys + usr) seconds    6.2                       0.5                           1.0
Max RSS (Kb)                   569.8                     569.8                         570.0
Exe size (Kb)                  6707.9                    6880.8                        7403.3
Stripped exe size (Kb)         6257.5                    6437.5                        6973.6

I've not shown the results here, but I saw no measurable difference in the usr + sys times or the max RSS when comparing the stripped and un-stripped binaries. Interestingly, the O3 optimization is very slightly, but measurably, slower than O2, and O2 is around 5% faster than O0. The gain of 5% seems a little disappointing compared to the metrics reported for other programs, but I would expect to see greater gains for PHP code implementing (for example) encryption and image processing.