Having previously failed to get a significant difference in benchmarks between PHP 5.3 and 5.5, I was successful in establishing that the optimizer was producing code which ran slightly faster (about 2% with DokuWiki).
This time I tried ramping up the concurrency to see if PHP could deliver on its performance promises.
Running
ab -n 12000 -c $X http://localhost/src/doku.php
For $X in [10,50,100,200,300,400,500,600,700,800] I got....
For fewer than 500 concurrent connections, PHP 5.3.3 is slower - but only by about 5%. At more than 500 concurrent requests, 5.5.1 is slower!
I'm just no good at this benchmarking business.
Both were running from nginx+php-fpm with the same (common) config. 5.5.1 had full ZOP+ optimization enabled.
Monday, 16 September 2013
Wednesday, 4 September 2013
Zend Optimizer Plus - trying to do it right
Reading further (I didn't bookmark the link and can't find it now), alongside benchmarks showing ZOP+ to be around 20% faster for "real world" applications, there is also mention of a big reduction in memory usage.
I'm quite prepared to believe that better use of memory is possible - the runtime footprint of PHP code seems to be around 8 times the footprint of the script on disk - so there is plenty of scope for improvement. I was running my tests with 100 concurrent connections - not nearly saturating my machine. I expect that running with a much higher load / less memory would translate into the performance improvements reported elsewhere - more testing required?
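To put some numbers on that footprint, peak memory per request can be logged and compared across configurations - a minimal sketch, with an illustrative log path of my own choosing:

<?php
// Append peak memory usage per request to a log file, so the runtime footprint
// can be compared between configurations. The log path is illustrative.
register_shutdown_function(function () {
    $uri  = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : 'cli';
    $line = sprintf("%s %s %.1f MB\n", date('c'), $uri, memory_get_peak_usage(true) / 1048576);
    file_put_contents('/tmp/php-mem.log', $line, FILE_APPEND);
});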
Meanwhile I had another look at the optimizer. I repeated the setup from last time, running with PHP 5.5.1, ZOP+ with filemtime checking off, fetching a single Dokuwiki page. The control test with full optimization - as per previous run - is giving slightly different results than last time. Since I'm running this on my home machine and it's also running lots of other things like X, KDE, my browser, mail client... it's possible that the system isn't in exactly the state it was in when I ran the previous tests.
Full optimization:
opcache.optimization_level=0xffffffff: 6.334 ms/req
Optimization disabled:
opcache.optimization_level=0: 6.452 ms/req
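(To be sure the intended level was actually in force for each run, a quick runtime check helps - a minimal sketch, assuming the OPcache extension is loaded so that opcache_get_configuration() is available:)

<?php
// Print the OPcache directives relevant to this test, to confirm which
// configuration is actually active for the running PHP worker.
$conf = opcache_get_configuration();
printf("opcache.enable: %d\n", $conf['directives']['opcache.enable']);
printf("opcache.optimization_level: 0x%x\n", $conf['directives']['opcache.optimization_level']);
printf("opcache.validate_timestamps: %d\n", $conf['directives']['opcache.validate_timestamps']);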
Repeating the test several times gave consistent results; about 2% improvement in speed.
Not revolutionary, but every little helps - and in fairness Dokuwiki already has good performance optimization in the PHP code.
When time allows I'll go back and look at memory / CPU usage while running ZOP+ vs APC.
Tuesday, 27 August 2013
Doing it wrong again - this time with Zend Optimizer plus
Now that the PHP team have committed to shipping Zend Optimizer Plus with future releases of PHP, I thought I'd have a play around with it.
tl;dr
While normally I run my PHP from Apache + mod_php, for the purposes of this exercise it was easier for me to set up nginx / php-fpm using PHP 5.3.3/APC 3.1.2 and 5.5.1/Zend Opcache 7.0.2dev. All were compiled from source using default settings on a 32-bit PCLinuxOS 2012 distribution (kernel 2.6.38.8) on a dual AMD 4200 machine with 2GB of memory.
Tests were run using ab on localhost; each test was run by first seeding the opcode cache (ab -n 2 -c 2) and then taking a measurement (ab -n 1000 -c 100).
For both APC and ZOP+, tests were run with and without checking for modified timestamps on PHP source files.
Final scores
My results showed Zend Optimizer Plus to be no faster than APC. Although the numbers are within the margin of error, if anything ZOP+ was slower.
Config                     | Application | Timestamps | Time per request (ms)
PHP 5.3.3 + APC            | Dokuwiki    | yes        | 6.274
PHP 5.3.3 + APC            | Dokuwiki    | no         | 6.271
PHP 5.5.1 + ZOP+           | Dokuwiki    | yes        | 6.643
PHP 5.5.1 + ZOP+           | Dokuwiki    | no         | 6.293
PHP 5.3.3 + APC + mysql    | Wordpress   | yes        | 36.012
PHP 5.3.3 + APC + mysql    | Wordpress   | no         | 35.868
PHP 5.5.1 + ZOP+ + mysqlnd | Wordpress   | yes        | 35.905
PHP 5.5.1 + ZOP+ + mysqlnd | Wordpress   | no         | 35.978
Static HTML from Dokuwiki  | Dokuwiki    | n/a        | 0.171
Static HTML from Wordpress | Wordpress   | n/a        | 0.182
This seems to fly in the face of what is currently being reported elsewhere.
More stuff about my methodology
I wanted to create a reasonably realistic test, hence using two off-the-shelf Content Management Systems. Wordpress uses a MySQL backend for its data, and there is a further difference between the APC and ZOP+ configurations: the former uses libmysqlclient while the latter was built with MySQLnd (which meant I had to rewrite the database class to use the mysqli_ functions in place of the mysql_ ones). The effects on performance are complex and tied to the level of concurrency, but at 100 concurrent HTTP requests I was expecting this to be minimal. Dokuwiki, on the other hand, uses file-based storage.
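For illustration, the kind of change involved looks roughly like this - a minimal sketch of a hypothetical query sequence, not the actual Wordpress database class ($host, $user, $pass, $dbname and $sql are placeholders):

<?php
// Before: the old ext/mysql API (built against libmysqlclient)
$link   = mysql_connect($host, $user, $pass);
mysql_select_db($dbname, $link);
$result = mysql_query($sql, $link);
while ($row = mysql_fetch_assoc($result)) { /* ... */ }

// After: the mysqli API (works with mysqlnd); note the different argument order
$link   = mysqli_connect($host, $user, $pass, $dbname);
$result = mysqli_query($link, $sql);
while ($row = mysqli_fetch_assoc($result)) { /* ... */ }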
Other reviews
The PHP wiki page about the change links to a spreadsheet showing what look like impressive stats. The stats are reported as requests per second for various configurations.
Some more reviews:
https://managewp.com/boost-wordpress-performance-zend-optimizer - doesn't show response times / requests per second, but does say that load and memory usage were lower, implying greater capacity using ZOP+ compared with APC.
http://halfelf.org/2013/trading-apc-for-zend/ reports a similar reduction in CPU and memory, but again no response times.
http://www.ricardclau.com/2013/03/apc-vs-zend-optimizer-benchmarks-with-symfony2/ reports results again in req/s, showing an improvement of around 10-15%.
http://massivescale.blogspot.co.uk/2013/06/php-55-zend-optimiser-opcache-vs-xcache.html compared ZOP+ and Xcache, finding an approx 15-20% improvement in req/s and a similar reduction in response times with Joomla.
The optimizer bit(s)
PHP opcode caches have been around for a long time. ZOP+ brings something new: a code optimizer. Since it is still generating opcodes, it doesn't apply the CPU-specific tweaks that a native code compiler does. Despite the 32-bit integer used to set the optimizer flags, only 6 flags are recognized by the optimizer (and the last pass only cleans up the debris left by the first 5). The optimizations are mostly substitutions - replacing PHP's built-in constants with literals, post-increment with pre-increment, compile-time type-juggling of built-in constants and such like. There is no inlining of functions in loops. No branch order prediction. Having dug through the code, I was not expecting the optimizer to deliver revolutionary speed improvements.
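To give a flavour of the substitutions involved, here is a hand-written, source-level approximation (the optimizer really works on the compiled opcode array, so this is only an analogy and the values shown are illustrative):

<?php
// As written by the programmer:
$count = 0;
$count++;                 // post-increment; the old value is never used
$os    = PHP_OS;          // built-in constant resolved at runtime
$limit = 60 * 60 * 24;    // constant expression

// Roughly what the optimized opcodes correspond to:
$count = 0;
++$count;                 // rewritten as pre-increment
$os    = 'Linux';         // constant substituted with its literal value (on this machine)
$limit = 86400;           // constant expression folded at compile time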
And yet, Dmitry's spreadsheet shows 'ZF Test (ZF 1.5)' going from 158 req/s to 217 req/s!
I presume this refers to the Zend Framework. While this is far from speedy, I find it astonishing that the performance of the code should improve so much with a few relatively simple tweaks to the opcodes - it rather suggests that there is huge scope for optimizing the code by hand. Although I also note that the performance of 'Scrom (ZF App)' only improves by around 8%.
What am I doing wrong?
The consistent difference (apart from the opcode cache) in my experiment was using different versions of PHP - to a certain extent I'm not really comparing like-for-like. I can only assume that if I ran APC against PHP 5.5.1 and/or ZOP+ with PHP 5.3.3 I would see a very different story. However if you are seeking optimal performance at low load levels (rather than optimal capacity) then there seems to be little incentive to apply this upgrade.
There are anecdotal reports of stability issues with APC on PHP 5.4+; there may be sound technical and economic reasons why APC is not being actively maintained and for ZOP+ to be a better strategic choice.
A clear choice?
I can live without APC's support for user-data caching. But the elephant in the room is the fact that ZOP+ does not reclaim memory: if your code base is larger than the cache size, or the cache fills up with old versions of code, it forces a full flush and re-initialization. This should not be a problem for sites with dedicated devops personnel managing releases to a small number of applications using a continuous deployment strategy. However, for the rest of us there needs to be a significant performance advantage with ZOP+ to make this a price worth paying.
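If you do go down the ZOP+ road, it seems prudent to keep an eye on how full the cache is getting - a minimal sketch using opcache_get_status(); the 90% threshold and the error_log() handling are illustrative choices of mine:

<?php
// Warn when the opcode cache is nearly full, since ZOP+ responds to a full
// cache with a complete flush rather than evicting individual entries.
$status = opcache_get_status(false);   // false: don't list every cached script
$mem    = $status['memory_usage'];
$used   = $mem['used_memory'] + $mem['wasted_memory'];
$total  = $used + $mem['free_memory'];
if ($used / $total > 0.9) {
    error_log(sprintf('OPcache is %.0f%% full - expect a flush soon', 100 * $used / $total));
}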
Labels:
APC,
benchmark,
performance,
PHP,
WPO,
Zend,
Zend Optimizer
Saturday, 17 August 2013
Starting a new website - part 3
So the decision was made, I would stick with Dokuwiki and use PJAX for loading the pages.
A bit of coding and hey presto...www.scottishratclub.co.uk
(live site is still running an incomplete version of the code - note to self - get the current version deployed)
In order to structure the Javascript changes nicely, keep everything tidy and fit in with Dokuwiki too, the functionality is split across a syntax plugin (for implementing widgets, including initializing PJAX, fixing the problems introduced by deferring loading of the javascript and accommodating a strict Content Security Policy). This then places some constraints on how further widgets are implemented, so it's really a framework (yeuch!). Anyway, the plugin is called Jokuwiki.
In order to use PJAX, the source page needs to be slightly modified (but it's JUST 5 LINES OF CODE!):
if ('true'!=$_SERVER['HTTP_X_PJAX']){
....top part of page
} ?><div id="pjax-container">
....stuff loaded via pjax
</div><?php
if ('true'!=$_SERVER['HTTP_X_PJAX']){
....bottom part of page
}
But just to make it really easy, I published a template too. Not the one I used on my website - but if anyone wants it....let me know.
The impact of PJAX on performance is rather large:
Of course it had to be deployed to the site. So I dusted down pushsite and fired it up with a recipe for deploying the site. About 50 files uploaded, then I stopped getting responses from the remote system. I ran it again.....a further 20. The socket was still connected but nothing was happening. Switching to passive mode didn't help. Adding throttling didn't help. I spent several hours battling with it and gave up. Same story the following day - so I logged a call with the service provider. The following day, they suggested using a different FTP server.....same problem. They said they'd get back to me.
Since I had no ssh access, I couldn't unpack a tarball from the shell - and doing it via a PHP script invoked from the web would have meant I'd spend just as much time fixing the permissions as uploading the stuff by hand. But a bit of rummaging around in cpanel and I found that there was a backup/restore option running over HTTP - so I downloaded a backup, unpacked it, overwrote the backed-up website with my new site, packed it up and restored it onto the server. Job done.
Thursday, 30 May 2013
Keyboard STILL not detected. Press F1 to continue
I'm very particular about keyboards and mice. I find a huge difference between devices with different feels - for example, I've usually found Cherry keyboards to be a bit clicky. I've recently been supplied with a new work laptop which has a 'chiclet' keyboard - which I hate. At one point I acquired a second-hand Tandem PC which came with an original IBM AT keyboard. The latter was a phenomenal piece of cold-war engineering, clearly designed to survive at ground zero of a thermonuclear strike. The keys were like those on a manual typewriter. CLICK - clack - CLICK. You would have expected a bell to ring every time you pressed the return button. I quickly found something else and sold the IBM keyboard at a car boot sale to a lumberjack who needed a new axe.
Fast forward a number of years....I blogged before about getting a cheapo keyboard from Currys. This had been providing sterling service for the past 2 years, until one day, completely out of the blue, strange things started happening after I logged in. Initially the mouse started selecting stuff at random, and clicking on links in my browser caused new windows to open. So I tried applying the holy mantra of IT - switch-it-off-and-back-on-again. Incorrect password. And again....hang on a minute. Time for applying some advanced systems admin skills. Summoning my computer mojo and focussing 30+ years of hardware and software skills into a finely honed, cutting edge of diagnostic meditation....no - the Caps Lock key is definitely off.
Numlock...on....off. Caps lock....on....off. Ctrl-Alt-F2, switch to a text console - good, that works. Now try logging in. ROOT. Who? Shift seems to be inverted. This is not right. I pick up the keyboard. Water pours out. Water? Smoke, yes, I'd expect that from computers, but not water. Did I mention I have kids? I don't know if it was my clumsiness in taking it apart, permanent damage from the water, or I just couldn't get it dry enough but the keyboard was a goner.
Off to Currys. However the only wired keyboards they have are rather nasty. The ones with good quality mechanical bits are all wireless (every wireless keyboard I've used has been very slow) or ridiculously pricey 'gaming' keyboards with heavy clicks. My search is further complicated by the fact that I'm looking for a three-quarters sized keyboard to make better use of my desktop real estate. Back home and on to Amazon, where I find this which looks like it fits the bill in terms of size, key type and action. Having been so fussy about not getting keyboards because I don't like the feel of them, it's a bit of a gamble to buy one online - but sometimes we all just go a bit crazy!
It arrived today. I was excited. The keys have a nice soft click to them. But it proved to be a bit difficult to get working. This anti-ghosting thing means that it just won't talk to the USB HID stuff in Linux. I could access and type in the BIOS, but after that, nothing.
I plugged back in a spare PS2 keyboard and found this in my logs:
May 30 20:00:57 localhost klogd: usb 2-1: New USB device found, idVendor=060b, idProduct=2231
May 30 20:00:57 localhost klogd: usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
May 30 20:00:57 localhost klogd: usb 2-1: Product: USB Keyboard
May 30 20:00:57 localhost klogd: usb 2-1: Manufacturer: KB
May 30 20:00:57 localhost klogd: input: KB USB Keyboard as /devices/pci0000:00/0000:00:0b.0/usb2/2-1/2-1:1.0/input/input0
May 30 20:00:57 localhost klogd: generic-usb 0003:060B:2231.0001: input,hidraw0: USB HID v1.11 Keyboard [KB USB Keyboard] on usb-0000:00:0b.0-1/input0
May 30 20:00:57 localhost klogd: generic-usb: probe of 0003:060B:2231.0002 failed with error -22
lsusb said this:
Bus 002 Device 002: ID 060b:2231 Solid Year
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0         8
  idVendor           0x060b Solid Year
  idProduct          0x2231
  bcdDevice            2.21
  iManufacturer           1 KB
  iProduct                2 USB Keyboard
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           59
    bNumInterfaces          2
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0xa0
      (Bus Powered)
      Remote Wakeup
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Device
      bInterfaceSubClass      1 Boot Interface Subclass
      bInterfaceProtocol      1 Keyboard
      iInterface              0
      ** UNRECOGNIZED:  09 21 11 01 00 01 22 4b 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Device
      bInterfaceSubClass      0 No Subclass
      bInterfaceProtocol      0 None
      iInterface              0
      ** UNRECOGNIZED:  09 21 11 01 00 01 22 6c 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
Device Status:     0x0000
  (Bus Powered)
A bit of Googling and I find that the problem is caused by the anti-ghosting support in the keyboard (which allows you to press lots of keys simultaneously). Pressing Ctrl + Alt + Shift (all bottom left) + N disables this and the keyboard is usable with Linux (no mention of this in the manual which came with it, of course).
Rather than have to remember the weird key combination I've ordered a PS2 adapter which I've read should solve the problem.
(and along the way I find *MORE* bugs in Blogger's WYSIWYG editor. Sigh)
Wednesday, 29 May 2013
Starting a new website - part 2
The other tool I looked at was OpenOutreach
This is a bundle of Drupal and various widgets. Although the result is something which is well integrated, and with a huge amount of functionality, once again this comes at a horrendous performance cost. Even browsing through a couple of pages, cached serverside, with APC enabled (and sized to hold all the code) via http://localhost/, it was taking a visible 3 seconds to load the page. Spookily the openoutreach site (120 ms RTT) actually loaded faster on my machine than the local version - suggesting that maybe there was scope for a lot more tuning - particularly running a caching proxy in front (I was using a fairly stock Apache 2.2 pre-fork). And once again, cron jobs were a requirement.
It's not that I have anything against cron - and plenty of hosting packages come with cron support these days. It just begs the question of why the people who wrote what is intended to be an off-the-shelf package made so little attempt to minimise the dependencies. Why did they think that cron was the easiest way to run a job? Do I have time to monitor the cron jobs and make sure they are working properly? But more importantly, if they've chosen a quick-fix solution here, what other corners have been cut?
The more I read about performance and usability, the more I keep coming back to the magic 1 second. While there continue to be measurable productivity improvements down to around 0.3 seconds, and Google, Amazon and others have published studies showing that there is a measurable impact at lower time intervals, 1 second seems to stand out as a major milestone in page loading times - it is possible to get significantly faster page loads than this, but it seems to be the point of diminishing returns.
Sorry Drupal / Openoutreach, maybe not a fail, but B minus.
At this point I could have started looking round the many, many other open-source PHP CMSs available. But I decided that I'd already spent too much time on this and went back to good old Dokuwiki. Dokuwiki does have issues - being file based, it's difficult to scale to very high volumes of content / users and tricky to scale beyond a single server (the simplest way to do this is to run additional nodes as read-only and replicate content updates from a single master node - since it's file based, this is easy to set up). However for my purposes scalability is not going to be an issue - the core system is very fast. The more recent versions have slowed down a bit at the front end due to the addition of jQuery - this and all the javascript is loaded at the top of each page - leading to a rather ugly 200-350 millisecond pause in the screen rendering even when loading from cache. However the template/theme for the application is contained in a plugin (there are lots of themes available) which is a relatively simple bit of PHP code.
Unfortunately Dokuwiki bundles all the content to be written into the <head> section together in tpl_metaheaders() - so although I was able to get the scripts at the bottom, it needed (small) changes to the core Dokuwiki source code.
With the core system (the Weatherwax release), making these changes and moving the scripts to the bottom seemed to have no adverse effect on the functionality of the system. Unfortunately it does break a lot of existing plugins. This also exposed another issue with this approach. A common feature of many CMSs, particularly those using javascript-capable plugins, is that they tend to embed javascript within the HTML of the page. In addition to the problem that this usually expects a library to already have been loaded, it also makes it impossible to implement a Content Security Policy banning inline scripts (a very simple and effective way to protect against XSS attacks).
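(For reference, banning inline scripts only takes a header or two - a minimal sketch sent from PHP; the policy shown is illustrative, and in 2013 some browsers still expected the prefixed X-* header names:)

<?php
// Only allow scripts served from this origin; inline <script> blocks are blocked.
header("Content-Security-Policy: default-src 'self'; script-src 'self'");
// Prefixed variants for the older Firefox and WebKit browsers of the era.
header("X-Content-Security-Policy: default-src 'self'; script-src 'self'");
header("X-WebKit-CSP: default-src 'self'; script-src 'self'");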
Programming, like crack cocaine, is a rather hard habit to break - so I immediately started thinking about ways to solve this. The inline code usually just does 2 things:
- declares the data to be used by the rendering script
- invokes the rendering code
The data of course doesn't need to be inside a script tag - it just needs to be somewhere the script can read it. As for invoking the rendering code - a simple DOM scan could be run after the javascripts were loaded and parsed, calling the renderers as it goes. This could cause re-flows and re-paints - but by using block elements it need not be disruptive (certainly a lot less disruptive than stopping the initial rendering of the page to process the scripts).
So, first problem, where to put the data.
It's trivial to add custom properties to the DOM (although there are some reports of issues in MSIE prior to version 8, I've seen successful implementations running in versions 6 and 7). To be strictly correct you should declare these in a custom DTD (as Facebook do with FBML), and not doing so can result in the browser rendering in quirks mode. But a bit more reading revealed another great feature in HTML5: you can add custom attributes to nodes without a custom DTD, as long as the name is prefixed with 'data-' and the remainder is all in lower case. This is backwardly compatible with older browsers and avoids having to write a custom DTD. Some browsers will still be forced into quirks mode - but since I've spent rather a lot of time structuring the CSS and HTML to be responsive this should not be a big problem. Yeah!
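As an illustration of the idea (and only that - this is not the actual Jokuwiki code; the function and attribute names are hypothetical), a plugin's render step might emit a placeholder carrying its data in data-* attributes instead of an inline script:

<?php
// Emit a placeholder <div> whose data-* attributes carry everything a deferred
// script needs to find the widget and render it after the page has loaded.
function render_widget_placeholder($widgetName, array $data) {
    return sprintf(
        '<div class="widget" data-widget="%s" data-widget-args="%s"></div>',
        htmlspecialchars($widgetName, ENT_QUOTES),
        htmlspecialchars(json_encode($data), ENT_QUOTES)
    );
}

echo render_widget_placeholder('slideshow', array('interval' => 5000, 'images' => array('a.jpg', 'b.jpg')));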
I was about to start writing the code to implement the scanner, when I stopped and thought about this. I was creating a javascript framework. In order for it to have any value, I'd have to re-write the plugins I wanted to use and ideally get other plugin writers to join my bandwagon. And there was also a risk that the core developers of Dokuwiki might introduce new features that broke when the scripts were at the bottom.
A quick email to the Dokuwiki developers list showed that the issue of scripts at the bottom had already been raised, considered and binned. If this was the road to go down, then I'd have to spend some time writing nice well behaved and integrated code - and that would also mean taking time to learn a lot more about Dokuwiki internals.
A further issue with this approach is that I'd still have to refactor any plugins I wanted to use to fit into this new framework - once again I was losing sight of the main objective here - plug and play!
Was there another way to solve the problem? Yes! Although it's not backwardly compatible, HTML5's pushState allows you to load content using ajax and set the location in the browser without transitioning the entire page. There's a really nice implementation of this in the pages on Github - which degrades nicely on older browsers. And the source code / documentation for PJAX is available (from GitHub of course).
Now, all the custom modifications could go into the template rather than the core code (and it's only about 4 lines of code). The only caveat is that I don't have a browser which supports it on any computers at home so I'll need to do some upgrading before I can try it out for real.
I will try to go back to looking at scripts-at-the-bottom in Dokuwiki, but for now I've got a working solution that performs well and provides a base for some nice eye-candy effects.
Labels:
CMS,
Dokuwiki,
DOM,
Drupal,
HTML5,
javascript,
openoutreach,
PHP
Monday, 6 May 2013
Starting a new website
I've been volunteered as the webmaster for a club my daughter is involved with. Currently they have a rather plain static html website and a free Invision Power Board forum. Increasingly the members are using Facebook as a means of communicating. This is far from ideal - the discussions and postings add a lot of value to the club and they are being given away gratis on Facebook. Further, given the imminent adoption of the 'Instagram Act', I felt it was important to maintain additional controls over users' copyright.
I'm a big fan of free software and data - but in this case, the members are expecting a bundle of additional products, services and discounts. While it's a non-profit organization with a very small turnover, there are significant costs involved. Hence I felt it was a good idea to provide a more realistic alternative to Facebook on the club site, including a members-only area.
The first problem was the hosting. The site was using the starter package from HeartInternet. Although it was PHP enabled, the amount of storage was very small - allowing users to upload their own content would have filled the quota very quickly, and there was no database support. Upgrading the account to support the expected usage would have been exorbitantly expensive. So the first task was to find a hosting company.
While I can find lots of very cheap hosting companies claiming to operate in the UK - nearly all of them have their datacentres in the US/Canada. It's not that I've got anything against the former colonies - but I already have a site hosted in Florida, and although packets go back and forth at around half the speed of light (I measured it - how sad) the latency is a killer. I don't know if there's a paucity of UK service providers or if they're just not as good at SEO as the ones across the Atlantic, but this proved rather hard. There's also a large number of dummy review sites out there. But it did confirm that the prices on offer from HeartInternet were rather expensive. www.hostpapa.co.uk claim to offer UK web hosting with unlimited bandwidth, unlimited storage along with the usual PHP/MySQL combo for under £2 a month! It's only when you dig a little deeper you find out they have no UK Data Centres. I emailed them to ask if they could provide hosting in the UK - their reply: "for securit [sic] purpose we do not disclose the locations of our data centers".
Sorry hostpapa: FAIL
So, for the record, I did manage to track down jomongee, x9internet, 123reg (whom I already use for DNS) and Ronald MacDonald (I'm guessing no relation to the burger chain). Ronald is offering virtual servers at very low prices - although I'm a bit of a control freak, and am confident I could set up the server at least as well as most ISPs, I just don't have the time to do this - I'll just stick with a basic shared host. Notably, none of these companies plaster their front pages with bogus awards and certificates. I'll probably go with x9internet - who offer a reasonable hosting package, but what really swung it for me was the refreshingly honest way they appear to do business.
So that's the hosting side sorted out. I just need some software to run it all. Ideally the system would provide:
- a CMS for publishing web pages - basic navigation, user authentication, public and member only pages
- a forum where users could vote up/down posts and acquire points (like stack overflow)
- WYSIWYG editor for forum
- image uploads on forum posts
- billing/payment for membership via Paypal or similar
Looking around the internet, I chose Pligg, Anahita, OpenOutReach and Oxwall to look at in more detail.
I decided to start with Oxwall because it was (IMHO) by far the prettiest. However....
1. Finding the requirements for the software (even what platform it runs on) from the website was very difficult. There's next to no technical documentation at all.
2. The requirements which are provided (in the zip file - no tarball) were a cause for concern
- does not work with Suhosin
- requires a cron job to be run *every* minute
3. The requirements were incomplete - the package relies heavily on mod_rewrite
4. The installation instructions don't work - assuming that the installer is supposed to amend the URL in "Go to http://www.mysite.com/install to run the install script" appropriately, I tried
http://localhost/mypath/install to get a 404 page
5. After running the installation script, the documentation goes on to tell the installer to run 'chmod 777' on all the directories created by the installation. !!!!
6. Allowing the Oxwall .htaccess files to do anything they want:
Options -All -Multiviews
AllowOverride All
Order deny,allow
Deny from all
...Some progress - I got access forbidden (403) instead of 404 from the install URL.
7. The .htaccess file in the root directory contains:
Options +FollowSymLinks
RewriteEngine On
AddEncoding gzip .gz
AddEncoding gzip .gzip
<FilesMatch ...>
ForceType text/javascript
</FilesMatch>
<FilesMatch ...>
ForceType text/css
</FilesMatch>
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteCond %{REQUEST_URI} !/ow_updates/index\.php
RewriteCond %{REQUEST_URI} !/ow_updates/
RewriteCond %{REQUEST_URI} !/ow_cron/run\.php
RewriteCond %{REQUEST_URI} (/|\.php|\.html|\.htm|\.xml|\.feed|robots\.txt|\.raw|/[^.]*)$ [NC]
RewriteRule (.*) index.php
....i.e. it's not going to work unless it's installed in the root directory
There are also a lot of other things which are, at best, strange here.
Creating a new vhost and moving the files into the DocumentRoot still resulted in a 403 error
8. After moving my current /var/www/html elsewhere and moving the Oxwall files back into the default webroot, I still got a 403 error at http://localhost/install
9. Pointing my browser at http://localhost/index.php, I finally got some output from Oxwall! It told me "Your hosting account doesn't meet the following requirements: zip PHP extension not installed"
- yes, another undocumented dependency.
10. Installed php-zip and got a configuration form (although the absence of any styling was a hint that the URL rewriting still wasn't working properly)
I know getting web paths sorted out is not easy - but I hate front controllers, and trying to create an arbitrary namespace using mod_rewrite is just asking for trouble. (BTW the Oxwall wiki runs on Dokuwiki - which I've written about before and is very cool.)
While I could probably fix the problems and get a working site together (probably even fix the cron silliness), it's just not worth the effort - the fact that the software packaging has been so sloppy means that there are probably lots more gremlins in the code, and I do not want the site pwned by the first script kiddie to come along.
It's a shame that someone has worked so hard to produce something which looks so nice and appears to have a lot of functionality in it, but makes so many basic errors.
Sorry Oxwall: FAIL