Wednesday, 29 May 2013

Starting a new website - part 2

The other tool I looked at was OpenOutreach.

This is a bundle of Drupal and various widgets. Although the result is well integrated and has a huge amount of functionality, once again this comes at a horrendous performance cost. Even browsing through a couple of pages, cached server-side, with APC enabled (and sized to hold all the code) via http://localhost/, each page took a visible 3 seconds to load. Spookily, the OpenOutreach site (120 ms RTT) actually loaded faster on my machine than the local version - suggesting that there was scope for a lot more tuning, particularly by running a caching proxy in front (I was using a fairly stock Apache 2.2 pre-fork). And once again, cron jobs were a requirement.

It's not that I have anything against cron - and plenty of hosting packages come with cron support these days. It just raises the question of why the people who wrote what is intended to be an off-the-shelf package made so little attempt to minimise the dependencies. Why did they think that cron was the easiest way to run a job? Do I have time to monitor the cron jobs and make sure they are working properly? But more importantly, if they've chosen a quick-fix solution here, what other corners have been cut?

The more I read about performance and usability, the more I keep coming back to the magic 1 second. There continue to be measurable productivity improvements down to around 0.3 seconds, and Google, Amazon and others have published studies showing a measurable impact at even shorter intervals, but 1 second seems to stand out as a major milestone in page loading times. It is possible to get significantly faster page loads than this, but beyond that point the returns diminish.

Sorry Drupal / OpenOutreach, maybe not a fail, but B minus.

At this point I could have started looking round the many, many other open-source PHP CMSs available. But I decided that I'd already spent too much time on this and went back to good old Dokuwiki. Dokuwiki does have issues - being file based, it's difficult to scale to very high volumes of content / users and tricky to scale beyond a single server (the simplest way to do this is to run additional nodes as read-only and replicate content updates from a single master node - since it's file based, this is easy to set up). However for my purposes scalability is not going to be an issue, and the core system is very fast. The more recent versions have slowed down a bit at the front end due to the addition of jQuery - this and all the other javascript is loaded at the top of each page, leading to a rather ugly 200-350 millisecond pause in the screen rendering even when loading from cache. However the template/theme for the application is contained in a plugin (there are lots of themes available), which is a relatively simple bit of PHP code.

Unfortunately Dokuwiki bundles all the content to be written into the <head> section together in tpl_metaheaders() - so although I was able to get the scripts to the bottom, it needed (small) changes to the core Dokuwiki source code.

With the core system (version Weatherwax), making these changes and moving the scripts to the bottom seemed to have no adverse effect on the functionality of the system. Unfortunately it does break a lot of existing plugins. This also exposed another issue with this approach. A common feature of many CMSs, particularly those using javascript-capable plugins, is that they tend to embed javascript within the HTML of the page. In addition to the problem that this usually expects a library to have already been loaded, it also makes it impossible to implement a Content Security Policy banning inline scripts (a very simple and effective way to protect against XSS attacks).
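To make the Content Security Policy point concrete, here is a minimal sketch of the header such a policy boils down to. The helper name buildCspHeader is made up for illustration, and how the header actually gets sent (web server config, a PHP header() call in the template) is deliberately left out.

```typescript
// Hypothetical helper (not part of Dokuwiki or any library): returns the
// header name and value for a policy that only allows external script files.
// Leaving 'unsafe-inline' out of script-src is what blocks inline scripts.
function buildCspHeader(extraScriptHosts: string[] = []): [string, string] {
  const scriptSources = ["'self'"].concat(extraScriptHosts);
  return ["Content-Security-Policy", "script-src " + scriptSources.join(" ")];
}
```

With no arguments the value is `script-src 'self'`, which tells the browser to run script files served from the site's own origin and to refuse anything embedded inline in the page. That is exactly why plugins that write script blocks into the HTML cannot coexist with it.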

Programming, like crack cocaine, is a rather hard habit to break - so I immediately started thinking about ways to solve this. The inline code usually just does 2 things:

  • declares the data to be used by the rendering script
  • invokes the rendering code
The data of course doesn't need to be inside a script tag - it just needs to be somewhere the script can read it. As for invoking the rendering code - a simple DOM scanner could be run after the scripts were loaded and parsed to invoke the renderers. This could cause re-flows and re-paints - but by using block elements it need not be disruptive (certainly a lot less disruptive than stopping the initial rendering of the page to process the scripts).
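The two jobs described above (declare the data, then invoke the renderer) could be sketched roughly as follows. Everything here is hypothetical: the names PluginPlaceholder, registerRenderer and renderAll are invented for illustration, and the placeholders are passed in as plain objects so that the sketch does not commit to where in the page the data lives.

```typescript
// Hypothetical description of one spot in the page that needs rendering:
// which renderer it wants, and the data the inline script used to declare.
interface PluginPlaceholder {
  renderer: string;
  data: string;
}

type RenderFn = (data: string) => string;

// Registry that each plugin's script file would add itself to when it loads.
// Object.create(null) avoids accidental hits on inherited keys like "toString".
const renderFns: { [name: string]: RenderFn | undefined } = Object.create(null);

function registerRenderer(name: string, fn: RenderFn): void {
  renderFns[name] = fn;
}

// Run once all the scripts at the bottom of the page have loaded: call the
// matching renderer for each placeholder, skipping any with no renderer.
function renderAll(placeholders: PluginPlaceholder[]): string[] {
  const output: string[] = [];
  for (const p of placeholders) {
    const fn = renderFns[p.renderer];
    if (fn !== undefined) {
      output.push(fn(p.data));
    }
  }
  return output;
}
```

In a real page the scanner would collect the placeholders by walking the DOM and would write each result back into its element rather than returning strings. The point of the registry is that no plugin needs its library loaded before the HTML that uses it.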

So, first problem: where to put the data.

It's trivial to add custom properties to the DOM: although there are some reports of issues in MSIE prior to version 8, I've seen successful implementations running in versions 6 and 7. To be strictly correct you should declare these in a custom DTD (as Facebook do with FBML), and not doing so can result in the browser rendering in quirks mode. But a bit more reading revealed another great feature in HTML5: you can add custom attributes to nodes without a custom DTD as long as the name is prefixed with 'data-' and the remainder is all in lower case. This is backwardly compatible with older browsers and avoids having to write a custom DTD. Some browsers will still be forced into quirks mode - but since I've spent rather a lot of time structuring the CSS and HTML to be responsive this should not be a big problem. Yeah!
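As a small illustration of the naming rule, the sketch below checks whether a name qualifies as a custom data attribute and shows the key it would get in a modern browser's element.dataset. Both function names are made up for this example, and the check is a simplification (the HTML5 rule also requires the name to be XML-compatible, which is not tested here).

```typescript
// Simplified check: "data-" prefix, at least one more character, and no
// uppercase ASCII letters or whitespace in the remainder.
function isCustomDataAttribute(name: string): boolean {
  return /^data-[^\sA-Z]+$/.test(name);
}

// Maps an attribute name to its element.dataset key: the "data-" prefix is
// dropped and each "-x" (x a lowercase letter) becomes "X".
function toDatasetKey(name: string): string {
  return name
    .slice("data-".length)
    .replace(/-([a-z])/g, (_match: string, letter: string) => letter.toUpperCase());
}
```

So an attribute such as data-chart-type passes the check and shows up as dataset.chartType, while data-chartType is rejected by the check because of the uppercase letter. On older browsers without dataset, getAttribute("data-chart-type") reads the same value.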

I was about to start writing the code to implement the scanner, when I stopped and thought about this. I was creating a javascript framework. In order for it to have any value, I'd have to re-write the plugins I wanted to use and ideally get other plugin writers to join my bandwagon. There was also a risk that the core developers of Dokuwiki might introduce new features that broke when the scripts were at the bottom.

A quick email to the Dokuwiki developers list showed that the issue of scripts at the bottom had already been raised, considered and binned. If this was the road to go down, then I'd have to spend some time writing nice, well-behaved and integrated code - and that would also mean taking time to learn a lot more about Dokuwiki internals.

A further issue with this approach is that I'd still have to refactor any plugins I wanted to use to fit into this new framework - once again I was losing sight of the main objective here - plug and play!

Was there another way to solve the problem? Yes! Although it's not backwardly compatible, HTML5's pushState allows you to load content using Ajax and set the location in the browser without reloading the entire page. There's a really nice implementation of this in the pages on GitHub, which degrades nicely on older browsers. And the source code / documentation for PJAX is available (from GitHub of course).
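The idea can be sketched in a few lines. This is not PJAX's code, just an illustration of the technique under some assumptions: the names pjaxNavigate, HistoryLike and PageHooks are invented here, and the fetching and DOM-swapping are passed in as callbacks so the sketch stays independent of any particular Ajax library.

```typescript
// The one piece of the browser History API this sketch relies on. In a
// supporting browser, window.history provides it.
interface HistoryLike {
  pushState(state: unknown, title: string, url: string): void;
}

interface PageHooks {
  // Fetches the HTML for `url` (an Ajax request in a real page).
  load(url: string, done: (html: string) => void): void;
  // Swaps the fetched HTML into the page's content container.
  replaceContent(html: string): void;
}

// Returns true if the navigation was handled in-page. Returning false tells
// the caller to let the browser follow the link as an ordinary page load.
function pjaxNavigate(url: string, hist: HistoryLike | null, hooks: PageHooks): boolean {
  if (hist === null) {
    return false; // older browser without pushState: degrade to a normal link
  }
  const h: HistoryLike = hist;
  hooks.load(url, (html: string) => {
    hooks.replaceContent(html);
    h.pushState({ url: url }, "", url); // update the address bar, no reload
  });
  return true;
}
```

A click handler on the page's links would call this and only cancel the default action when it returns true, which is what gives the graceful fallback on older browsers. A complete version would also listen for the popstate event so that the back button restores the earlier content.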

Now, all the custom modifications could go into the template rather than the core code (and it's only about 4 lines of code). The only caveat is that I don't have a browser which supports it on any computers at home so I'll need to do some upgrading before I can try it out for real.

I will try to go back to looking at scripts-at-the-bottom in Dokuwiki, but for now I've got a working solution that performs well and provides a base for some nice eye-candy effects.
