Saturday, 17 August 2013

Starting a new website - part 3

So the decision was made: I would stick with Dokuwiki and use PJAX for loading the pages.

A bit of coding and hey presto...www.scottishratclub.co.uk

(the live site is still running an incomplete version of the code - note to self - get the current version deployed)

In order to structure the JavaScript changes nicely, keep everything tidy and fit in with Dokuwiki too, the functionality is built around a syntax plugin - implementing widgets, initialising PJAX, fixing the problems introduced by deferring the loading of the JavaScript, and accommodating a strict Content Security Policy. This places some constraints on how further widgets are implemented, so it's really a framework (yeuch!). Anyway, the plugin is called Jokuwiki.
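To give a flavour of the approach, here's a minimal sketch of a widget in the Jokuwiki style - the class name, pattern and attribute here are illustrative, not the actual Jokuwiki source. The point is that the plugin emits no inline script at all: the widget's configuration travels in an HTML5 data- attribute for the deferred JavaScript to find later.

<?php
/*
 * Illustrative sketch only - not the real Jokuwiki code.
 * A Dokuwiki syntax plugin which renders a widget as an empty element
 * carrying its configuration in a data- attribute instead of an inline
 * <script> block.
 */
class syntax_plugin_mywidget extends DokuWiki_Syntax_Plugin {
    function getType() { return 'substition'; } // sic - Dokuwiki's keyword
    function getSort() { return 200; }

    function connectTo($mode) {
        $this->Lexer->addSpecialPattern('<mywidget[^>]*>', $mode, 'plugin_mywidget');
    }

    function handle($match, $state, $pos, &$handler) {
        // strip '<mywidget' and the closing '>' and pass the parameters through
        return array('params' => trim(substr($match, 9, -1)));
    }

    function render($mode, &$renderer, $data) {
        if ($mode != 'xhtml') return false;
        // no inline script - just data for the deferred loader to pick up
        $renderer->doc .= '<div class="jokuwiki" data-jw="'
                        . htmlspecialchars($data['params'], ENT_QUOTES)
                        . '"></div>';
        return true;
    }
}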

In order to use PJAX, the source page needs to be slightly modified (but it's JUST 5 LINES OF CODE!):


<?php
// render the full page only when this is not a PJAX request
// (guarding with isset() avoids a PHP notice on normal requests)
if (!isset($_SERVER['HTTP_X_PJAX']) || 'true' != $_SERVER['HTTP_X_PJAX']) {
    ....top part of page
} ?><div id="pjax-container">
....stuff loaded via pjax
</div><?php
if (!isset($_SERVER['HTTP_X_PJAX']) || 'true' != $_SERVER['HTTP_X_PJAX']) {
    ....bottom part of page
}
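For anyone wondering where HTTP_X_PJAX comes from: jquery-pjax flags the ajax requests it makes with an X-PJAX request header, which is what the test above keys on - a normal request renders the full page, while a PJAX request returns just the contents of the container.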

But just to make it really easy, I published a template too. Not the one I used on my website - but if anyone wants it... let me know.

The impact of PJAX on performance is substantial: only the contents of the container div are fetched and replaced, so the browser no longer has to re-parse the CSS and JavaScript on every page transition.


Of course it had to be deployed to the site. So I dusted down pushsite and fired it up with a recipe for deploying the site. About 50 files uploaded, then I stopped getting responses from the remote system. I ran it again... a further 20. The socket was still connected but nothing was happening. Switching to passive mode didn't help. Adding throttling didn't help. I spent several hours battling with it and gave up. Same story the following day - so I logged a call with the service provider. The day after, they suggested using a different FTP server... same problem. They said they'd get back to me.

Since I had no ssh access, I couldn't unpack a tarball from the shell - and doing it via a PHP script invoked from the web would have meant spending just as much time fixing the permissions as uploading the stuff by hand. But a bit of rummaging around in cPanel revealed a backup/restore option running over HTTP - so I downloaded a backup, unpacked it, overwrote the backed-up website with my new site, packed it up again and restored it onto the server. Job done.

Thursday, 30 May 2013

Keyboard STILL not detected. Press F1 to continue

I'm very particular about keyboards and mice. I find a huge difference between devices with different feels - for example, I've usually found Cherry keyboards to be a bit clicky. I've recently been supplied with a new work laptop which has a 'chiclet' keyboard - which I hate. At one point I acquired a second-hand Tandem PC which came with an original IBM AT keyboard. The latter was a phenomenal piece of cold-war engineering, clearly designed to survive at ground zero of a thermonuclear strike. The keys were like those on a manual typewriter. CLICK - clack - CLICK. You would have expected a bell to ring every time you pressed the return key. I quickly found something else and sold the IBM keyboard at a car boot sale to a lumberjack who needed a new axe.

Fast forward a number of years... I blogged before about getting a cheapo keyboard from Currys. It had been providing sterling service for the past two years until one day, completely out of the blue, strange things started happening after I logged in. Initially the mouse started selecting stuff at random, and clicking on links in my browser caused new windows to open. So I applied the holy mantra of IT - switch-it-off-and-back-on-again. Incorrect password. And again... hang on a minute. Time to apply some advanced systems admin skills. Summoning my computer mojo and focussing 30+ years of hardware and software skills into a finely honed, cutting edge of diagnostic meditation... no - the Caps Lock key is definitely off.

Numlock... on... off. Caps Lock... on... off. Ctrl-Alt-F2, switch to a text console - good, that works. Now try logging in. ROOT. Who? Shift seems to be inverted. This is not right. I pick up the keyboard. Water pours out. Water? Smoke, yes, I'd expect that from computers, but not water. Did I mention I have kids? I don't know if it was my clumsiness in taking it apart, permanent damage from the water, or that I just couldn't get it dry enough, but the keyboard was a goner.

Off to Currys. However, the only wired keyboards they have are rather nasty. The ones with good-quality mechanical bits are all wireless (every wireless keyboard I've used has been very slow) or ridiculously pricey 'gaming' keyboards with heavy clicks. My search is further complicated by the fact that I'm looking for a three-quarter-sized keyboard, to make better use of my desktop real estate. Back home and on to Amazon, where I find this, which looks like it fits the bill in terms of size, key type and action. Having been so fussy about rejecting keyboards because I don't like the feel of them, it's a bit of a gamble to buy one online - but sometimes we all just go a bit crazy!

It arrived today. I was excited. The keys have a nice soft click to them. But it proved to be a bit difficult to get working. Its anti-ghosting feature means that it just won't talk to the USB HID stack in Linux. I could access and type in the BIOS, but after that, nothing.

I plugged a spare PS/2 keyboard back in and found this in my logs:


May 30 20:00:57 localhost klogd: usb 2-1: New USB device found, idVendor=060b, idProduct=2231
May 30 20:00:57 localhost klogd: usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
May 30 20:00:57 localhost klogd: usb 2-1: Product: USB Keyboard
May 30 20:00:57 localhost klogd: usb 2-1: Manufacturer: KB
May 30 20:00:57 localhost klogd: input: KB USB Keyboard as /devices/pci0000:00/0000:00:0b.0/usb2/2-1/2-1:1.0/input/input0
May 30 20:00:57 localhost klogd: generic-usb 0003:060B:2231.0001: input,hidraw0: USB HID v1.11 Keyboard [KB USB Keyboard] on usb-0000:00:0b.0-1/input0
May 30 20:00:57 localhost klogd: generic-usb: probe of 0003:060B:2231.0002 failed with error -22

lsusb said this:

Bus 002 Device 002: ID 060b:2231 Solid Year
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0         8
  idVendor           0x060b Solid Year
  idProduct          0x2231
  bcdDevice            2.21
  iManufacturer           1 KB
  iProduct                2 USB Keyboard
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           59
    bNumInterfaces          2
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0xa0
      (Bus Powered)
      Remote Wakeup
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Device
      bInterfaceSubClass      1 Boot Interface Subclass
      bInterfaceProtocol      1 Keyboard
      iInterface              0
      ** UNRECOGNIZED:  09 21 11 01 00 01 22 4b 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Device
      bInterfaceSubClass      0 No Subclass
      bInterfaceProtocol      0 None
      iInterface              0
      ** UNRECOGNIZED:  09 21 11 01 00 01 22 6c 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
Device Status:     0x0000
  (Bus Powered)

A bit of Googling and I found that the problem is caused by the anti-ghosting support in the keyboard (it allows you to press lots of keys simultaneously). Pressing Ctrl + Alt + Shift (all bottom left) + N disables this, and the keyboard becomes usable with Linux (no mention of this in the manual which came with it, of course).

Rather than have to remember the weird key combination, I've ordered a PS/2 adapter, which I've read should solve the problem.

(and along the way I find *MORE* bugs in Blogger's WYSIWYG editor. Sigh)




Wednesday, 29 May 2013

Starting a new website - part 2

The other tool I looked at was OpenOutreach.

This is a bundle of Drupal and various widgets. Although the result is something which is well integrated, with a huge amount of functionality, once again this comes at a horrendous performance cost. Even browsing through a couple of pages - cached server-side, with APC enabled (and sized to hold all the code), via http://localhost/ - it was taking a visible 3 seconds to load a page. Spookily, the OpenOutreach site itself (120 ms RTT) actually loaded faster on my machine than the local version - suggesting there was scope for a lot more tuning, particularly running a caching proxy in front (I was using a fairly stock Apache 2.2 pre-fork). And once again, cron jobs were a requirement.

It's not that I have anything against cron - plenty of hosting packages come with cron support these days. It just raises the question: why did the people who wrote what is intended to be an off-the-shelf package make so little attempt to minimise its dependencies? Why did they think cron was the easiest way to run a job? Do I have time to monitor the cron jobs and make sure they are working properly? More importantly, if they've chosen a quick fix here, what other corners have been cut?

The more I read about performance and usability, the more I keep coming back to the magic 1 second. There continue to be measurable productivity improvements down to around 0.3 seconds, and Google, Amazon and others have published studies showing a measurable impact at even smaller intervals - but 1 second stands out as the big milestone in page loading times. It is possible to get significantly faster page loads than this, but it seems to be the breakpoint of diminishing returns.

Sorry Drupal / OpenOutreach: maybe not a fail, but a B minus.

At this point I could have started looking around the many, many other open-source PHP CMSs available. But I decided I'd already spent too much time on this and went back to good old Dokuwiki. Dokuwiki does have issues - being file based, it's difficult to scale to very high volumes of content / users, and tricky to scale beyond a single server (the simplest approach is to run the additional nodes as read-only and replicate content updates from a single master node - since it's file based, this is easy to set up). However, for my purposes scalability is not going to be an issue - the core system is very fast. The more recent versions have slowed down a bit at the front end due to the addition of jQuery - this and all the other JavaScript is loaded at the top of each page, leading to a rather ugly 200-350 millisecond pause in screen rendering, even when loading from cache. However, the template/theme for the application is contained in a plugin (there are lots of themes available), which is a relatively simple bit of PHP code.

Unfortunately Dokuwiki bundles all the content to be written into the <head> section together in tpl_metaheaders() - so although I was able to get the scripts to the bottom, it needed (small) changes to the core Dokuwiki source code.

With the core system (the Weatherwax release), making these changes and moving the scripts to the bottom seemed to have no adverse effect on the functionality of the system. Unfortunately, it does break a lot of existing plugins. It also exposed another issue with this approach: a common feature of many CMSs, particularly those with JavaScript-capable plugins, is that they tend to embed JavaScript within the HTML of the page. In addition to the problem that this usually expects a library to have been loaded already, it also makes it impossible to implement a Content Security Policy banning inline scripts (a very simple and effective way to protect against XSS attacks).
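By way of illustration, banning inline scripts only takes one response header - this is a minimal sketch, not the exact policy the live site sends:

<?php
// With no 'unsafe-inline' in script-src, a conforming browser refuses to
// execute any <script> embedded in the page body, so XSS payloads injected
// into the HTML never run. Only scripts loaded from our own origin are
// allowed.
header("Content-Security-Policy: script-src 'self'");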

Programming, like crack cocaine, is a rather hard habit to break - so I immediately started thinking about ways to solve this. The inline code usually does just two things:

  • declares the data to be used by the rendering script
  • invokes the rendering code
The data, of course, doesn't need to be inside a script tag - it just needs to be somewhere the script can read it. As for invoking the rendering code: a simple DOM scanner could be run after the JavaScript has been loaded and parsed, invoking each renderer it finds work for. This could cause re-flows and re-paints - but by using block elements it need not be disruptive (certainly a lot less disruptive than stopping the initial rendering of the page to process the scripts).

So, first problem: where to put the data.

It's trivial to add custom properties to the DOM, and although there are some reports of issues in MSIE prior to version 8, I've seen successful implementations running in versions 6 and 7. To be strictly correct, you should declare these in a custom DTD (as Facebook does with FBML), and not doing so can result in the browser rendering in quirks mode. But a bit more reading revealed another great feature of HTML5: you can add custom attributes to nodes without a custom DTD, as long as the name is prefixed with 'data-' and the remainder is all in lower case. This is backwardly compatible with older browsers and avoids having to write a custom DTD. Some browsers will still be forced into quirks mode - but since I've spent rather a lot of time structuring the CSS and HTML to be responsive, this should not be a big problem. Yeah!
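The change on the emitting side is tiny. A hypothetical gallery widget (the names and data are made up for illustration) goes from an inline script to a plain element:

<?php
$ids = array(1, 2, 3); // example widget data

// Before: an inline script - blocked by a no-inline CSP, and it assumes
// the rendering library has already been loaded:
//   echo '<script>renderGallery(' . json_encode($ids) . ');</script>';

// After: a plain block element; once the deferred scripts have loaded,
// a DOM scan finds elements like this and invokes the matching renderer.
echo '<div data-widget="gallery" data-args="'
   . htmlspecialchars(json_encode($ids), ENT_QUOTES)
   . '"></div>';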

I was about to start writing the code to implement the scanner when I stopped and thought about this. I was creating a JavaScript framework. For it to have any value, I'd have to rewrite the plugins I wanted to use, and ideally get other plugin writers to join my bandwagon. And there was also a risk that the core developers of Dokuwiki might introduce new features that broke when the scripts were at the bottom.

A quick email to the Dokuwiki developers' list showed that the issue of scripts-at-the-bottom had already been raised, considered and binned. If this was the road to go down, I'd have to spend some time writing nice, well-behaved, well-integrated code - and that would also mean taking the time to learn a lot more about Dokuwiki's internals.

A further issue with this approach was that I'd still have to refactor any plugins I wanted to use to fit into this new framework - once again I was losing sight of the main objective here: plug and play!

Was there another way to solve the problem? Yes! Although it's not backwardly compatible, HTML5's pushState allows you to load content using ajax and set the location in the browser without transitioning the entire page. There's a really nice implementation of this in the pages on GitHub - which degrades nicely on older browsers. And the source code / documentation for PJAX is available (from GitHub, of course).

Now all the custom modifications could go into the template rather than the core code (and it's only about 4 lines of code). The only caveat is that I don't have a browser which supports pushState on any of the computers at home, so I'll need to do some upgrading before I can try it out for real.

I will try to go back to looking at scripts-at-the-bottom in Dokuwiki, but for now I've got a working solution that performs well and provides a base for some nice eye-candy effects.

Monday, 6 May 2013

Starting a new website

I've been volunteered as the webmaster for a club my daughter is involved with. Currently they have a rather plain static HTML website and a free Invision Power Board forum. Increasingly, the members are using Facebook as a means of communicating. This is far from ideal - the discussions and postings add a lot of value to the club, and they are being given away gratis on Facebook. Further, given the imminent adoption of the 'Instagram Act', I felt it was important to maintain additional controls over the users' copyright.

I'm a big fan of free software and data - but in this case the members are expecting a bundle of additional products, services and discounts. While it's a non-profit organization with a very small turnover, there are significant costs involved. Hence I felt it was a good idea to provide a more realistic alternative to Facebook on the club site, including a members-only area.

The first problem was the hosting. The site was using the starter package from HeartInternet. Although it was PHP enabled, the amount of storage was very small - allowing users to upload their own content would have filled the quota very quickly - and there was no database support. Upgrading the account to support the expected usage would have been exorbitantly expensive. So the first task was to find a hosting company.

While I can find lots of very cheap hosting companies claiming to operate in the UK, nearly all of them have their datacentres in the US or Canada. It's not that I've got anything against the former colonies - but I already have a site hosted in Florida, and although the packets go back and forth at around half the speed of light (I measured it - how sad), the latency is a killer. I don't know if there's a paucity of UK service providers or if they're just not as good at SEO as the ones across the Atlantic, but the search proved rather hard. There are also a large number of dummy review sites out there. But it did confirm that the prices on offer from HeartInternet were rather expensive. www.hostpapa.co.uk claims to offer UK web hosting with unlimited bandwidth, unlimited storage and the usual PHP/MySQL combo for under £2 a month! It's only when you dig a little deeper that you find they have no UK data centres. I emailed them to ask if they could provide hosting in the UK - their reply: "for securit [sic] purpose we do not disclose the locations of our data centers".

Sorry hostpapa: FAIL

So, for the record, I did manage to track down jomongee, x9internet, 123reg (whom I already use for DNS) and Ronald MacDonald (no relation to the burger chain, I'm guessing). Ronald is offering virtual servers at very low prices - and although I'm a bit of a control freak and confident I could set up a server at least as well as most ISPs, I just don't have the time, so I'll stick with a basic shared host. Notably, none of these companies plaster their front pages with bogus awards and certificates. I'll probably go with x9internet - who offer a reasonable hosting package, but what really swung it for me was the refreshingly honest way they appear to do business.

So that's the hosting side sorted out. I just need some software to run it all. Ideally the system would provide:

  • a CMS for publishing web pages - basic navigation, user authentication, public and member only pages
  • a forum where users could vote posts up/down and acquire points (like Stack Overflow)
  • WYSIWYG editor for forum
  • image uploads on forum posts
  • billing/payment for membership via Paypal or similar
Looking around the internet, I chose Pligg, Anahita, OpenOutreach and Oxwall to look at in more detail.

I decided to start with Oxwall because it was (IMHO) by far the prettiest. However....

1. Finding the requirements for the software (even what platform it runs on) from the website was very difficult. There's next to no technical documentation at all.

2. The requirements which are provided (in the zip file - no tarball) were a cause for concern:
   - does not work with Suhosin
   - requires a cron job to be run *every* minute

3. The requirements were incomplete - the package relies heavily on mod_rewrite

4. The installation instructions don't work - assuming that the installer is supposed to amend the URL in "Go to http://www.mysite.com/install to run the install script" appropriately, I tried http://localhost/mypath/install and got a 404 page.

5. After running the installation script, the documentation goes on to tell the installer to run 'chmod 777' on all the directories created by the installation!!!!

6. Allowing the Oxwall .htaccess files to do anything they want:

    Options -All -Multiviews
    AllowOverride All
    Order deny,allow
    Deny from all

...Some progress - I got access forbidden (403) instead of 404 from the install URL.

7. The .htaccess file in the root directory contains:

Options +FollowSymLinks
RewriteEngine On

AddEncoding gzip .gz
AddEncoding gzip .gzip
<FilesMatch "(\.js\.gz|\.js\.gzip)$">
  ForceType text/javascript
</FilesMatch>
<FilesMatch "(\.css\.gz|\.css\.gzip)$">
  ForceType text/css
</FilesMatch>

RewriteCond %{REQUEST_URI} !^/index\.php
RewriteCond %{REQUEST_URI} !/ow_updates/index\.php
RewriteCond %{REQUEST_URI} !/ow_updates/
RewriteCond %{REQUEST_URI} !/ow_cron/run\.php
RewriteCond %{REQUEST_URI} (/|\.php|\.html|\.htm|\.xml|\.feed|robots\.txt|\.raw|/[^.]*)$  [NC]
RewriteRule (.*) index.php


...i.e. it's not going to work unless it's installed in the root directory. There are also a lot of other things here which are, at best, strange.

Creating a new vhost and moving the files into its DocumentRoot still resulted in a 403 error.

8. After moving my current /var/www/html elsewhere and moving the Oxwall files back into the default webroot, I still got a 403 error at http://localhost/install

9. Pointing my browser at http://localhost/index.php, I finally got some output from Oxwall! It told me "Your hosting account doesn't meet the following requirements: zip PHP extension not installed" - yes, another undocumented dependency.

10. I installed php-zip and got a configuration form (although the absence of any styling was a hint that the URL rewriting still wasn't working properly).

I know getting web paths sorted out is not easy - but I hate front controllers, and trying to create an arbitrary namespace using mod_rewrite is just asking for trouble. (BTW, the Oxwall wiki runs on Dokuwiki - which I've written about before, and which is very cool.)

While I could probably fix the problems and get a working site together (probably even fix the cron silliness), it's just not worth the effort - the fact that the software packaging has been so sloppy means there are probably lots more gremlins in the code, and I do not want the site pwned by the first script kiddie to come along.

It's a shame that someone has worked so hard to produce something which looks so nice and appears to have a lot of functionality, but has made so many basic errors.

Sorry Oxwall: FAIL 

Wednesday, 17 April 2013

Whither MSIE

Looking at my stats here on blogger.com, MSIE traffic has dropped to just 5% - behind Chrome, Firefox and Safari. Although it's fallen a long way from the market share MSIE had even 5 years ago, this is still a lot more skewed than I see on the real sites I manage.

I guess I need to start showing those MS users a bit more love (they do need it).

Wednesday, 27 February 2013

Compiling PHP

Since I've been playing around with computers for more years than I care to remember, I used to be very familiar with the process of unpacking tarballs and compiling code from source. But to be honest, it's not something I do very often these days: most Linux distributions come with an extensive array of pre-compiled binaries; the package management software keeps me up to date with security patches; my C skills are a bit rusty; and life's just too short!

But recently I've been looking at LAMP performance in some detail. I was surprised to find that the PHP on my workhorse desktop (PCLinuxOS 2012) had been compiled with no optimization and the resulting binary stripped. I should note that at the time I installed it, there was neither a 64-bit nor an AMD-specific port of the distribution, hence the OS build was more about compatibility than performance.

So I had a play around to see if there were any benefits to be had at compile time.

PHP is a scripting language and a lot of its functionality is implemented in extension libraries. Dynamically linking these at runtime does have a performance overhead (although with FastCGI and mod_php, since the code forks rather than loads, this shouldn't be too great). For most people, the ability to choose which extensions are loaded at runtime (and thus trim the memory footprint) outweighs the small processing overhead of runtime linking. Unfortunately my test methodology didn't allow me to measure the actual impact of static vs dynamic linking: in the absence of a site to test and sophisticated tools for HTTP load testing - ab would not cut the mustard - I was using the CLI SAPI, where there would be a big performance drop which would not occur on a properly configured webserver.

To compare the different optimization levels I compiled PHP 5.3.22 using gcc 4.5.2 with -O0, -O2 and -O3, then timed 'make test'. My test machine was a dual-core AMD Athlon.

Property                          -O0       -O2       -O3
Average (sys + usr) seconds       214.0     206.7     207.7
Std dev (sys + usr) seconds         6.2       0.5       1.0
Max RSS (Kb)                      569.8     569.8     570.0
Executable size (Kb)             6707.9    6880.8    7403.3
Stripped executable size (Kb)    6257.5    6437.5    6973.6

In each case CFLAGS="-march=native -pipe" plus the optimization flag shown, with CXXFLAGS="${CFLAGS}" (the -O0 build simply omitted the -O flag).
I've not shown the results here, but I saw no measurable difference in the usr + sys times, nor in the max RSS, between the stripped and unstripped binaries.

Interestingly, the -O3 optimization is very slightly, but measurably, slower than -O2. And -O2 is around 5% faster than -O0.

The gain of 5% seems a little disappointing compared with the figures reported for other programs, but I would expect to see greater gains for PHP code implementing (for example) encryption and image processing.

Tuesday, 20 November 2012

How not to manage an IT dept

I think enough water has flowed under the bridge for me to talk about my experiences as IT manager in a small start up company where I worked until 6 years ago. The characters and events described herein are entirely genuine; only the facts have been changed to protect the innocent.

Having previously worked as Computer Services Manager for a UK retailer, the job sounded like everything I could have wished for - head of IT for a small company producing and selling digital media (that's ringtones, for the civilians out there), with hands-on IT work and software development. Hence I was happy to accept the job at a significant salary cut from my previous role, where I was increasingly just managing contracts.

I joined the company near the end of its first year of trading, when it had a turnover of just £80,000 - but promising things were afoot. In the following year it turned over £2M, then £4M, and it was on target for £6M when I left 30 months later. But within a year of my departure it had stopped trading, and it subsequently went into liquidation. I'd like to think that I had something to do with its success while I was there. But for now I'll start at the beginning.

In terms of infrastructure, they were unusual in doing their own hosting. The implementation of the infrastructure was remarkably well thought out. The code which ran their websites and PRS IVR systems, however, left a lot to be desired. It had been delivered by an East European individual who operated through a chain of shell companies, sub-contracting the work to the cheapest bidder. Hence the code was all in different styles and not very sophisticated. To be fair, though, since each page (PHP script) had no dependencies on any other bit of code, any issues were at least well isolated - and the code was commented in English.

I was expected to take on the management of the infrastructure and some of the development work, with the plan that I would build a small team of in-house developers. Looking after the servers was easy enough - I made relatively few changes to the configuration. But based on my previous experience, I did spend a lot more time defining and documenting processes than my employers were really comfortable with - although anyone with a background in CMMI / ITIL would probably have said that I didn't go nearly far enough. Alongside this, I was developing new functionality for the systems and trying to recruit more IT staff.

As this was after the internet bubble had burst, people describing themselves as web developers were ten-a-penny - and that's about all I was able to offer them as a salary. But I really needed competent software engineers / analyst programmers - the fact that the sites were mostly developed in PHP was merely an implementation detail. Hence my first big problem: how to recruit and retain people with real programming skills at half the salary available elsewhere.

The story will continue in a future post....