Friday 24 October 2014

Nouvea? Retch.

I often wonder what heinous crimes I committed in a previous existence to deserve the punishments I get in this one.

It all started so simply.

My Desktop was running PCLinuxOS11 - a bit long in the tooth and still 32 bit, but it was all working and I kept up with the patches. But then at the last round of patches, the Chromium browser stopped working - a broken dependency. I try to fix the dependency - but get a 404 from the repository. I tried to revert but can no longer find the previous package. Oh well, I bite the bullet and try to do a dist upgrade - which completely trashes my machine.

First I try installing PCLinuxOS14 - but it uninstalls all of KDE (but I still have openbox which I added some time ago to play around with). Then I try OpenSuse (I used to run Suse on my servers up to about version 8). The current version looks nice and it all works but OMG is it SLOW! And it also trashes the PCLinuxOS installation completely! Then I try MINT 17 - which won't even boot. Then I find an old MINT 15 DVD which boots OK and I install that, mount my /home filesystem and recreate the accounts. I roll forward the patches, and I seem to have a working (and usable) system. Only I can't install any more software as it seems this version of the distro is no longer supported.

Why is this stuff so hard? I know you guys don't want to maintain lots of different versions of your software, but is it so hard to just leave the old packages online and let us upgrade through them? 

After a lot more digging it seems that my graphics card (Nvidia GeForce 6150) does not play nice with recent versions of the nouveau driver. Hence my only option is to hope that the nVidia supplied versions will work with whatever flavour Linux I try next - which I need to boot up and install with the nouveau driver blacklisted. But for a short while I think I'll stick with having a working computer.


Nothing Nowhere - Thanks Orange

I've had the same mobile SIM (and number) for ...erm...at least 12 years from Orange - a basic PAYG account. In recent years my work has provided me with a contract phone, but I keep the Orange SIM in a charged phone and top up the credit every so often - formerly using scratch cards, more recently via ATMs.

This weekend I'm stuck in the house and needed to add some credit - surely I must be able to do that online? So I find the site, and yes, I can top up online but first I have to create an account - ho hum - a bit of a hassle, but I'll play along. I type in my details, they send a text to the phone with an activation code. I don't know why I can't make an anonymous online top up via the internet when I can do this at any ATM. But surely just a few clicks and keystrokes and I'll get there.

No.

First the registration process fails to redirect to the account. Then when I do eventually get logged in, I can't do anything unless I provide the 4 digit PIN code they say they sent me when I registered the phone. WTF?

So I click on the "I don't have my PIN" link - it says to call 450. I call 450. I then spend around 10 minutes pressing buttons trying to find my way through their IVR, then just when I think I'm getting near to my destination I get disconnected. I try again. 10 minutes later disconnected again.

FFS, Orange I'm trying to give you money!

So how do I contact them via the website? Is there an email address? No. Is there a form I can submit a request to? No. The only option appears to be to call a mobile number (which I can't do without any credit on my phone).

Tuesday 22 July 2014

Browser Fingerprinting - digging further into the client

I previously wrote about some techniques for Browser Fingerprinting (or "Device Identification" as it's known in some circles). Today I came across an interesting technique already in widespread use which detects variations between devices by looking at how content is rendered by WebGL / HTML5 Canvas.

Typically as much of the processing for these as possible is pushed onto the underlying hardware, resulting in consistent results for a given device independently of the OS / software. There is a surprising amount of variation here. However there's not sufficient variation for it to be used in isolation from other methods.
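
To make the idea concrete, here is a minimal sketch of the canvas variant: draw some fixed content, read back what the device actually rendered, and boil it down to a short token. This is my own illustration, not the code of any product mentioned here. The function names, the drawn text and the choice of an FNV-1a style hash are all assumptions.

```javascript
// FNV-1a style 32-bit hash: turns a long string into a short hex token.
function hashString(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Draw fixed content and hash the pixels the device actually produced.
// Returns null where there is no DOM or no 2D canvas support.
function canvasFingerprint() {
  if (typeof document === 'undefined') return null;
  const canvas = document.createElement('canvas');
  canvas.width = 240;
  canvas.height = 60;
  const ctx = canvas.getContext('2d');
  if (!ctx) return null;
  ctx.textBaseline = 'top';
  ctx.font = '16px Arial';
  ctx.fillStyle = '#f60';
  ctx.fillRect(10, 10, 100, 30);
  ctx.fillStyle = '#069';
  ctx.fillText('Fingerprint test, 123', 4, 20);
  // toDataURL() serialises the rendered bitmap as a string
  return hashString(canvas.toDataURL());
}
```

The same drawing commands go in on every device; only the rendered result differs, which is why the token would normally be combined with other signals rather than used alone.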

Update: 28 Jan 2015

Another interesting article found here lists Google Gears detection and MSIE security Policy as enumerable interfaces for fingerprinting. (The TCP/IP parameter check is presumably done serverside, while proxy detection uses Flash). But the really interesting bit is that 2 of the products tested tried to hook up with spyware on the client!

Thursday 12 June 2014

Whatever happened to scripting?

Don't you just love "enterprise" tools. Most of the ones I've had the pleasure of working with seem to have been around a very long time, belonged to companies which have progressively been bought over by bigger and bigger corporations, have been developed by different teams with different methodologies and coding styles. It's a miracle they work at all.

But one common theme, and one that they all tend to be very good at, is making sure that they are at the top of the food chain in their field. Most provide good downstream connectivity, collecting data from all sorts of different sources. But it is exceedingly difficult to integrate them with upstream components - for reporting, user management, logging etc.

The latest problem was to get data out of Microstrategy. It has a SOAP interface for invoking reports remotely. But try finding any documentation for it. It also has a "simple" HTTP based interface (where report definitions are specified in the URL). Again with no available documentation. I asked the Data warehouse team whether they knew anything about these interfaces. Answers ranged from "What's SOAP?" to "no". It has a scheduler for running reports - can't we just dump these on a filesystem somewhere?....apparently not.  So how can you get information out in a machine readable form? We can send an email.

Great. Email I can do. So I fire up putty and start hacking together a script to get the file out of an email and hand it over to my app. Fetchmail -> procmail -> metamail. Simples.

....only metamail is not available in RHEL. I've previously blogged about mail processing in RH. I really don't want to write my own MIME handler. While there are lots of PHP implementations on the internet, you need to look hard to find the ones which are robust and well written. But even then the parsing is done by loading the entire message into memory. Not very handy if the message is 100MB+ and using PHP.

I could download metamail and compile it....but looking around the internet, it doesn't seem to have been actively maintained. Indeed there hadn't been any significant changes since I'd sent in some bug reports about 15 years ago! Investigating further I found ripmime which does what I need. So a quick security scan and it seems ideal.

This might be a good point to describe what I looked for in checking its provenance.


  • It seems to be bundled in several Linux distros (i.e. other people like it and are using it).
  • Older versions have some CVEs logged against it - now fixed. This is good on several counts - again it shows that people are using it and finding security problems and the security problems are getting fixed.
  • The other products flagged in the same CVEs put it in respectable company.
  • I went to the origin website for the tarball - found other interesting security stuff related to email handling.
  • Scanned the source for anything that might indicate an alternate function (fork(), exec*(), system(), socket stuff).
  • Looked to see if the code was aware of obvious attacks (such as Content-Disposition: attachment; filename="/etc/passwd";).

All good. It would have taken me a very long time to implement all this myself.
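
For anyone wondering what being "aware" of the filename attack in the last bullet looks like, here is a rough sketch. It is not ripMIME's code (that is C, and I have not reproduced it). It is a hypothetical helper of my own, and it ignores the encoded filename* forms entirely.

```javascript
// Pull the filename parameter out of a Content-Disposition header value and
// reduce it to a bare name that cannot escape the target directory.
// The function name and the fallback name are illustrative assumptions.
function safeAttachmentName(headerValue, fallback = 'attachment.bin') {
  const match = /filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(headerValue);
  const raw = match ? (match[1] !== undefined ? match[1] : match[2]) : '';
  // Keep only the last path component, whichever separator was used.
  const base = raw.split(/[\\/]/).pop();
  // Replace anything outside a conservative whitelist, then strip leading dots.
  const cleaned = base.replace(/[^A-Za-z0-9._-]/g, '_').replace(/^\.+/, '');
  return cleaned === '' ? fallback : cleaned;
}

console.log(safeAttachmentName('attachment; filename="/etc/passwd";')); // passwd
console.log(safeAttachmentName('attachment; filename="../../.bashrc"')); // bashrc
```

The point is that the sender's idea of a path never reaches the filesystem. Only a sanitised leaf name does, and the extractor decides which directory it lands in.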

Really RedHat, ripMIME should be part of RHEL!

I know Linux is now mainstream - but that doesn't mean I want a complex black box which I can't diagnose or re-purpose. If I wanted that I would have bought MS Windows! (are you reading this KDE PIM developers, systemd developers). 

Fortunately it's trivial to build ripmime (no dependencies other than glibc and iconv).

Project back on track. Thank you Paul.



Monday 24 March 2014

Warning: BBWC may be bad for your health


Over the past few days I've been looking again at I/O performance and reliability. One technology keeps cropping up: Battery Backed Write Caches. Since we are approaching World Backup Day I thought I'd publish what I've learnt so far. The most alarming thing in my investigation is the number of people who automatically assume that a BBWC assures data integrity. Unfortunately the internet is full of opinions and received wisdom, and short on facts and measurement. However if we accept that re-ordering of the operations which make up a filesystem transaction on a journalling filesystem undermines the integrity of the data, then the advice from sources who should know better is dangerous:


“Write barriers are also unnecessary whenever the system uses hardware RAID controllers with battery-backed write cache. If the system is equipped with such controllers and if its component drives have write caches disabled, the controller will advertise itself as a write-through cache; this will inform the kernel that the write cache data will survive a power loss.”

There's quite a lot here to be confused about. The point about RAID controllers is something of a red herring. There's a lot of discussion elsewhere about software vs hardware RAID – and the short version is that modern computers have plenty of CPU to handle the RAID without a measurable performance impact; indeed many of the cheaper devices offload the processing work to the main CPU. On the other hand hardware RAID poses 2 problems:

  1. The RAID controller must write its description of the disk layout to the storage – it does this in a proprietary manner, meaning that if (when?) the controller fails, you will need to source compatible hardware (most likely the same hardware) to access the data locked away in your disks.
  2. While all hardware RAID controllers will present the configured RAID sets to the computer as simple disks, your OS needs visibility of what's happening beyond the controller in order to tell you about failing drives. Only a small proportion of the cards currently available are fully supported in Linux.

It should however be possible to exploit the advantages (if any) offered by the non-volatile write cache without using the hardware RAID functionality. It's an important point that the on-disk caches must be disabled for any assurance of data integrity. But there is an omission in the statement above which will eat your data.

If you use any I/O scheduler other than 'noop' then the writes sent to the BBWC will be re-ordered. That's the point of an I/O scheduler. Barriers (and more recently FUA) provide a mechanism for write operations to be grouped into logical transactions within which re-ordering has no impact on integrity. Without such boundaries, there is no guarantee that the 'commit' operation will only occur after the data and metadata changes are applied to the non-volatile storage.
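
A toy model may help here. The sketch below is not how the kernel's schedulers are implemented. It is a deliberately crude "sort by sector" elevator of my own, with made-up sector numbers and labels, showing what a barrier changes about the order in which a transaction reaches the cache.

```javascript
// Each write has a target sector and a label; 'BARRIER' marks a boundary.
// A toy elevator: sort writes by sector, but never move one across a barrier
// (when honourBarriers is false the markers are simply ignored).
function schedule(queue, honourBarriers) {
  const out = [];
  let group = [];
  const flush = () => {
    group.sort((a, b) => a.sector - b.sector);
    out.push(...group);
    group = [];
  };
  for (const item of queue) {
    if (item === 'BARRIER') {
      if (honourBarriers) flush();
    } else {
      group.push(item);
    }
  }
  flush();
  return out.map((w) => w.label);
}

// One simplified transaction: data and metadata first, then the commit record.
// The commit happens to target a low sector, so a plain sort moves it forward.
const transaction = [
  { sector: 9000, label: 'data' },
  { sector: 5000, label: 'metadata' },
  'BARRIER',
  { sector: 100, label: 'commit' },
];

console.log(schedule(transaction, true));  // commit stays last
console.log(schedule(transaction, false)); // commit is issued first
```

In the second case, if the power goes after the first write, the non-volatile cache faithfully preserves a commit record for changes that never arrived.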

Even worse than losing data is that your computer won't be able to tell you that it's all gone horribly wrong. Journalling filesystems are a mechanism for identifying and resolving corruption events arising due to incomplete writes. If the transaction mechanism is compromised by out-of-sequence writes, then the filesystem will most likely be oblivious to the corruption and report no errors on resumption.

For such an event to lead to corruption, it must occur while a write operation is taking place. Since writing to the non-volatile storage should be much faster than writing to disk, and since in most systems reads are more common than writes, writes will only be occurring for a small proportion of the time. But with faster/more disks and write intensive applications this differential decreases.

When Boyd Stephen Smith Jr said on the Debian list that a BBWC does not provide the same protection as barriers he got well flamed. He did provide links that show that the barrier overhead is not really that big.

A problem with BBWC is that the batteries wear out. The better devices will switch into learning mode to measure the battery health either automatically or on demand. But when they do so, the cache ceases to operate as non-volatile storage and the device changes its behaviour from write-back to write-through. This has a huge impact on both performance and reliability. Low end devices won't know what state the battery is in until the power is cut. Hence it is essential to choose a device which is fully supported under Linux.

Increasingly BBWC is being phased out in favour of write caches using flash for non-volatile storage. Unlike the battery-backed RAM there is no learning cycle. But Flash wears out with writes. The failure modes for these devices are not well understood.

There's a further consideration for the behaviour of the system when it's not failing: the larger the cache on the disk controller or on disk, the more likely that writes will be re-ordered later anyway – so maintaining a buffer in the host system's memory and re-ordering the data there just means adding more latency before data is in non-volatile storage. NOOP will be no slower and should be faster most of the time.

If we accept that a BBWC with a noop scheduler should ensure the integrity of our data, then is there any benefit from enabling barriers? According to RedHat, we should disable them because the BBWC makes them redundant and...
“enabling write barriers causes a significant performance penalty.”
Wait a minute. The barrier should force the flush of all data held in volatile memory. But we don't have any volatile memory after the OS/filesystem buffer. So the barrier should not be causing any significant delay. Did Redhat get it wrong? Or are the BBWC designers getting it wrong and flushing the BBWC at barriers?

Could the delay be due to moving data from the VFS to the I/O queue? We should have configured our system to minimize the size of the write buffer (by default 2.6.32 only starts actively pushing out dirty pages when they get to 10% of the RAM – that means you could have 3GB of data in volatile storage on a 32GB box). However, many people also report performance issues with InnoDB + BBWC + barriers, and the InnoDB engine should be configured to use O_DIRECT, hence we can exclude a significant contribution to performance problems from the VFS cache.
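
The arithmetic behind that figure, as a trivial sketch. It treats the threshold as a simple percentage of total RAM, as the text above does; the kernel's own accounting may be based on a slightly different memory figure, so take the output as an upper-end estimate. The function name and the 1% comparison are mine.

```javascript
// Volatile data that can accumulate before background writeback starts,
// for a threshold expressed as a percentage of RAM.
function dirtyBeforeWriteback(ramGB, thresholdPercent) {
  return (ramGB * thresholdPercent) / 100;
}

console.log(dirtyBeforeWriteback(32, 10)); // 3.2 (GB), the figure quoted above
console.log(dirtyBeforeWriteback(32, 1));  // 0.32 (GB) with a 1% threshold
```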

I can understand that the people developing BBWC might want to provide a mechanism for flushing the write-back cache – if the battery is no longer functional or has switched to “learning mode” then the device needs to switch to write-through mode. But it's worth noting that in this state of operation, the cache is not operating as a non-volatile store!

Looking around the internet, it's not just Redhat who think that a BBWC should be used with no barriers and any IO scheduler:

The XFS FAQ states
“it is recommended to turn off the barrier support and mount the filesystem with "nobarrier",”
Percona say you should disable barriers and use a BBWC but don't mention the I/O scheduler in this context. The presentation does later include an incorrect description of the NOOP scheduler.

Does a BBWC add any value when your system is running off a UPS? Certainly I would consider a smart UPS to be the first line of defence against power problems. In addition to providing protection against over-voltages it should be configured to implement a managed shutdown of your system, meaning that transactions at all levels of abstraction will be handled cleanly and under the control of the software which creates them.

Yes, a BBWC does improve performance and reliability (in combination with the noop scheduler, a carefully managed policy for testing and monitoring battery health and RAID implemented in software). It is certainly cheaper than moving a DBMS or fileserver to a two-node cluster, but the latter provides a lot more reliability (some rough calculations suggest about 40 times more reliable). If time and money are no object then for the best performance, equip both nodes in the cluster with BBWC. But make sure they are all using the noop scheduler.

Further, I would recommend testing the hardware you've got – if you see negligible performance impact with barriers under a realistic workload (fio, https://github.com/axboe/fio, can generate one) then enable the barriers.


Monday 6 January 2014

Transactional websites and navigation


There are lots of things that make me happy. In my professional life, it's getting stuff done, helping other people or learning something new. Recently I learnt something which was probably widely known but I'd managed to miss all these years. I'm so inspired that I'm going to share it with you all.

A lot of transactional websites crumble when you dare to do something as reckless as use the back button or open more than one window on the site. The underlying reason is that the developer is storing data relating to the transaction – i.e. specific to a navigation event in a single window - in the session – which is common to all the windows. A very poor way to mitigate the problem is to break the browser functionality by disabling the back button or interaction via a second window. I must admit to having used this in the past to mitigate the effects of problems elsewhere in the code (alright, if you must know – I bastardized Brooke Bryan's back button detector, and as for the new window....well the history length is zero)

But how should I be solving the problem?

The obvious solution is to embed all the half-baked ingredients of a transaction in each html page sent to the browser and send the updated data model back to the server on navigation. This can work surprisingly well as long as the data on the browser is cleared down between sessions. But with increasingly complex datasets, this becomes rather inefficient, particularly on slow connections. Further there are times when we want the transaction state to reflect the session state: consider a shopping basket – if a user fills a shopping basket then ends their session we might want to retain the data about what they put in their shopping basket – but we might also want to release any stock reserved by the act of adding it to the basket. Often the situation arises where we end up with (what should be) the same data held in more than one place (browser and server). At some point the representations of the truth will diverge – and at that point it all goes rather pear shaped.

A while back I created a wee bit of code for point and click form building – PfP Studio. Key to the utility of this was the ability to treat a collection of widgets (a form) as a widget itself. And the easiest way to achieve that was to support multiple windows. When I first wrote this, I decided that the best way to handle the problem was to add partitioned areas to the session – one for each window. This depended on the goodwill of the user to open a new window via the functionality in the app rather than the browser chrome: each window had to carry an identifier (the “workspace”) across navigation events. Effectively I was rolling my own session handling with transids.


This has a number of issues – the PHP site warns about leaking authentication tokens but there's also a lot of overhead when you start having to deal with javascript triggered navigation, and PRGs.

Then the other day I discovered that the browser window.name property was writeable in all major browsers! Hallelujah! Yes, it still means that you need to do some work to populate links and forms, but it's a lot simpler than my previous efforts – particularly with a javascript heavy site.

Any new window (unless it has been explicitly given a name in an href link or via window.open) has an empty string as the name – hence you simply set a random value if it's empty – and the value persists even if you press the back button.

While I think I've explained what I'm trying to say, a real example never goes amiss:

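// Give each window/tab a name the first time a page loads in it.
if ('' === window.name) {
   var t = new Date();
   // getTime() (ms since the epoch) varies far more than getMilliseconds() (0-999)
   window.name = t.getTime() + '-' + Math.random();
}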
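
The "work to populate links and forms" mentioned above might look something like the following. Treat it as a sketch rather than what PfP Studio does: the query parameter name 'workspace', both function names and the example.invalid base URL are assumptions for illustration. It also tags every link on the page, including external ones, which you would probably want to filter.

```javascript
// Append the window's identifier to a URL as a query parameter.
// 'workspace' is an assumed parameter name.
function withWorkspace(href, workspaceId, base = 'http://example.invalid/') {
  const url = new URL(href, base);
  url.searchParams.set('workspace', workspaceId);
  return url.toString();
}

// In a browser: tag every link and form on the page with the identifier.
function tagPage(doc, workspaceId) {
  for (const link of doc.querySelectorAll('a[href]')) {
    link.href = withWorkspace(link.href, workspaceId, doc.baseURI);
  }
  for (const form of doc.querySelectorAll('form')) {
    const field = doc.createElement('input');
    field.type = 'hidden';
    field.name = 'workspace';
    field.value = workspaceId;
    form.appendChild(field);
  }
}

// Only runs where there is a page to tag.
if (typeof document !== 'undefined' && typeof window !== 'undefined') {
  tagPage(document, window.name);
}
```

On the server, the value that comes back then selects the partition of the session belonging to that window. Unlike a session id, it identifies a window rather than a user, so it is not an authentication token.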