When Bad Software Kills

Alex Walker

“Snefru’s Bent Pyramid in Dahshur” by Ivrienen at en.wikipedia. Licensed under CC BY 3.0 via Wikimedia Commons.

This is the ‘Bent Pyramid’: a 4,600-year-old monument to engineering failure.

From the base, the sides set off at an alarmingly steep 54-degree incline before abruptly switching to a gentler 43-degree slope about halfway up.

It’s believed that the design was altered during construction following the catastrophic collapse of the Meidum Pyramid — another steep-sided pyramid — about 60 kilometres to the south.

Of course, it’s hard to blame the ancient pyramid builders. They were effectively inventing engineering as much as they were learning it.

One thing hasn’t changed since that time: when structural engineers mess up, people get hurt. We can’t know for sure, but it seems unlikely that the Meidum collapse could have taken place without a human cost.

By comparison, ‘software engineering’ can seem like a fluffier flavor of the engineering sciences. A mistake might prevent a user from accessing their account or entering information, but it surely isn’t life-threatening?

No-one gets hurt, right?

Or so we think.

The truth is, every year our critical systems, from power and traffic to agriculture and emergency services, become more dependent on all of us creating high-quality software to support them.

And when we fail — like those ancient Egyptians — people can actually get hurt.

Surprisingly, as the sad case of the Therac-25 shows us, this isn’t even a 21st-century problem.

Software Can Kill

By the late 1970s, Atomic Energy of Canada Limited (AECL) had earned a good reputation for building radiation therapy machines.

These machines used targeted radiation beams to attack tumours in patients. Make no mistake: these beams are high-intensity and potentially lethal.

AECL had previously enjoyed great success with their Therac-6 and Therac-20 models. These units needed to be manually controlled by a trained operator, and used mechanical switches and hard-wired circuits to ensure high levels of safety.

The Therac-25 was to be their ‘dream machine’.

The Therac-25 machine

Smaller and cheaper, yet more efficient than its predecessors, the new machine could produce two different types of beam: an X-ray beam and a high-energy electron beam. The different beams allowed operators to target tumours at different depths without damaging nearby healthy tissue.

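The Therac-25 was both ambitious and sophisticated, and for the first time all this hardware was controlled by a software layer. Crucially, safety checks that earlier models enforced with hard-wired interlocks were now left largely to that software.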

Unfortunately, though AECL’s intentions were good, their software was tragically bad, riddled with a series of horrendous design flaws.

Later investigations documented these flaws in detail, and they still make chilling reading today.

In one instance, a machine repeatedly shut itself down mid-treatment, each time reporting a cryptic ‘H-tilt’ error and ‘no dose’. The baffled operator attempted to deliver the treatment six times before giving up.

Only later was it determined that the machine had in fact delivered the full dose every time: a catastrophic overdose.

Between 1985 and 1987, six patients received massive radiation overdoses from Therac-25 treatments, and several of those injuries proved fatal. It’s particularly horrendous when you consider that these patients were already seriously ill.

Today the Therac-25 survives mainly as a tragic textbook example to us all of how poorly designed and untested software can impact lives. The tragedy still informs much of our thinking on systems design and safety testing.

Ancient pharaoh statue

photo: kmf164

Even if you’re a front-end designer and don’t consider yourself a ‘serious engineer’, the Therac-25 has important lessons for you. While some flaws were caused by poorly coded processes, at least as much damage was caused by inadequate documentation, useless feedback and incomprehensible error messages. These are areas that everyone (designers, coders, managers, UX people and testers) should have influence over.

Looking back at those ancient Egyptians, it’s clear that they learned from their early mistakes and went on to build some of the most breathtaking structures that have ever existed.

Software engineering is still a comparatively young field — let’s hope we’ve already built our Bent Pyramids.

Originally published in the January 29th issue of the SitePoint Design Newsletter.