When Bad Software Kills

Originally published at: http://www.sitepoint.com/therac-25-bad-software-kills/

"Snefru's Bent Pyramid in Dahshur" by Ivrienen at en.wikipedia. Licensed under CC BY 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Snefru’s_Bent_Pyramid_in_Dahshur.jpg

This is the ‘Bent Pyramid’ – a 4,600-year-old monument to engineering failure.

From the base, the sides set off at an alarmingly steep 54-degree incline, before abruptly switching to a gentler 43-degree slope about halfway up.

It’s believed that the design was altered during construction following the catastrophic collapse of the Meidum Pyramid — another steep-sided pyramid — about 60 kilometres to the south.

Of course, it’s hard to blame the ancient pyramid builders. They were effectively inventing engineering as much as they were learning it.

One thing hasn’t changed since that time: when structural engineers mess up, people get hurt. We can’t know for sure, but it seems unlikely that the Meidum collapse could take place without a human cost.

By comparison, ‘software engineering’ can seem like a fluffier flavor of the engineering sciences. A mistake might prevent a user from accessing their account or entering information, but surely it isn’t life-threatening?

No-one gets hurt, right?

Or that’s what we think.

The truth is, every year our systems (from power to traffic to agriculture to emergency services) become more dependent on us all creating high-quality software to support them.

And when we fail — like those ancient Egyptians — people can actually get hurt.

Surprisingly, as the sad case of the Therac-25 shows us, this isn’t even a 21st-century problem.

Software Can Kill

By the late 1970s, Atomic Energy of Canada Limited (AECL) had earned a good reputation for building radiation therapy machines.

These machines used targeted electron beams to attack tumours in patients. Make no mistake, these beams are high-intensity and potentially lethal.

AECL had previously enjoyed great success with their Therac-6 and Therac-20 models. These units needed to be manually controlled by a trained operator, and used mechanical switches and hard-wired circuits to ensure high levels of safety.

The Therac-25 was to be their ‘dream-machine’.

The Therac-25 machine

Smaller and cheaper, yet more efficient than its predecessors, the new machine incorporated two different beam technologies: an X-ray beam and a high-energy electron beam. The different beams allowed operators to target tumours at different depths without damaging nearby healthy tissue.

The Therac-25 was both ambitious and sophisticated — and for the first time all this hardware was controlled by a software layer.

Unfortunately, though AECL’s intentions were good, their software design was tragically bad, incorporating a series of horrendous design flaws.

Later investigations carefully documented these flaws and they’re still chilling to read today.

In one instance, a machine repeatedly shut itself down during a treatment, each time reporting a cryptic ‘H-tilt’ and ‘no dose’ error message. The operator attempted to deliver the treatment six times before giving up.

It was only later that it was determined that the machine had delivered the full dose every time.

From its launch in 1982 till its withdrawal in 1986, six patients received ultimately fatal injuries from Therac-25 treatments. It’s horrendous to consider that these people were already sick.

Today AECL exists not as a company, but as a tragic textbook example to all of us of how poorly designed and untested software can impact lives. The Therac-25 disaster still informs a lot of the ideas we have on systems design and safety testing today.

Even if you’re a front-end designer, and don’t consider yourself a ‘serious engineer’, Therac-25 teaches us lessons. While some flaws were caused by poorly coded processes, at least as much damage was caused by inadequate documentation, thoughtless messaging and arcane error messages. These are areas that everyone (designers, coders, managers, UX people and testers) should have contact with.

Looking back at those ancient Egyptians, it’s clear that they learned from their early mistakes and went on to build some of the most breathtaking structures that have ever existed.

Software engineering is still a comparatively young field — let’s hope we’ve already built our Bent Pyramids.

Nice article, Alex. I read this in your email and hoped you’d post it here!

Disastrous radiation-machine software aside, it’s worth mentioning the effects of poor programming that leads to inaccessible sites, too. An inaccessible site might not kill someone physically, but it can kill people psychologically, so to speak. (It’s painful to watch videos of people using various assistive technologies having to deal with inaccessible sites.)

There are lots of sites out there that are effectively pyramids with no entrances for some people. Either that, or there is a long stairway up to the door. An accessibility advocate (who sadly died recently) famously quipped—

I’ve been an atheist for a long time—ever since I first heard that there was only a stairway to heaven.

This reminds me of https://medium.com/@designuxui/how-bad-ux-killed-jenny-ef915419879e

Thanks @ralphm.

That is a fantastic quote. Pretty sure I’ll work that into a newsletter at some point. Love it.

Thanks for linking that, @Anorgan.

I hadn’t seen it before now, but you’re right – super relevant in this context.

Great! I should have cited it. It’s by Stella Young.
