Why Exceptions Suck

Introduction

In previous posts I've focused on the importance of removing cyclomatic complexity from your software and how doing so can dramatically reduce the defect rate.

This post is related to that theme, but rather than talking solely about cyclomatic complexity, I want to tackle a particularly egregious source of complexity that exists in most modern programming languages: structured exception handling.

What is structured exception handling?

Python, C# and Java all have structured exception handling. In C# and Java you find code that looks like this:

    try
    {
        // Some IO operation.
    }
    catch (IOException e)
    {
        // Do something
    }

The idea is that if something exceptional occurs within the body of the try block, we can catch one or more exceptions and handle them. If there is no try/catch block in the routine you're currently in, the exception will "bubble" up the call stack, with the hope that it will be handled by some catch block. If it isn't, the program crashes.
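
For example, in this minimal sketch (ReadConfig and the file name are just illustrative), the exception thrown inside one routine is caught a level up in its caller:

    using System;
    using System.IO;

    static class Program
    {
        static string ReadConfig(string path)
        {
            // No try/catch here: if the file is missing, the
            // FileNotFoundException bubbles up to whoever called us.
            return File.ReadAllText(path);
        }

        static void Main()
        {
            try
            {
                Console.WriteLine(ReadConfig("app.config"));
            }
            catch (IOException e)
            {
                // FileNotFoundException derives from IOException, so the
                // exception thrown inside ReadConfig lands here.
                Console.Error.WriteLine("Could not load configuration: " + e.Message);
            }
        }
    }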

The idea behind this is that we can clearly delineate error handling code from normal unexceptional code.

Why do they suck? - The Theory

Exceptions considered harmful

In 1968 Edsger W. Dijkstra had a letter published called "Go To Statement Considered Harmful." In this letter he gave a detailed account of why he thought the GOTO statement in high-level languages should be abolished.

The central premise of his letter was that when designing a program, it should be possible to uniquely define a "meaningful" position in the program from a few basic co-ordinates, such as the current line number, the current call stack, and the values of any loop counters.

The GOTO statement allows code that completely destroys any such meaningful co-ordinate system. For example, it is legal to jump into the middle of a loop construct. In that case, how do you know whether execution arrived via the natural start of the loop or by jumping into it?

You could add another co-ordinate for each GOTO statement in your program, but then your program's control flow becomes dependent on a vast array of GOTO branch conditions.

Exceptions are a super GOTO. Rather than merely letting you jump anywhere within the current procedure, they let you teleport up the call stack to any number of handlers.

At first, it seems like we have adhered to the advice of Dijkstra because the exception code will only be executed when an exception has been thrown. There is no other way to have that code be executed. We could add the "CurrentException" item to the laundry list of co-ordinates and be on our way.

I don't think this is a valid defence of structured exception handling. While we have probably adhered to the recommendation of Dijkstra technically, we have certainly violated the spirit of it. The whole reason he wanted a meaningful co-ordinate system for tracing program execution was so that we can reason about the code. I would say that exceptions are much more difficult to reason about than return codes.

Exceptions make reasoning about a program hard

Let's say I throw an exception. How do I know as a programmer where that error is going to get handled? In most languages, the answer is "you don't know." The exception might not be handled at all, which will result in a program crash. If this is not the case, the exception might be handled in any of the routines up the call stack.

This is where the real pain begins, since the exception may not be handled in the routines that directly call your routine. They may be handled much higher up in routines that are functionally unrelated to the purpose of your code. So it is not enough to search all the code for uses of your routine, you have to search for every routine that might, on one code path, result in the execution of your routine! How on earth do you reason about such a construct? It's also worth bearing in mind that different code paths may have vastly different error handling logic, so you'd have to develop correctness proofs for each handler.

Are checked exceptions the answer?

It's worth pointing out that the designers of Java knew about this and developed something called checked exceptions. Checked exceptions are exceptions that you are forced either to handle or to declare in your method's signature. This allows you to trivially show that some types of exception are always handled somewhere; however, it does not tell you where they will be handled or whether the handling procedure is correct for that type of problem. Checked exceptions are certainly better than unchecked exceptions, since you know they must be handled, but even so they're just as hard to reason about as their unchecked counterparts.

But even checked exceptions are evil...

The problem with checked exceptions is not so much the theory but the practice of how they get used.

When you force a lazy developer to do something, they reach for the quickest way to make the compiler happy. In this case, that means indiscriminately catching all exceptions!

Now you have code that will never crash, but it will never work properly either.

Exception throwing code is invisible

How do I know whether a routine is going to throw an exception? Answering this question requires inspecting my routine for any thrown exceptions, then inspecting the routines my routine calls, and so on recursively until we have a definitive list. This is true even in Java, since handlers are not forced for unchecked exceptions.

This is a huge amount of work to perform when writing a routine, because even if a library is well documented, you usually don't have complete access to its source code. That means you have no idea what pre-conditions you need to fulfil to avoid the exception. How on earth can you reason about your code when you don't have access to that information?

Throwing exceptions breaks encapsulation

To some degree, error reporting always breaks encapsulation. It's important that a routine is able to signal that it has failed, and doing so will inevitably communicate something about how the routine works to the outside world. The problem is that exceptions are promiscuous about who they communicate this information to: they will communicate it to anybody on the call stack, including modules totally unrelated to the routine that threw the exception in the first place.

Consider an example where you've decided to use flat files as the store for database tables. For some reason, one of the table files is missing, so a SQL query throws a "FileNotFoundException". This exception communicates the fact that files are used as the underlying data store to parts of the system that have no business knowing how the database works.

Most people designing a library would say that throwing a FileNotFoundException is not what you want to do. Instead, you want to throw a TableNotFoundException, with the FileNotFoundException as the inner exception.
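
A minimal sketch of that wrapping approach (Table, TableNotFoundException, PathForTable and ParseTableFile are hypothetical names, not from any standard library):

    public Table LoadTable(string tableName)
    {
        try
        {
            return ParseTableFile(PathForTable(tableName));
        }
        catch (FileNotFoundException e)
        {
            // Translate the storage-level failure into a database-level one,
            // preserving the original as the inner exception for diagnostics.
            throw new TableNotFoundException(tableName, e);
        }
    }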

Okay, fair enough. But what if some tables are accessed over a network connection and others are stored locally, and I want to be able to distinguish between these two types of failure? There is no way to distinguish between those errors without leaking information about the implementation.

Being able to distinguish between them is important. If the table doesn't exist on disk, then it probably never will, so the correct action is to exit gracefully and report an error. However, if the network is unavailable, it may be a temporary connection issue, so it is worth retrying whatever we were doing. These are two different failure modes.
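
To get those two behaviours, though, the caller ends up catching implementation-specific exception types. A sketch (Query, ReportFatalAndExit and ScheduleRetry are hypothetical):

    try
    {
        results = Query("SELECT * FROM Orders");
    }
    catch (FileNotFoundException)
    {
        // A missing local table file is permanent: report and exit gracefully.
        ReportFatalAndExit("A required table is missing from the data store.");
    }
    catch (SocketException)
    {
        // A network failure may be transient: retry the operation later.
        ScheduleRetry();
    }

The caller now knows that the store is built on files and sockets, which is exactly the leak we were trying to avoid.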

This might seem a bit of a contrived example, but I see this pattern fairly regularly myself, particularly in code that manages a long-running transaction.

It's true that if I was using return codes, I'd have to break encapsulation as well; however, only direct callers of the routine would see the broken encapsulation. With structured exception handling, every routine on the call stack can potentially see it, so the break leaks much further.

Why do they suck? - In Practice

The theory of why exceptions suck is just the start of the problems with them. What happens in real-world programs is significantly nastier. The problem with exceptions is that they encourage developers to do some really bad things. In this section I'm going to look at some of the patterns I've seen.

Paranoid coding

Most developers are under serious time pressure to deliver a working product. Given the difficulty of establishing which routines will throw which exceptions, they often don't bother finding out at all.

This leads to bizarre "just in case" code, which in turn leads to programs that are difficult to debug.

For example:

    public static decimal ComputeSmokerPercentage(PersonCollection personCollection)
    {
        try
        {
            int numberOfPeople = personCollection.Count;
            int numberOfSmokers = personCollection.NumOfSmokers;
            return (Convert.ToDecimal(numberOfSmokers) / Convert.ToDecimal(numberOfPeople)) * 100M;
        }
        catch (Exception)
        {
            // Exception might be thrown, caught so that it doesn't leak out.
            return 0M;
        }
    }

Because they have absolutely no idea what exceptions might be thrown, they just wrap every non-trivial piece of code in a try/catch and attempt to pick a good default for when the routine fails. It's a sign that the developer doesn't really know what's going on in their program; they're asking the computer to blindly handle any errors in their logic. That's never a good policy, because if you don't understand why each failure occurs, you can't be certain that your default is really the right thing to do in every case.

Consider this bug, which is paraphrased from a bug I actually found at work:

    public static Collection<Widget> RetrieveList(int companyId)
    {
        try
        {
            return Broker.RetrieveList<Widget>(new Key(COMPANY_ID, companyId));
        }
        catch (Exception)
        {
            return new Collection<Widget>();
        }
    }

In this case, the choice of default obscured a problem with the configuration store for the persistence framework. Since the exception wasn't logged anywhere, or even inspected to see what type of error it was, it frustrated the debugging process for longer than it should have.

If the developer had thought about what exceptions could be thrown, he might have made the configuration exception result in a program exit with a clear message. In some cases, you might even find a body of code that can never produce an exception wrapped in a try/catch block nonetheless. This just makes the code confusing and hard to read.
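
A sketch of that alternative, assuming the persistence framework signals the problem with a hypothetical ConfigurationException type:

    public static Collection<Widget> RetrieveList(int companyId)
    {
        try
        {
            return Broker.RetrieveList<Widget>(new Key(COMPANY_ID, companyId));
        }
        catch (ConfigurationException e)
        {
            // A broken configuration store is fatal: report it clearly and stop,
            // rather than limping on with a silently empty collection.
            Console.Error.WriteLine("Persistence framework misconfigured: " + e.Message);
            Environment.Exit(1);
            throw; // never reached; satisfies the compiler's return-path analysis
        }
    }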

People use this pattern because they believe that catching the exception means their program will "never fail." I've said it once and I'll say it again: yes, your program probably won't crash, but it will probably never work correctly either.

Failure to avoid exceptions

An exception avoided is worth fifty caught. Handling exceptions is expensive in any language: as soon as you throw an exception, the program has to unwind the stack in an attempt to find a handler. Moreover, code inside a try/catch block is harder for just-in-time compilers to optimise.

One of the key failure modes of structured exception handling is that people do not attempt to avoid exceptions in the first place. In fact, they'll march into them without a second thought.

Consider the following code snippet:

    public int CalculateDamage(Player player, int hitSize)
    {
        int returnedDamage = hitSize;

        try
        {
            if (player.PlayerClass == PlayerClass.Orc)
            {
                returnedDamage = hitSize / 2;
            }
        }
        catch (Exception)
        {
        }

        return returnedDamage;
    }

The exception the programmer is guarding against is a null reference exception, which would occur if player is null when it is dereferenced in the comparison against the PlayerClass enum.

The first problem with this code is that it is hard to work out what it does. For example, consider this question: under what circumstances will the catch block be hit? You have to do mental gymnastics to get at the underlying reason for the catch block's existence.

Isn't this a lot simpler?

    public int CalculateDamage(Player player, int hitSize)
    {
        int returnedDamage = hitSize;

        if (player != null && player.PlayerClass == PlayerClass.Orc)
        {
            returnedDamage = hitSize / 2;
        }

        return returnedDamage;
    }

Not only is the routine shorter, but it's also easier to understand.

I still don't like the routine, though. If a null player is passed to it, there is clearly a programming defect that the user needs to know about, and there is no sense in hiding that bad behaviour behind a default; better to fail fast, as in the sketch below.
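
Here's a sketch of the fail-fast alternative, using the standard ArgumentNullException to surface the defect at its source:

    public int CalculateDamage(Player player, int hitSize)
    {
        if (player == null)
        {
            // A null player is a programming error; surface it immediately
            // rather than silently returning a default.
            throw new ArgumentNullException("player");
        }

        if (player.PlayerClass == PlayerClass.Orc)
        {
            return hitSize / 2;
        }

        return hitSize;
    }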

Yet, in the vast majority of cases where I see this sort of SEH, hiding the defect behind a default is exactly what the programmer does.

This leads to buggy programs where the root cause of a crash is difficult to determine. The program will wander on for some time, strolling from one exception handler to the next, until eventually an exception leaks out and crashes the entire program. Quite often, the place where the program finally dies is totally unrelated to the root cause of the problem.

And SEH was meant to make writing reliable programs easier?

Can't this be fixed by hiring better developers?

Many notable people in the software industry think that all our problems can be solved by hiring practices. I think this is a fallacy.

For a start, unless you're Google, attracting talent is hard.

Even Joel Spolsky, of Joel on Software fame, has trouble attracting talent. He's arguably one of the best communicators in our industry. Joel's opinions on hiring are well known; he's written entire books about the importance of hiring the right people.

However, even Joel is starting to learn the hard way that the best and brightest minds do not want to work on a bug-tracker. Joel isn't writing cool software. Just like 95% of the products in the world, his product is pretty boring.

I pick on Joel not because I have a grudge against him; quite the opposite, in fact, since I enjoy his writing and his podcasts. I pick on Joel because he is the loudest advocate of the "best developers solve everything" philosophy. For all his talent, both in programming and in communication, he can't change the force of gravity. Most of the products in the world are not sexy, and it's going to be hard for him to attract talent.

Given that all the really outstanding developers are at places like Google and Microsoft, a small company like Fog Creek, or the one I work for, can't fix the problem by hiring the best. Rather than saying "the best developers can use structured exception handling properly, so we'll hire them," what are we going to say about the 75% of average developers who misuse the technology on a daily basis? What tools are we going to give the much-maligned mediocre developer to make sure they don't screw up their programs?

To me, that is an absolutely essential question. This field doesn't advance much by making the super-stars even better, but by improving the median developer. Super-stars develop 1% of the world's software; median developers develop the rest.

So yes, structured exception handling can be used effectively by the best, but that is entirely beside the point. The real question is whether a given language feature is ergonomic enough to be used by the rest without compromising the programs they write.

It's in this sense that I think return codes are better than structured exception handling. It's much harder to use a return code badly.
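
For a flavour of the difference, here's a sketch using .NET's int.TryParse, which is return-code style in spirit (portText and StartServer are illustrative): the failure is visible at the call site and can't teleport anywhere else.

    int port;
    if (int.TryParse(portText, out port))
    {
        StartServer(port);
    }
    else
    {
        // The failure is handled right here, by the only code that can see it.
        Console.Error.WriteLine("Invalid port number: " + portText);
    }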

2008-06-30 21:37:55 GMT | #Programming | Permalink