Monday, 05 February, 2007

What is shit code? Developers always go on about "crappy code", and yet when pushed they find it hard to define what exactly they mean when they say code is crappy. The opposite question is just as awkward: could these same people come up with a set of rules that produce objectively good code?

I want to be a heretic and argue that code that is almost universally considered bad is actually rather good. Before I do this, I want to define what I mean when I say code is good. This will hopefully help when we come to treat our coding horror later on.

According to the gospel of Simon, good code has the following properties:

Correctness
Maintainability
Portability
Testability
Efficiency

These attributes are in no particular order of importance. Different people have different ideas about which of them matter more than others; their relative weight can even vary from project to project.

What do you make of the following code snippet:

public static Account FindByMobile(string mobilePhoneNumber)
{
	if (mobilePhoneNumber == null)
	{
		throw new ArgumentNullException("mobilePhoneNumber");
	}

	Account accountFound = null;

	for (int i = 0; i < Account.AllAccounts.Count; i++)
	{
		if (Account.AllAccounts[i].MobilePhoneNumber == mobilePhoneNumber)
		{
			accountFound = Account.AllAccounts[i];
			// Jimmy the loop counter past the end to force an early exit.
			i = Account.AllAccounts.Count;
		}
	}

	return accountFound;
}

The aim of this method is to return the first account that matches the mobile phone number passed in. Most people would say this is a pretty horrific piece of code. It commits the cardinal sin of the for-loop: jimmying the loop counter to force an early exit. But let's look at the code again using my objective measurements.

Most people would clean up this code so that it returns the account as soon as it is found. If the loop finishes and no account is found, you'd just return null at the end of the method. But is this radically better than what we've already got? Let's see...
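
For reference, here is a sketch of that clean-up; this is my rendering of the obvious fix, not code from the original post:

public static Account FindByMobile(string mobilePhoneNumber)
{
	if (mobilePhoneNumber == null)
	{
		throw new ArgumentNullException("mobilePhoneNumber");
	}

	for (int i = 0; i < Account.AllAccounts.Count; i++)
	{
		// Return the first match straight away instead of jimmying the counter.
		if (Account.AllAccounts[i].MobilePhoneNumber == mobilePhoneNumber)
		{
			return Account.AllAccounts[i];
		}
	}

	// The loop finished without finding a match.
	return null;
}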

Correctness

It's pretty easy to see that the algorithm is correct. The loop always terminates because there is a finite number of accounts, so the method always returns something. accountFound is assigned a reference if and only if an account with the mobile number in question is found. So the method only ever returns an account whose mobile phone number matches, or null if no account is found.
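
To make that informal argument a little more concrete, here is the loop again with its invariant spelled out as comments; the annotation is mine, not part of the original method:

// Invariant before each test of the loop condition: either accountFound
// is null and no account in AllAccounts[0..i) matches the number, or
// accountFound is the first matching account and i >= Count.
for (int i = 0; i < Account.AllAccounts.Count; i++)
{
	if (Account.AllAccounts[i].MobilePhoneNumber == mobilePhoneNumber)
	{
		accountFound = Account.AllAccounts[i];
		i = Account.AllAccounts.Count; // the invariant still holds here
	}
}
// Either way, on exit accountFound is the first match or null.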

Maintenance

Let's take a hypothetical change where we want to generalise this method into "FindByPhoneNumber". Rather than searching on one field, it searches on both the land-line and mobile phone numbers. What do we have to change to do this? Only the condition, which is a few extra characters of code, as the sketch below shows.
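
Something like this; note that the parameter name and the LandLinePhoneNumber property are my inventions for illustration, since the post never shows the rest of the Account class:

if (Account.AllAccounts[i].MobilePhoneNumber == phoneNumber ||
	Account.AllAccounts[i].LandLinePhoneNumber == phoneNumber)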

Any other changes, like validating the phone number or whatnot, don't really touch the smelly part of the code. The change would be no easier to make if we cleaned up the method first.

If I've got my mathematics right, the McCabe (cyclomatic) complexity of the method is four: one for the method itself, plus one each for the null check, the loop condition and the match test. A low McCabe complexity is a good indicator of maintainability, and four is a suitably small value.

I would say all this makes the method easy to maintain.

Portability

The method would compile with both Mono and the Microsoft C# compiler. It would also be easy to port to a language like Java or even C. The code is portable.

Testability

You would need three unit tests to exercise every code path in the method: one for the null argument, one where an account matches and one where none does. That is an acceptably small number of tests. The code is easy to test.
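
Here is a sketch of those three tests in NUnit 2.x style. The Account constructor and the assumption that AllAccounts is a mutable list are mine; the post never shows how the account collection is populated:

using System;
using NUnit.Framework;

[TestFixture]
public class FindByMobileTests
{
	[SetUp]
	public void SetUp()
	{
		// Hypothetical setup: assumes AllAccounts can be cleared and
		// appended to, and that Account has a constructor taking the
		// mobile number.
		Account.AllAccounts.Clear();
		Account.AllAccounts.Add(new Account("07700900001"));
	}

	[Test]
	[ExpectedException(typeof(ArgumentNullException))]
	public void NullNumberThrows()
	{
		Account.FindByMobile(null);
	}

	[Test]
	public void MatchingAccountIsReturned()
	{
		Account found = Account.FindByMobile("07700900001");
		Assert.AreEqual("07700900001", found.MobilePhoneNumber);
	}

	[Test]
	public void NoMatchReturnsNull()
	{
		Assert.IsNull(Account.FindByMobile("07700900999"));
	}
}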

Efficiency

There are many ways to rewrite this method so that it is much faster, but you would have to do that at the expense of some of the other attributes. In general, it isn't worth doing unless the algorithm forms a detectable bottleneck.

If we cleaned up this code, we would still have fundamentally the same algorithm and it would take just as long to complete.
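
As one illustration of the kind of trade-off I mean (my example, not a recommendation): you could keep a dictionary keyed on mobile number and get O(1) lookups, at the price of maintaining the index whenever an account is added or changes its number:

using System;
using System.Collections.Generic;

// ...inside the Account class...
private static readonly Dictionary<string, Account> mobileIndex =
	new Dictionary<string, Account>();

public static Account FindByMobile(string mobilePhoneNumber)
{
	if (mobilePhoneNumber == null)
	{
		throw new ArgumentNullException("mobilePhoneNumber");
	}

	// O(1) lookup instead of a linear scan. Correctness now depends on
	// mobileIndex being kept in sync with Account.AllAccounts.
	Account accountFound;
	mobileIndex.TryGetValue(mobilePhoneNumber, out accountFound);
	return accountFound;
}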

Conclusion

What point am I trying to make here? Well, this example is what most people would consider a "Coding Horror", but on closer inspection it isn't actually that bad. Yet developers continually split hairs over much less serious deficiencies.

Personally, I have my own pet theory on why people have religious wars over coding styles, development methodologies or the choice of language. It's all about feeling powerless in the face of bugs. Everybody who writes code has written code that contains bugs. It's a fact of programming life, and there are good reasons to think it always will be. I suspect the way people reconcile themselves to the fact that they produce buggy code is to try to micromanage the bugs out of their software.

I can imagine the thought processes associated with each camp: the coding-style zealot thinks "if only everyone laid out their code my way, these bugs wouldn't slip through"; the methodology devotee thinks "if only we followed the process properly, the bugs would be caught"; and the language advocate thinks "if only we used a decent language, these bugs couldn't be written at all".

It probably isn't fair to write all of these off as mere comfort blankets. Indeed, the stuff I've seen with Spec# leads me to think that language innovation may consign whole classes of bugs to history.

But to think in terms of single advances is to miss the point. Bugs exist because taking a program from a fuzzy business requirements document down to the nitty-gritty of the predicate logic that programs are built from always has a subjective element to it.

In short, we know that quality is basically subjective. Every developer thinks the way they write code is the best way; otherwise they wouldn't write it that way. I've recently started to think the best way to write code is to try to get a proof that it is correct before you consider anything else.

Getting a proof of correctness is hard, and it demands the sort of discipline that is conducive to good-quality software. For example, it is generally easier to prove a method correct when it is short and has a small McCabe complexity. Just by lowering the McCabe complexity you can reduce the bugginess of your code by a couple of orders of magnitude.

Indeed, by focusing on lowering McCabe complexity, a team at HP managed to release a 125,000-line program with a defect rate of 0.02 defects per thousand lines of code, about one hundredth of the rate of most commercial software. Impressive stuff, I think you'll agree.

Simon.

22:17:21 GMT | #Programming | Permalink