Refactoring is improving code without changing the features it implements.
If you’re refactoring, you’re not fixing bugs, you’re not improving performance and you’re not increasing robustness. Refactoring is simply improving the design of the code while ensuring that it still works the same, warts and all.
Pointy haired bosses the world over froth at the mouth to hear such things. Veins pop out in their temples. You mean the business value of the software stays the same but the cost to the business goes up? I didn’t say that.
If you measure business value only in terms of features you have today, then you can end up deep in technical debt; you can add features today in such a way that features tomorrow cost more and more. The value of refactoring is wholly contained in the future ability of programmers to comprehend and modify the code. It’s called maintainability, but that’s a boring word, so let’s call it Agility. Mmmm, sexy. Well-factored code is agile code because it’s better able to change.
In economic terms, refactoring is an investment, or the repayment of a debt. It’s only worth doing over a time frame when the interest payments or repayments (in the form of ongoing productivity gains) compound to exceed the time invested. Fingers crossed the business or project sponsor is also planning over such time frames.
The term refactoring comes from mathematics. You may remember your high school algebra:
2x² + 10x
Stay with me! No glazing over! If you refactor the expression, extracting the common factor, 2x, you get:
2x · (x + 5)
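The same move translates directly into code: two expressions that share a common factor collapse into one. A throwaway sketch (the class name is mine, not from any real codebase):

```java
// Checking that the expanded and factored forms of 2x² + 10x agree.
class Algebra {
    // Before refactoring: the expression written out longhand.
    static double expanded(double x) {
        return 2 * x * x + 10 * x;
    }

    // After refactoring: the common factor 2x has been extracted.
    static double factored(double x) {
        return 2 * x * (x + 5);
    }
}
```

The two forms compute the same value for every input; only the shape changed. That is the whole contract of refactoring in one line of algebra.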
Sometimes it’s hard to spot the common factors, in both mathematics and programming, and it can certainly be done poorly. More on that later.
Like most powerful techniques in software development, the purpose of refactoring is controlling complexity.
Complexity is bad, mmmkay. Complexity is evil.
I was fortunate enough to be chatting about project complexity with the legendary Dave Thomas (OTI, Eclipse) at JAOO Sydney in May. He nailed it: “kLOC kills”. Complexity and scale in codebases is a major contributor to schedule blowouts, poor velocity and excessive development cost. Complexity is a kitten killer from way back.
Fred Brooks discusses two categories of complexity: essential complexity and accidental complexity.
Essential complexity is the complexity of the domain. In NASA software, there’s no escaping rocket science. You can isolate and divide essential complexity but you can never remove it. Essential complexity belongs to the problem. By contrast, accidental complexity is an artefact of the systems, languages and frameworks you’re using. In principle it can be reduced by changing the system. Accidental complexity belongs to the solution. Refactoring reduces accidental complexity.
If you don’t have much experience with it and you’re looking for some concrete tutorials on refactoring, I suggest you start with Martin Fowler’s seminal book Refactoring. Fowler also maintains a catalog of refactoring recipes with an Object Oriented flavour.
One of the most basic techniques is Extract Method, which all decent IDEs can do automatically. You know you need Extract Method if you have a multi-page method with a sequence of comment blocks which look like this: Now that we have the InductionActuator, look up the FluxCapacitor…. Doing it manually means snipping out a logical sequence of code and pasting it into a new, small, well-named method, stitching the local variables used from the originating context into parameters to the method. If this is hard due to sloppy scoping or too many variables, you may consider Introducing a Field from a local variable.
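Here’s a minimal before/after sketch of the manual version. All the names are invented for illustration; the “before” shape is the commented-phases method described above, the “after” is the same logic with each phase extracted:

```java
import java.util.List;

class ReportGenerator {
    // Before: one long method, with comments marking each logical phase.
    String buildBefore(List<String> items) {
        // Now format the header...
        StringBuilder sb = new StringBuilder();
        sb.append("REPORT\n");
        // Now format the body...
        for (String item : items) {
            sb.append(" - ").append(item).append("\n");
        }
        return sb.toString();
    }

    // After: each commented chunk becomes a small, well-named method,
    // with the locals it used (sb, items) stitched into parameters.
    String buildAfter(List<String> items) {
        StringBuilder sb = new StringBuilder();
        appendHeader(sb);
        appendBody(sb, items);
        return sb.toString();
    }

    private void appendHeader(StringBuilder sb) {
        sb.append("REPORT\n");
    }

    private void appendBody(StringBuilder sb, List<String> items) {
        for (String item : items) {
            sb.append(" - ").append(item).append("\n");
        }
    }
}
```

The comments disappear because the method names now say the same thing, and each small method is individually testable.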
The inner loop of agile development should go like this: Red, Green, Refactor. Red means you have a test which is not passing. Getting the test to pass is the next step. Green means you are passing all tests. Refactor means… refactor.
Even if you’re a good agile developer, doing things as simply as possible, complexity and duplication of common factors creeps in while you’re trying to pass tests. Everybody hacks. Everyone copies and pastes. This is fine as long as you go back and refactor when you’ve got the green bar. Sometimes you may need to avoid mentioning this to PHBs, for their own good. Shhh!
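A toy pass through the loop, using plain asserts rather than a real test framework (the Fibonacci example is mine):

```java
class RedGreenRefactor {
    // Red: a test like `assert fib(5) == 5;` was written before fib existed.
    // Green: the simplest hack that passed was a naive recursive version.
    // Refactor: with the bar green, tidy it into this iterative form;
    // the tests protect us while the structure changes underneath them.
    static int fib(int n) {
        int a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            int next = a + b;
            a = b;
            b = next;
        }
        return a;
    }
}
```

The point isn’t the Fibonacci function; it’s that the refactor step is safe only because the green bar is standing guard.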
Unit tests are really important for refactoring. If you’re not doing unit testing you’ve got a long way to go. A good unit test suite is a necessary precondition for confident, aggressive refactoring. And IMHO a good type system is a necessary precondition for confident, aggressive, automated refactoring. These preconditions can present a quandary for some developers. Legacy systems often have no effective automated tests. And since they’re often composed entirely of spaghetti, they need to be refactored. It’s a chicken-and-egg situation: where do you start? All I can say here is you start small.
Can you have too much refactoring? Absolutely. If you’re somewhere around middle-stage zealotry for this refactoring stuff, you may not be in danger of copy+pasting your way to a big ball of mud, but you may be prone to exceeding the safe working abstraction load of your language, or going too far beyond the idioms of your team’s codebase or comfort.
Every language has limits imposed by its design and implementation. In Java and C#, for example, the limits are seen by many in the dynamic languages camps to be too much to bear. For example, say you’re refactoring some Java or C# code. You might create a new interface with a few alternate concrete implementations and, whereas before you had two methods on a concrete class and a few big if-else blocks, after refactoring you might have three files and more actual lines of source code. It can be somewhat subjective but sometimes you may have more complexity even though you’ve removed duplication!
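A sketch of that trade-off, with names invented for the purpose. The “before” is one concrete class with an if-else block; the “after” is the interface-plus-implementations shape, duplication-free but visibly longer:

```java
// Before: one method, one if-else block. Compact, but every new
// shipping method means editing this conditional.
class ShippingBefore {
    static double cost(String method, double weight) {
        if (method.equals("air")) {
            return 10 + 4 * weight;
        } else {
            return 2 + weight;
        }
    }
}

// After: an interface with alternate concrete implementations.
// Open for extension, but in real Java these would be three files
// and more lines of source than the if-else they replaced.
interface ShippingMethod {
    double cost(double weight);
}

class AirShipping implements ShippingMethod {
    public double cost(double weight) {
        return 10 + 4 * weight;
    }
}

class GroundShipping implements ShippingMethod {
    public double cost(double weight) {
        return 2 + weight;
    }
}
```

With two branches, the conditional is arguably the simpler design; the polymorphic version starts paying for itself around the third or fourth implementation.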
If this happens you have fallen asleep on the refactoring train and missed your station. Often you should just roll back the code and go write a feature. Some duplication is easy to see and cope with, especially if it can fit on one screen and any reader can see the pattern. In other languages, Lisp comes to mind, there are constructs (like macros) which allow you to encapsulate expressions that cannot be elegantly factored in, say, Java. Disclaimer: IANALN; I Am Not A Lisp Nerd.
So the expressiveness of the language can constrain refactorability. Another way of saying this is that the language contains accidental complexity and only factoring out the language can remove that complexity. I should say here that I have recently found Groovy to be a great candidate for doing this on Java projects.
As a more concrete example, lexical closures are a great way of implementing things like the enhanced for loop (for-each) introduced in Java 1.5. Functor frameworks that use anonymous inner classes in Java for similar purposes (e.g. composable transformers, ad hoc iterator delegators instead of explicit looping) often feel too cumbersome compared to most closure implementations. So you just have to suffer the duplication and code bulk.
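To see the bulk, here’s a minimal functor sketch of the anonymous-inner-class style (the Transformer interface is invented for this example, not from any real framework):

```java
import java.util.ArrayList;
import java.util.List;

// A one-method "functor" interface, the pre-closures Java idiom.
interface Transformer {
    String transform(String in);
}

class Functors {
    // Apply a transformer to every element instead of looping explicitly.
    static List<String> map(List<String> in, Transformer t) {
        List<String> out = new ArrayList<String>();
        for (String s : in) {
            out.add(t.transform(s));
        }
        return out;
    }
}
```

Calling it takes five lines of anonymous-class ceremony, e.g. `Functors.map(names, new Transformer() { public String transform(String s) { return s.toUpperCase(); } })`, where a language with closures would need one.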
So, in summary: Red, Green, Refactor; don’t go overboard; and be aware of when your language makes it hard to capture the factors you can see in your system. Kill complexity before it kills you.
If you’re interested I’ll be telling war stories and going into some side issues over on my personal blog, Chris Mountford.