Wednesday, February 29, 2012

How many tests would you write for this?

Working through Michael Feathers' Working Effectively With Legacy Code, I find myself wondering how many tests I should write for this.



Naively, I only have a public method. I only need to write tests to validate that method's externally facing API, right?



That feels wrong, though. The private method is doing "interesting things." It's connecting to a web API to update a local cache of data, and if the data that's expected is not there after the update, it's telling the API to create that data. Although I've split the work across methods to provide an easy API for the client view, the public method behaves differently under different internal conditions, so I should have tests that stress each of those paths (a rough sketch of the code's shape follows the list below):



  • When both user and card exist in local cache (already tested)

  • When a user exists in local cache, but no card

  • When a card exists in local cache, but no user

  • When neither exists in local cache
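
For concreteness, here's roughly the shape of the code I'm describing, sketched in Java. DemoController and populateFromRemoteData() are the names used in this post; LocalDatastore, RemoteApi, User, Card, and showCard() are hypothetical stand-ins invented for the sketch.

    // Sketch only: a reconstruction of the shape of the code under discussion,
    // not the actual implementation.
    interface RemoteApi {
        void createUser(String userId);
        void createCard(String userId, String cardId);
    }

    interface LocalDatastore {
        void refreshFrom(RemoteApi api);
        User findUser(String userId);
        Card findCard(String cardId);
    }

    class User { }
    class Card { }

    public class DemoController {

        private final LocalDatastore datastore; // local cache of remote data
        private final RemoteApi remoteApi;      // web API that owns the data

        public DemoController(LocalDatastore datastore, RemoteApi remoteApi) {
            this.datastore = datastore;
            this.remoteApi = remoteApi;
        }

        // The lone public method: the easy API for the client view.
        public Card showCard(String userId, String cardId) {
            populateFromRemoteData(userId, cardId);
            return datastore.findCard(cardId);
        }

        // The private method doing the "interesting things": refresh the local
        // cache from the web API and, if the expected data still isn't there,
        // tell the API to create it.
        private void populateFromRemoteData(String userId, String cardId) {
            datastore.refreshFrom(remoteApi);
            if (datastore.findUser(userId) == null) {
                remoteApi.createUser(userId);
            }
            if (datastore.findCard(cardId) == null) {
                remoteApi.createCard(userId, cardId);
            }
        }
    }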


Since I'm only testing the controller, I should mock the datastore service in each case. Simple enough, though it's surprising that such a seemingly simple method has so many execution paths. I suspect there is a refactoring hidden here that makes things a bit cleaner, but I can't find it right now.
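
For example, the second case might look something like this (Mockito-style mocks, using the made-up names from the sketch above):

    // Sketch only: one of the four path tests, with the datastore and the
    // remote API mocked out.
    import static org.mockito.Mockito.*;

    import org.junit.Test;

    public class DemoControllerTest {

        @Test
        public void createsTheCardWhenUserIsCachedButCardIsNot() {
            LocalDatastore datastore = mock(LocalDatastore.class);
            RemoteApi remoteApi = mock(RemoteApi.class);

            // The user is already in the local cache; the card is not.
            when(datastore.findUser("user-1")).thenReturn(new User());
            when(datastore.findCard("card-1")).thenReturn(null);

            new DemoController(datastore, remoteApi).showCard("user-1", "card-1");

            // The controller should ask the remote API to create only the card.
            verify(remoteApi).createCard("user-1", "card-1");
            verify(remoteApi, never()).createUser("user-1");
        }
    }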


As I work through such scenarios, I'm starting to see why Uncle Bob Martin says in Clean Code that objects should do one thing. Small objects that do one thing are easier to test with simple tests: did it do the thing? Did one of an expected set of untoward outcomes happen?


I'm also starting to see how judicious use of objects eliminates branching statements (using an Abstract Factory instead of a giant switch/case for instantiating objects, for example). As logic is pushed out into self-contained objects that encapsulate it, dynamically configurable behavior (via run-time Dependency Injection) reduces the number of things that happen in any one place and lets each piece be put under test on its own. The number of unit tests for each individual component becomes small, and all you need is a few integration/functional tests to verify everything knits together correctly at runtime.
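
To make that concrete with a made-up example (none of these names come from the code under discussion): the switch/case moves out of the consumer and into small factory objects, and the consumer is handed whichever one it needs at runtime.

    // Illustration only: replacing a switch/case over a type code with an
    // Abstract Factory that is injected at runtime. All names are invented.
    //
    // Before: the consumer branches on a type code every time.
    //     switch (format) {
    //         case "csv":  writer = new CsvReportWriter();  break;
    //         case "json": writer = new JsonReportWriter(); break;
    //         ...
    //     }
    interface ReportWriter {
        String write(String report);
    }

    interface ReportWriterFactory {      // the Abstract Factory
        ReportWriter newWriter();
    }

    class CsvReportWriterFactory implements ReportWriterFactory {
        public ReportWriter newWriter() {
            return report -> report.replace('\t', ','); // trivial stand-in behavior
        }
    }

    class ReportController {
        private final ReportWriterFactory factory;

        // The concrete factory is injected, so the controller no longer
        // branches at all; its tests only need to check that it delegates.
        ReportController(ReportWriterFactory factory) {
            this.factory = factory;
        }

        String render(String report) {
            return factory.newWriter().write(report);
        }
    }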


It also makes me wonder about the necessity of visibility modifiers (public, private, protected). At a small enough granularity of work, they perhaps become unnecessary. If private methods should usually be refactored into distinct objects that each encapsulate a small piece of work and expose a public API (e.g. DemoController's private populateFromRemoteData() becomes PopulationService's public populate() method), then visibility modification almost becomes unnecessary.
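
A sketch of what that refactoring might look like, reusing the hypothetical collaborators from the controller sketch above:

    // Sketch only: the private method pulled out into its own object with a
    // public API. PopulationService and populate() are the names from the text;
    // the collaborators are the same hypothetical ones as before.
    public class PopulationService {

        private final LocalDatastore datastore;
        private final RemoteApi remoteApi;

        public PopulationService(LocalDatastore datastore, RemoteApi remoteApi) {
            this.datastore = datastore;
            this.remoteApi = remoteApi;
        }

        // Formerly DemoController's private populateFromRemoteData(); now a
        // public method on a small object that can be tested directly.
        public void populate(String userId, String cardId) {
            datastore.refreshFrom(remoteApi);
            if (datastore.findUser(userId) == null) {
                remoteApi.createUser(userId);
            }
            if (datastore.findCard(cardId) == null) {
                remoteApi.createCard(userId, cardId);
            }
        }
    }

DemoController then shrinks to calling populate() and rendering the result, and its only remaining test is that it delegates correctly.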


Of course, dynamic programming languages have worked under this assumption forever. What if LISP had it right all along? The key is that one has to change one's very style of writing code to truly see these benefits. If you write giant globs of functions (10+ lines, doing many things) in LISP, you're in the same place as you are with OOP, and you need the ability to hide certain things and ignore certain things just for overall system comprehension.


After all, one of the original purposes of David Parnas' treatise on information hiding was that it made software easier to understand, in the design sense. Encapsulation enables reasoning about a system because it abstracts away irrelevant details. In the original Parnas paper, modules are designed to contain cohesive pieces of functionality that expose an interface.


One can argue that visibility modifiers show the imperative style drew the wrong lesson from his observation. Classes act as "mini-modules" when they have private methods that are internal and public methods that expose an external interface; instead, those private methods should be the public methods of separate subcomponents within the module.


This doesn't take into account the "access control" piece of the puzzle, the idea that there are certain invariants you don't want clients replacing at runtime because doing so breaks your functionality. Yet the use of reflection and metaprogramming is only increasing because they dramatically simplify programming, and these tools typically blow right past access control. Groovy, for example, quietly doesn't respect private; it's only a convention that programmers don't use metaprogramming to modify private members.
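
The same point holds in plain Java via reflection (a deliberately small, made-up example):

    // Illustration only: reflection walking straight past "private". The class
    // and field are invented; Groovy's metaprogramming makes this even easier.
    import java.lang.reflect.Field;

    class Vault {
        private String secret = "original";
    }

    public class AccessControlDemo {
        public static void main(String[] args) throws Exception {
            Vault vault = new Vault();

            Field field = Vault.class.getDeclaredField("secret");
            field.setAccessible(true);               // the visibility check is opt-out
            field.set(vault, "replaced at runtime"); // so much for the invariant

            System.out.println(field.get(vault));    // prints: replaced at runtime
        }
    }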


Perhaps that really is a case of "you break it, you bought it"? But then what are the implications for software security models?