Monday, April 16, 2012

TDD and Design Patterns are Just Silver Bullets, *IX doesn't need 'em! Also, Bronze > Iron

Hey List,

I'm working with a simulations company that wants to analyze, document, and convert some VB legacy code to C#. The documentation process for the original software is already well underway. (I've been working on it for the past week and some.) At first I was under the impression that they wanted me to update the original VB 6 code to VB 10 - for readability and to optimize the current version of the software - then convert everything to C# for the new platform. However, after analyzing the modules I've been given so far (and documenting most of them), the software developers and I have come to varying conclusions about how to complete both the documentation and conversion processes quickly and effectively before our deadline. The developers aren't sure how to go about this because they've only ever updated the original source code - never analyzed or reworked it wholesale. They've recognized for some time the need for a complete overhaul in order to deploy the product on more platforms, and have only just begun. We seem to be leaning towards analysis, conversion, verification, then documentation. Our deadline for project completion is approximately six months away.

Planning aside, one particular contingency we're encountering is the variation in data [and subsequently, file] outputs during code conversion for the same algorithms. The simulation will not function correctly on the new platform if this persists.

The question I have for those of you with experience in this process is, how can the developers and I go about this in a way that allows us to verify each step without spending too much time on documentation and analysis of the original source code? Any suggestions would be awesome. We're likely to decide within the next 48 hours and go from there.


My response

"We seem to be leaning towards analysis, conversion, verification, then documentation. Our deadline for project completion is approximately six months away."

The process you're describing is probably sub-optimal. Chances are high you're going to end up with a Tiger Team re-architecting a Stovepipe System. This is a common trap when a rewrite is treated like green field development, and chances are that after about 8 months it'll look as hairy as the current system, just with newer syntax.

I'd advise a more disciplined approach.

Start writing learning tests for the existing system, using those as documentation and specification baselines for the new system.
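For example (a minimal sketch, assuming the legacy algorithms can be driven from .NET, e.g. via a COM interop shim or by replaying captured output files; the class name and the numbers are hypothetical), a characterization-style learning test just pins down what the current system does today:

    using NUnit.Framework;

    [TestFixture]
    public class LegacySimulationCharacterizationTests
    {
        [Test]
        public void RunStep_WithKnownInputs_ReproducesRecordedLegacyOutput()
        {
            var simulation = new LegacySimulation();

            // Inputs and expected value are whatever the *current* system produces;
            // the point is to record today's behavior, not to judge it.
            double result = simulation.RunStep(timeStep: 0.01, initialVelocity: 42.0);

            Assert.AreEqual(12.345, result, 1e-9,
                "The new platform must reproduce the legacy output within tolerance.");
        }
    }

    // Hypothetical wrapper around the legacy algorithm (e.g. a COM interop shim).
    public class LegacySimulation
    {
        public double RunStep(double timeStep, double initialVelocity)
        {
            return 12.345; // stand-in; in reality this calls into the VB 6 code
        }
    }

Every such test becomes both documentation of the old system and an acceptance criterion for the new one, which speaks directly to the output-variation problem you mentioned.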


Read Working Effectively With Legacy Code and Test-Driven Development: By Example concurrently.

Try using SpecFlow to write the specs. Use source control and a continuous integration tool like Jenkins to make sure you're not breaking existing builds as you refactor pieces of the system.
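If you do go the SpecFlow route, a step binding might look roughly like this (the scenario wording, the baseline file path, and the NewEngineDriver entry point are all invented for illustration, not anything in your code base):

    using TechTalk.SpecFlow;
    using NUnit.Framework;

    // Backs a feature file along these lines:
    //   Scenario: Standard scenario produces identical output
    //     Given the recorded legacy output for scenario "ballistic_01"
    //     When the scenario is run on the new C# engine
    //     Then the new output matches the legacy output
    [Binding]
    public class OutputParitySteps
    {
        private string _scenario;
        private string _expected;
        private string _actual;

        [Given(@"the recorded legacy output for scenario ""(.*)""")]
        public void GivenTheRecordedLegacyOutput(string scenario)
        {
            _scenario = scenario;
            _expected = System.IO.File.ReadAllText(
                System.IO.Path.Combine("baselines", scenario + ".out"));
        }

        [When(@"the scenario is run on the new C# engine")]
        public void WhenTheScenarioIsRun()
        {
            _actual = NewEngineDriver.Run(_scenario);
        }

        [Then(@"the new output matches the legacy output")]
        public void ThenTheNewOutputMatchesTheLegacyOutput()
        {
            Assert.AreEqual(_expected, _actual);
        }
    }

    // Hypothetical stand-in for whatever entry point the converted engine exposes.
    public static class NewEngineDriver
    {
        public static string Run(string scenario)
        {
            throw new System.NotImplementedException("wire this to the converted C# engine");
        }
    }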

Since you're seeking platform independence, at some point you'll need to re-architect the system so it doesn't make as many implicit assumptions about the model. You'll want to push framework-specific interfaces (file I/O, databases, web services, etc.) behind your own internally defined APIs (see the sketch after the reading list below). These will show you the way:

Clean Code, Domain-Driven Design, and Head First Design Patterns
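To make the "push framework-specific I/O behind your own API" point concrete, here's a rough sketch; ITrajectoryWriter and its implementations are invented names, not anything from the actual system:

    using System.Collections.Generic;
    using System.IO;

    // The simulation core talks only to this interface; it never touches
    // System.IO, a database, or a web service directly.
    public interface ITrajectoryWriter
    {
        void WriteSample(double time, double position, double velocity);
    }

    // One implementation per platform/output target.
    public class CsvTrajectoryWriter : ITrajectoryWriter
    {
        private readonly TextWriter _writer;
        public CsvTrajectoryWriter(TextWriter writer) { _writer = writer; }

        public void WriteSample(double time, double position, double velocity)
        {
            _writer.WriteLine("{0},{1},{2}", time, position, velocity);
        }
    }

    // A test double that records samples in memory, so the core algorithms
    // can be verified without touching the file system at all.
    public class InMemoryTrajectoryWriter : ITrajectoryWriter
    {
        public readonly List<double[]> Samples = new List<double[]>();

        public void WriteSample(double time, double position, double velocity)
        {
            Samples.Add(new[] { time, position, velocity });
        }
    }

The simulation core only ever sees the interface, so the same algorithm can write to a file on one platform, a database on another, or an in-memory fake under test.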

You can probably encapsulate the algorithms with a combination of Strategy and Template Method, using Abstract Factory to put related pieces together.
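As a hedged illustration of what that might look like (the integrator and factory names are hypothetical; the real seams will fall out of your analysis):

    // Strategy: each numerical scheme is swappable behind one interface.
    public interface IIntegrator
    {
        double Step(double state, double dt);
    }

    public class EulerIntegrator : IIntegrator
    {
        public double Step(double state, double dt)
        {
            return state + dt * (-0.5 * state); // placeholder model equation
        }
    }

    public class RungeKuttaIntegrator : IIntegrator
    {
        public double Step(double state, double dt)
        {
            // details elided; same contract as EulerIntegrator
            return state;
        }
    }

    // Abstract Factory: bundles the pieces that have to vary together.
    public interface ISimulationFactory
    {
        IIntegrator CreateIntegrator();
    }

    public class FastPreviewFactory : ISimulationFactory
    {
        public IIntegrator CreateIntegrator() { return new EulerIntegrator(); }
    }

    public class HighAccuracyFactory : ISimulationFactory
    {
        public IIntegrator CreateIntegrator() { return new RungeKuttaIntegrator(); }
    }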

tl;dr: Don't use the Waterfall process you're describing. You need to start manifesting the existing behavior of the system in executable code that can serve as the baseline for the new system. This is a well-studied problem. Start reading the literature about how to evolve a Big Ball of Mud, and avoid Stovepipe or Wolf Ticket solutions.


Helpful feedback from a stubborn and argumentative friend:

I know I'm wasting my time and I would much rather have an in person discussion about this sometime but oh well. I might/will come across as rude but that's just the way I am, take no offense, I'm blunt and I appreciate bluntness myself. I apologize in advance.

I think you're a little crazy about your design patterns and software development process philosophy.

Bear with me for a minute. Try to imagine someone like me, who has never read any design pattern book or resource actually describing them, reading your email (and not just this one, but several you've sent to the list in the past . . . this one was just particularly . . . noteworthy).

Even better, imagine someone who has never even heard of design patterns and isn't a software developer/programmer at all. No background in it whatsoever.

Got that state of mind?

"Tiger Team", "Stovepipe System", "Green field development" "Strategy and Template System", "Abstract Factory" "Big ball of Mud", "Wolf Ticket"?

Are we still talking about coding or is this some sort of weird elementary school word game?


I hear stuff like this and I am agog.

Let me give you a few links of my own that I found in about 2 seconds of googling.

From Coding Horror:

"It's certainly worthwhile for every programmer to read Design Patterns at least once, if only to learn the shared vocabulary of common patterns. But I have two specific issues with the book:

1. Design patterns are a form of complexity. As with all complexity, I'd rather see developers focus on simpler solutions before going straight to a complex recipe of design patterns.
2. If you find yourself frequently writing a bunch of boilerplate design pattern code to deal with a "recurring design problem", that's not good engineering -- it's a sign that your language is fundamentally broken.

In his presentation "Design Patterns" Aren't, Mark Dominus says the "Design Patterns" solution is to turn the programmer into a fancy macro processor. I don't want to put words in Mark's mouth, but I think he agrees with at least one of my criticisms."

I don't always agree with Atwood/Coding Horror, and in fact sometimes I disagree completely, but in this case I agree wholeheartedly. (Another post of his about design patterns that I don't agree with: I don't think they're missing language features, as Paul Graham thinks.)

Design Patterns are not a Silver Bullet


I could go on but I'll just add a quote/paraphrase from Bjarne Stroustrup

Inheritance is one of the most overused and misused C++ features.

Since even the design pattern book itself says they're about the arrangement of objects and classes, if OO is not the best thing for a problem, design patterns are automatically not directly relevant.

Object Oriented programming (with a capital OO) is not always the answer, and in fact I think the Java extreme of "everything is an object" is actually extremely bad; people try to do the same type of design in C++ and it's terrible.


I'm not saying I've never used (inadvertently, and far from the way you would have implemented it) design patterns of some kind. I will say it's probably far less, since I am a C/C++ programmer and I tend to work on lower-level, non-GUI/non-user-interface/non-database things like 3D graphics and low-level IO. I don't think I've ever used inheritance in my own (non-school-required) code. I tend to use concrete type classes (i.e. vector and matrix types with overloaded operators etc.) and composition occasionally. I think it's better to have a few specialized classes than to try to generalize or create an inheritance tree that obfuscates the purpose, bloats the code, and makes it slower. Also, I have no problem with global variables/data, especially in my personal/prototype programs. I don't see any problem in general either, as long as the main.cpp file isn't cluttered.

Actually, I've yet to use inheritance in code written at work (the Mars Space Flight Facility on campus, where I worked on a large old C project ;), then Intel, where I wrote some C testing code and mostly set up VMs and crap, and now at Celestech. I admit that the project I'm working on, which is internal R&D with me as the sole developer, already had some inheritance in it because it's a Qt project, and GUI programming is one place where even I agree some inheritance is good/necessary . . . but again, remember that everything in a modern language could be written in C. Actually, I think Linus has a point: C is great because it's simple and small and easy to understand, and it minimizes complexity. But he goes overboard; C++ is a great language that he just happened to be exposed to before it was fully standardized and implemented.


If I were to argue some design/development methodology in contrast to yours, it would be KISS/YAGNI (I just learned the YAGNI acronym today, but I've always believed in it with regard to over-design). Also: make it work, make it right, make it fast - where I define "right," and I iterate in an extreme/agile fashion more or less as I need to.


Finally, this:

" We seem to be leaning towards analysis, conversion, verification, then documentation. Our deadline for project completion is approximately six months away. "

You say this is suboptimal and waterfally. I couldn't disagree more. It may use waterfally sounding terms but what they've described is the most basic/straightforward/easy way to port/rewrite code.

Read and understand it, rewrite it on the new platform/language, make sure it produces the same output, and then document it. How is that not the most fundamental way to do it?

Also, nowhere in the email does it say the original code is bad (besides the inherent badness of VB, maybe) or badly designed. Legacy doesn't necessarily mean bad/evil/badly designed code. The Linux kernel is 20 years old, and parts of it have hardly changed - or haven't changed at all - since the beginning. There are other examples.

The process you described, in my opinion, adds a ton of unnecessary work and complication to a simple process. The only thing you've mentioned that I would agree with for some projects is Jenkins/Hudson. But that is only useful for large, ongoing projects, not something like this, a simple rewrite/port.

Waterfall is CSE 360, where you waste over half the semester designing and documenting and creating sequence diagrams and UML diagrams and crap before you've written a line of code, so they're all completely made up and useless.


Anyway, again I apologize and don't take offense. I know you're a good developer just very different with a very different coding experience.

To deconstruct every one of the arguments would be long and tedious, as it basically boils down to:
  • I don't like your heavy use of professional jargon/terminology.
This is fair. I probably overdid it a bit, especially given the audience of the list. I was trying to use a shared lexicon to keep the transmission short, as I'm typically accused of writing too much. But in my haste to transmit a short message and provide directed pointers, I came off pedantic and "overzealous."
  • Because I don't understand the terminology, you must be wrong
Obviously he hadn't bothered to look up Legacy Code (equating age with Legacy is imprecise), or "Template Method" as opposed to "Template System", etc., etc. The danger of trying to rely on a shared lexicon that's not ubiquitous rears its ugly head. But because he'd written code at a university, Intel, and other places, and it had worked, he must know a better way of writing code. "Because it works, it is correct and good."

We can build cars out of matchsticks and make them run. Does that mean all cars should be made of matchsticks?

We can build bridges with thin metal plates. Does that mean we should build all bridges that way?
  • OOP is not that great. I write C code every day, it's more performant and better and awesome.
C is not that great. You don't need function pointers and loops. You'd be much more performant in assembly.

Anything that can be written in a higher level language can be written in assembly. So go! Load words into registers. Go twiddle your bits because you'll be faster and your code will have simpler pieces. After all, who the hell understands pointers? Just load memory addresses into registers!

While the GoF book and much of the attendant literature has revolved around OO, it's naive and foolish to discard Patterns as an OO concept. Functional Programming is making great strides in uncovering Monads, such as the State monad and the IO monad, that serve similar purposes: the solution to a problem in context. Agent-Oriented artificial intelligence systems[this][that] are drawing great inspiration from patterns. All of science is converging on the idea that patterns are fundamental. Analysis Patterns, Implementation Patterns, Compiler Patterns - all of these are applicable whether the program is written in an OO, procedural, or functional style. Evans talks about this in Domain Driven Design as well.

C++ was perfectly fine with function pointers and multiple inheritance. Why did C++0x add lambdas? Because better tools for the job allow us to do our job more efficiently. That's the only reason to ever add to a language, because the increased expressivity and conciseness improve clarity. Rejecting the use of a shared vocabulary because it "adds cognitive overhead" is dubious.
  • Linux doesn't need all this junk. It's simple and awesome.
Curious, I asked a friend about an anecdote he'd shared with me.

I recently got into an interesting argument about quality code and its manifestations in various paradigms, with a guy who essentially argued that Design Patterns and TDD are too complex, the simplest thing is to rewrite a system from scratch and document it thoroughly, and that Linux was the bee's knees in quality and proof you didn't need things like tests to write good code. I'd like to reference your wireless card story and the post you found by Linus Torvalds. Could you try finding that again and posting it on my wall?
Found it, I'll post it. I'd argue that Linux is proof of a different concept, which gives it an advantage over how TDD is used in the real world: "with enough eyes, all bugs look shallow." Linux is remarkably stable without automatic tests, but this is because of the massive number of people who run test releases and the huge number of people looking at bug reports. With that user and dev base, any problem that's found will probably have an obvious solution to somebody. Tests would still help from time to time, but Linux can squeak by without them in ways that software developed by a small team never could. One could ask, which is more complicated: design patterns and TDD, or building a dev base hundreds of thousands strong and a user base millions strong that's willing to accept major bugs?

I really think this marvelous piece from his post says it all.


Linus Torvalds | 12 Jan 06:20

Re: brcm80211 breakage..

On Wed, Jan 11, 2012 at 8:15 PM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
>
> I see no difference in the core revisions, etc. to explain why mine should
> work, and yours fail.

Maybe your BIOS firmware sets things up, and the Apple Macbook Air
doesn't? And the driver used to initialize things sufficiently, and
the changes have broken that?

Apple is famous for being contrary. They tend to wire things up oddly,
they don't initialize things in the BIOS (they don't have a BIOS at
all, they use EFI, but even there they use their own abortion of an
EFI rather than what everybody else does), yadda yadda.

But the real point is: it used to work, and now it doesn't. This needs
to get fixed, or it will get reverted.

Linus

I'm not trying to troll Linux or be a jackass, but every time a neckbeard tells me how awesome Linux is and it doesn't need all these "Silver Bullets", I wanna refer them back to this thread. If the patch committer had had an automated regression suite that he could've run before committing, and specific pieces like the MacBook Air wireless driver initialization had been mocked out such that it could've red barred, then the Inventor of Linux's day wouldn't have been ruined on something like this when he OBVIOUSLY has better things to do.
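To be concrete about what "mocked out such that it could've red barred" means, here's a hand-wavy sketch in C# rather than kernel C; the IFirmwareEnvironment seam and the MacBook-Air-like fake are invented purely to show the shape of such a regression test:

    using NUnit.Framework;

    // Hypothetical seam: the driver asks this interface what the firmware
    // already set up, instead of silently assuming a BIOS did the work.
    public interface IFirmwareEnvironment
    {
        bool ChipPreInitialized { get; }
    }

    public class WirelessDriver
    {
        private readonly IFirmwareEnvironment _firmware;
        public bool Ready { get; private set; }

        public WirelessDriver(IFirmwareEnvironment firmware) { _firmware = firmware; }

        public void Initialize()
        {
            // The regression in the anecdote amounted to skipping this branch.
            if (!_firmware.ChipPreInitialized)
            {
                PerformFullChipSetup();
            }
            Ready = true;
        }

        private void PerformFullChipSetup() { /* elided */ }
    }

    [TestFixture]
    public class WirelessDriverRegressionTests
    {
        // Fake modeled on the anecdote: the firmware did NOT pre-initialize the chip.
        private class MacBookAirLikeFirmware : IFirmwareEnvironment
        {
            public bool ChipPreInitialized { get { return false; } }
        }

        [Test]
        public void Initialize_WhenFirmwareDidNotSetUpChip_StillBringsDriverUp()
        {
            var driver = new WirelessDriver(new MacBookAirLikeFirmware());
            driver.Initialize();
            Assert.IsTrue(driver.Ready);
        }
    }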

This speaks to a larger point. A sociological one. I replied to my friend as follows.

I'd argue that Linux is a byproduct of its time.

Unix could've been written in LISP, but at the time the hardware wasn't performant enough to handle it. In the same way, we could build our cars out of carbon nanotubes, except it's too expensive. Different technologies are appropriate at different times. Iron probably existed in the Bronze Age, but it was too expensive and not readily available enough to make weapons, armor, and tools out of.

So some very clever hackers used the best tools they had at the time, building languages like B and C off of BCPL, and made tools out of them. These tools got agglomerated, and eventually became an operating system. It works, and it's good. But "it works" does not necessarily imply optimal.

Linux was built off of Unix because the most popular OSes when Torvalds wrote it were Windows and Macintosh, which had provable security and stability issues. Unix had a vibrant but small user base in academia, the military, etc. Torvalds made it more available for commercial use.

Social attitudes change with time. Manifest Destiny was a popular American mantra in the 1800s. In modern times, were new land to be found and colonized even though indigenous tribes lived on it, it is unlikely that the idea of "enlighten the Noble Savage" or "drive the wild animals off the land" would be as socially acceptable.

Ways of doing things change, as well. Henry Ford went very far with the assembly line. The American car companies were very happy with their production model all through WWII, the 1960s, the 1970s...and their lunch got eaten in the 1980s/90s/00s. The Japanese improved on the process models with Lean Manufacturing, Six Sigma, and applying effort to Eliminate Waste and improve efficiency.

One can argue that TDD is not new. REPLs have been around for decades. But there's a difference between writing a program by typing a little bit into a REPL until your code works, then throwing your micro-experiments/code doodles away and using the working code as the finished product, versus storing those "doodlings" as executable tests that assert behavior - tests that can act as specifications, check boundary conditions, and even influence the design process.
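A trivial, hypothetical example of the difference - the one-off REPL check ("does Lerp(0, 10, 0.5) give 5?") kept as a stored, re-runnable boundary spec:

    using NUnit.Framework;

    public static class MathUtil
    {
        // Hypothetical helper under test.
        public static double Lerp(double a, double b, double t)
        {
            return a + (b - a) * t;
        }
    }

    [TestFixture]
    public class InterpolationTests
    {
        // What might have been a one-off REPL check, kept as an executable
        // boundary-condition spec that runs on every build.
        [TestCase(0.0, 10.0, 0.0, 0.0)]
        [TestCase(0.0, 10.0, 1.0, 10.0)]
        [TestCase(0.0, 10.0, 0.5, 5.0)]
        public void Lerp_CoversTheBoundaries(double a, double b, double t, double expected)
        {
            Assert.AreEqual(expected, MathUtil.Lerp(a, b, t), 1e-12);
        }
    }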

A program, at the end, is just the solved equation. The regression test suite and commit log history show your work. They store the Tribal Memory that went into your code, as code. It's really hard to overstate how important that is.

Why does your teacher at school not accept assignments that don't show their work? Because the process for arriving at a solution is as important as the solution! We're not talking about "daily standups" and "ScrumMasters" or anything here. We're talking about the tools you used for the job, and the way you applied those tools to solve the problem.

Design Patterns have always existed. Identification and codification of them is merely seeking to create a language to express complex ideas more succinctly. A lot of people believe they're "unnecessary complexity", and make arguments to keep it simple. Ask them:

Do they believe in loops? You don't need a loop structure. You could just write the same code 10 times.

Do they believe in methods? You don't need methods. You could just write it all in int main().

Do they believe in classes? You don't need classes. You could just write it all in one file.

Do they believe in modules? You don't need modules. You could just put all your classes in one place.

Do they believe in inheritance and polymorphism? You don't need those words. You could just say "you write a class that extends another class, and it gets all of the properties and methods of the class it extends." or "you can treat multiple different classes as if they're the same."

So if they believe in all of these things, what's so hard about saying Template Method as opposed to "I write a class where I can define the outline of an algorithm and replace specific operations dynamically by extending it and providing a new implementation"?
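That sentence, rendered as a generic sketch (the report classes are invented; any domain would do):

    // Template Method: the base class fixes the outline of the algorithm;
    // subclasses replace specific steps by overriding them.
    public abstract class ReportGenerator
    {
        // The template method - the outline lives here and never changes.
        public string Generate()
        {
            string[] data = LoadData();
            return FormatHeader() + FormatBody(data);
        }

        protected abstract string[] LoadData();               // step to be replaced
        protected abstract string FormatBody(string[] data);  // step to be replaced

        protected virtual string FormatHeader()               // step with a default
        {
            return "REPORT\n";
        }
    }

    public class CsvReportGenerator : ReportGenerator
    {
        protected override string[] LoadData() { return new[] { "a", "b", "c" }; }
        protected override string FormatBody(string[] data) { return string.Join(",", data); }
    }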

Your point is well taken. Linux "works" because it has a wide user base and leverages Amazon's Mechanical Turk model or Google PigeonRank.

But is working "good enough?" Or does the community actively accept stagnation if it doesn't seek to incorporate new process models into its ecosystem? It didn't work out that well for American auto. Why should it work well for Linux?
People who decry the advancement of programming knowledge as "old wine in new bottles" and Silver Bullets don't understand that old systems are like Zombies that eat away at our brainpower.

I recently had a physics research faculty member tell me that he had a PhD student spend six months trying to add a new feature to a piece of code for his dissertation. The student ultimately failed. The feature amounted to augmenting a method in a deep, nested inheritance chain. We can write code better.

I've heard similar stories of researchers publishing papers in leading scientific journals based on code simulations that crash 30% of the time. The published results that are advancing the frontiers of science could be repeatable, or they could be the artifact of a random bug.

Calls by the federal government for the utilization of 5 billion cores to sequence genes and do things like fight cancer often have hundreds of hours allocated to programs that may crash halfway through with array-indexing issues, or thrash their way through their allotted time.

But maybe it's not a problem. Maybe this really is self-evident and we don't need such "Silver Bullets."

Is it clear to you that PermutationEstimator.cpp's evaluate method uses an array to represent the traversal of a Graph data structure, where each node is the array index and the next node to go to is the value? That evaluate() is using the Hungarian Algorithm to estimate the lowest cost permutation?
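For what it's worth, the representation being described - re-expressed here as a hypothetical C# sketch, not the actual PermutationEstimator.cpp code - is a successor array: the node is the index, and the value stored there is the next node:

    // Hypothetical re-expression of the representation described above (not the
    // actual PermutationEstimator.cpp code): node i's successor is next[i].
    public static class SuccessorArrayWalk
    {
        public static int[] Traverse(int[] next, int start)
        {
            var path = new System.Collections.Generic.List<int>();
            int node = start;
            for (int i = 0; i < next.Length; i++)   // at most n hops in a permutation cycle
            {
                path.Add(node);
                node = next[node];
                if (node == start) break;           // closed the cycle
            }
            return path.ToArray();
        }
    }

    // Example: next = {2, 0, 1} encodes the cycle 0 -> 2 -> 1 -> 0.

You shouldn't have to reverse-engineer that from an undocumented evaluate().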


But it is good work. After all, it does work. Is that really how we wanna leave it working?

68% of all software projects fail. To treat modern ways of writing software - ways that seek to make it clearer to read and understand, tease apart dependencies, capture its behavior through automated and repeatable methods, and overall evolve not just our tools but our very mindsets - as "Silver Bullets" is to accept that the status quo is good enough and should continue. Design Patterns are most certainly not Silver Bullets. They shouldn't be regarded as the answer to every problem. That just leads to Golden Hammer. ;P