Tuesday, December 28, 2010

The untested surgical team?

In reading Fred Brooks' seminal The Mythical Man-Month, I found myself intrigued by his description of the "surgical team" programming group in chapter 3, especially when he argues:

"Managers need to be sent to technical refresher courses, senior technical people to management training...Whenever talents permit, senior people must be kept technically and emotionally ready to manage groups or delight in building programs with their own hands...The whole notion of organizing surgical-type programming teams is a radical attack on this problem. It has the effect of making a senior man feel that he does not demean himself when he builds programs, and it attempts to remove the social obstacles that deprive him of that creative joy. Furthermore, that structure is designed to minimize the number of interfaces...It is really the long-run answer to the problem of flexible organization." (120)

This post does a pretty good job summarizing, though I think its conclusions are too strong. My response is:

I am reading The Mythical Man-Month now, and was curious as to whether this approach has been studied in the academic literature. Aside from Brooks' book, no papers on Google Scholar seem to try it, which I find interesting.

I agree with you that, since 1972, many of these roles have been automated...but I think you may be quickly dismissing a few of them.

In every modern programming language (Python, Groovy, Ruby, C#, and yes, even Java and C++) there are idiomatic ways of writing certain snippets of code, and ways that work but are "sub-optimal."

For example, many modern techniques using LINQ in .NET can turn 10-line "filter a list" loops into one-liners. Under the assumption that fewer readable lines of code for the same functionality = greater maintainability, these optimizations are well worth striving for. Hence, a "language lawyer", i.e. That Guy That Knows the Language Inside-Out, may be useful.
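To make the LINQ point concrete, here is a minimal sketch of the same filter written both ways. The class and method names (FilterExample, EvensWithLoop, EvensWithLinq) are made up for illustration; only Where and ToList are real LINQ calls.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FilterExample
{
    // Loop version: collect the even numbers by hand.
    public static List<int> EvensWithLoop(List<int> numbers)
    {
        var result = new List<int>();
        foreach (int n in numbers)
        {
            if (n % 2 == 0)
            {
                result.Add(n);
            }
        }
        return result;
    }

    // LINQ version: the same filter as a single expression.
    public static List<int> EvensWithLinq(List<int> numbers)
    {
        return numbers.Where(n => n % 2 == 0).ToList();
    }
}
```

Both methods return the same list; the second just says "keep the even ones" without the bookkeeping.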

I think, when it comes to tools, you're thinking of IDEs, etc. We have seen a dramatic rise of those. But most programming teams still have specialized libraries that are developed internally and used from project to project. For example, that one Active Directory security library that is written once and reused from project to project. I think this is more of what the Toolsmith is supposed to do, in terms of "building specialized programming tools."

For example, in many modern projects I have the arduous task of migrating data from one system to another. This often involves developing a mediator schema or some type of transform. I know the right thing to do is to build a script, just in case I have to repeat the process or add some more data later. But if I'm focusing on Building Shiny New System X, the time I take to write a program to go from Old Piece of Crap Y to X is time taken away from polishing X. That's what a toolsmith would be for.

As for administrators, editors, and secretaries, I'm reminded of Joel Spolsky's work on the Development Abstraction Layer. I think such roles are important, for exactly the reasons he described.

Chief Programmer/co-pilot sounds very much like modern XP Pair Programming, with a very clear Sr/Jr relationship. While TDD/BDD may have them writing their own tests, one can argue that a Tester would still be beneficial for functional testing, and for writing all that "glue code" to make testing easier (mock objects, dummy records, etc.).

I guess my point is, while automation is definitely compelling, don't be so quick to dismiss his ideas. I find myself very surprised that this organizational strategy hasn't been put more to the test.

I would agree that program clerk is a relic of the date of publication, easily replaced by source control.

I find it curious that Brooks' assertions don't seem to have been put to the test in the literature since 1972. I wonder if any case studies have been done regarding this approach?

Also, regarding my assertion that "the assumption that fewer readable lines of code for the same functionality = greater maintainability", I remember studying metrics in Software Project, Process, and Quality Management at Arizona State University...but I don't ever remember running into a heuristic such as this.

Intuitively, it makes sense. A caveat must be made for Perl, of course, which can have one obnoxious line that does everything but is impossible to understand, hence why I included "readable" in the assertion. I wonder if something similar exists in the literature?

If not, it'd be fun to make this Vaidyanathan's Law. (Not to mention deliciously ironic, as I'm all about prolix tautology whereas the law argues for brevity! >:D)

Tuesday, December 21, 2010

Architecture as defined in Brooks' Aristocracy, Democracy, and Systems Design (The Mythical Man-Month)

"By the architecture of a system, I mean the complete and detailed specification of the user interface...The architect of the system, like the architect of a building, is the user's agent. It is his job to bring professional and technical knowledge to bear in the unalloyed interest of the user, as opposed to the interests of the salesman, the fabricator, etc." (45)

I don't know that I've ever seen a system's architecture defined that way before. It seems limiting to think of architecture solely as the user interface, but the idea of the architect as the user's agent is interesting.

"Architecture must be carefully distinguished from implementation. As Blaauw has said, 'Where architecture tells what happens, implementation tells how it is made to happen.' He gives as a simple example a clock, whose architecture consists of the face, the hands, and the winding knob. When a child has learned this architecture, he can tell time as easily from a wristwatch as from a church tower. The implementation, however, and its realization, describe what goes on inside the case--powering by any of many mechanisms and accuracy control by any of many." (45)

A great example of the difference between architecture and implementation.

Programming teams as surgical teams

An interesting idea from Fred Brooks' The Mythical Man-Month:

"At computer society meetings one continually hears young programming managers assert that they favor a small, sharp team of first-class people, rather than a project with hundreds of programmers, and those by implication mediocre. So do we all. But this naive statement of the alternatives avoids the hard problem--how does one build large systems on a meaningful schedule?" (30)

"In one of their studies, Sackman, Erikson, and Grant were measuring performance of a group of experienced programmers. Within just this group the ratios between best and worst performances averaged about 10:1 on productivity measurements and an amazing 5:1 on program speed and space measurements! In short, the $20,000/year programmer may well be 10 times as productive as the $10,000/year one. The converse may be true, too. The data showed no correlation whatsoever between experience and performance. (I doubt if this is universally true.)" (30)

"I have earlier argued that the sheer number of minds to be coordinated affects the cost of the effort, for a major part of the cost is communication and correcting the ill effects of miscommunication (system debugging). This, too, suggests that one wants the system to be built by as few minds as possible...the problem with the small, sharp team concept: it is too slow for really big systems." (30-31)

"The dilemma is a cruel one. For efficiency and conceptual integrity, one prefers a few good minds doing design and construction. Yet for large systems one wants a way to bring considerable manpower to bear, so that the product can make a timely appearance. How can those two needs be reconciled?" (31)

"Mills proposes that each segment of a large job be tackled by a team, but that team be organized like a surgical team rather than a hog-butchering team. That is, instead of each member cutting away on the problem, one does the cutting and the others give him every support that will enhance his effectiveness and productivity." (32)

"The surgeon. Mills calls him a chief programmer. He personally defines the functional and performance specifications, designs the program, codes it, tests it, and writes its documentation. He writes in a structured programming language such as PL/I, and has effective access to a computing system which not only runs his tests but also stores the various versions of his programs, allows easy file updating, and provides text editing for his documentation. He needs great talent, ten years experience, and considerable systems and application knowledge, whether in applied mathematics, business data handling, or whatever.

The copilot. He is the alter ego of the surgeon, able to do any part of the job, but is less experienced. His main function is to share in the design as a thinker, discussant, and evaluator. The surgeon tries ideas on him, but is not bound by his advice. The copilot often represents his team in discussions of function and interface with other teams. He knows all of the code intimately. He researches alternative design strategies. He obviously serves as insurance against disaster to the surgeon. He may even write code, but he is not responsible for any part of the code.

The administrator. The surgeon is boss, and he must have the last word on personnel, raises, space, and so on, but he must spend almost none of his time on these matters. Thus, he needs a professional administrator who handles money, people, space, and machines, and who interfaces with the administrative machinery of the rest of the organization. Baker suggests that the administrator has a full-time job only if the project has substantial legal, contractual, reporting, or financial requirements because of the user-producer relationship. Otherwise, one administrator can serve two teams.

The editor. The surgeon is responsible for generating the documentation--for maximum clarity he must write it. This is true of both external and internal descriptions. The editor, however, takes the draft or dictated manuscript produced by the surgeon and criticizes it, reworks it, provides it with references and bibliography, nurses it through several versions, and oversees the mechanism of production.

Two secretaries. The administrator and the editor will each need a secretary; the administrator's secretary will handle project correspondence and non-product files.

The program clerk. He is responsible for maintaining all the technical records of the team in a programming-product library. The clerk is trained as a secretary and has responsibility for both machine-readable and human-readable files. All computer input goes to the clerk, who logs and keys it if required. The output listings go back to him to be filed and indexed. The most recent runs of any model are kept in a status notebook; all previous ones are filed in a chronological archive. Absolutely vital to Mills' concept is the transformation of programming 'from private art to public practice' by making all the computer runs visible to all team members and identifying all programs and data as team property, not private property. The specialized function of the program clerk relieves programmers of clerical chores, systematizes and ensures proper performance of those oft-neglected chores, and enhances the team's most valuable asset--its work product. Clearly the concept as set forth above assumes batch runs. When interactive terminals are used, particularly those with no hard-copy output, the program clerk's functions do not diminish, but they change. Now he logs all updates of team program copies from private working copies, still handles all batch runs, and uses his own interactive facility to control the integrity and availability of the growing product.

The toolsmith. File-editing, text-editing, and interactive debugging services are now readily available, so that a team will rarely need its own machine and machine operating crew. But these services must be available with unquestionably satisfactory response and reliability; and the surgeon must be the sole judge of the adequacy of service available to him. He needs a toolsmith, responsible for ensuring this adequacy of basic service and for constructing, maintaining, and upgrading special tools--mostly interactive computer services--needed by his team. Each team will need its own toolsmith, regardless of the excellence and reliability of any centrally provided service, for his job is to see to the tools needed or wanted by his surgeon, without regard to any other team's needs. The tool-builder will often construct specialized utilities, cataloged procedures, macro libraries.

The tester. The surgeon will need a bank of suitable test cases for testing pieces of his work as he writes it, and then for testing the whole thing. The tester is therefore both an adversary who devises system test cases from the functional specs, and an assistant who devises test data for the day-by-day debugging. He would also plan testing sequences and set up scaffolding required for component tests.

The language lawyer. By the time Algol came along, people began to recognize that most computer installations have one or two people who delight in mastery of the intricacies of a programming language. And these experts turn out to be very useful and widely consulted. The talent here is rather different from that of the surgeon, who is primarily a system designer and who thinks representations. The language lawyer can find a neat and efficient way to use the language to do difficult, obscure, or tricky things. Often he will need to do small studies (two or three days) on good technique. One language lawyer can service two or three surgeons.

This, then, is how 10 people might contribute in well-differentiated and specialized roles on a programming team built on the surgical model." (34)

"Notice in particular the differences between a team of two programmers conventionally organized and the surgeon-copilot team. First, in a conventional team the partners divide the work, and each is responsible for design and implementation of part of the work. In the surgical team, the surgeon and copilot are each cognizant of all of the design and all of the code...Second, in the conventional team the partners are equal, and the inevitable differences of judgment must be talked out or compromised. Since the work and resources are divided, the differences in judgment are confined to overall strategy and interfacing, but they are compounded by differences of interest...In the surgical team, there are no differences of interest, and differences of judgment are settled by the surgeon unilaterally." (34)

Obviously some of these concepts are dated and can be replaced by computer functions, such as the program clerk. But has this style of team ever been effectively implemented and reported on in the literature?

Thursday, December 9, 2010

Bottom Up and Top Down Software Development

Software Engineering is a rapidly changing beast, as software itself, being made of "pure thought stuff" (Brooks, The Mythical Man-Month), rapidly changes. Not only has the process orientation of the industry changed, from cascading waterfalls to Agile methodologies (Agile Manifesto, Beck et al.), but the design focus has changed as well.

In the 1980s, design and architecture tended towards the development of large models that fully specified the software. The emphasis on modeling and architecture led to favoring code generation and a focus more on the high level "how things work." Tools like IBM Rational Rose were seen as holistic tools that could allow full specification of the software process from the ground up.

These tools have gained limited market share. As part of the Agile Manifesto's emphasis on "Working software over comprehensive documentation," the emphasis on system specification via UML modeling and up-front specification has given way to TDD (Beck, Test-Driven Development: By Example). This code-first approach has found wide acclaim with software developers and industry, and even design is being driven by its ramifications via the use of Mock Objects as a design tool (Freeman et al., "Mock Roles, not Objects" http://static.mockobjects.com/files/mockrolesnotobjects.pdf).

The basic TDD workflow is fairly simple: write a test that fails, write just enough code to make it pass, refactor and evolve. This approach has won acclaim because it breaks the intractable software system into manageable chunks. But why hasn't modeling received similar treatment?

In theory, it's just as easy to draw a class diagram for a single class and then implement it as it is to write a test and make it pass. Write a failing test for a new method, add the method to the class diagram, then add a sequence diagram that stubs that class and that method.

Why isn't this done more? Is it poor support from the tools, i.e. it's hard to quickly create a modeling artifact and bind it to the project in the same way it is for tests? Is it lack of developer buy-in, i.e. models are completely unnecessary? Is it a perception of complexity of the modeling constructs themselves, i.e. the models are useful, but they're too complex and add little value?

Thursday, November 4, 2010

The horror, oh the horror part duh

Was given this method

Watch that. Numb your brain to it. Allow it to consume you. Enjoy the logical impossibilities of else if (hardForecastList.Last().Id == hardForecast.Id || softForecastList.Count() == 0) inside of a nested foreach loop that iterates across softForecastList in the outer layer. Enjoy, as you quickly come to realize that the only difference between these blocks of code is that 0s are getting set in certain places.

When I asked about the logic inside of the code, the response:

"The logic is behind that method is to display a hard and soft forecast as a single record that has the same dd, site, os, cycle and memory type."

So basically, you have a single list containing elements that need to be combined based upon a specified predicate. Algorithmically similar to a merge sort. This was the approach taken.

Some people code like this. Cry.

I turned it into this:

Which I know is still wrong. There's a 1-2 line solution in LINQ using joins and the like out there. Curious, I asked on StackOverflow, trying to simplify the problem domain down to the real thing I don't understand. But I'm not sure I like those solutions as much, either.
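For what it's worth, here is a rough sketch of the kind of grouping query I have in mind. It is not the original method or the StackOverflow answers. The Forecast and CombinedForecast classes, their property names, and the two-field key are all hypothetical; the real records also match on dd, os, and memory type. It assumes hard and soft forecasts sit in one list and that a missing side should show up as 0.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shape; the real one has more key fields (dd, os, memory type).
public class Forecast
{
    public string Site { get; set; }
    public string Cycle { get; set; }
    public bool IsHard { get; set; }
    public int Quantity { get; set; }
}

// Hypothetical display row: one hard and one soft figure per key.
public class CombinedForecast
{
    public string Site { get; set; }
    public string Cycle { get; set; }
    public int HardQuantity { get; set; }
    public int SoftQuantity { get; set; }
}

public static class ForecastMerger
{
    public static List<CombinedForecast> Combine(IEnumerable<Forecast> forecasts)
    {
        return forecasts
            // Anonymous types compare by value, so this groups on the whole key.
            .GroupBy(f => new { f.Site, f.Cycle })
            .Select(g => new CombinedForecast
            {
                Site = g.Key.Site,
                Cycle = g.Key.Cycle,
                // Sum over an empty sequence is 0, which covers the "missing side" case.
                HardQuantity = g.Where(f => f.IsHard).Sum(f => f.Quantity),
                SoftQuantity = g.Where(f => !f.IsHard).Sum(f => f.Quantity)
            })
            .ToList();
    }
}
```

The zero-filling that the nested loops did by hand falls out of Sum here, which is most of why the grouping version is shorter.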

Monday, November 1, 2010

On the development of an automated privacy management scheme for relational database management systems

We can develop a methodology for automatically improving privacy protection in systems that store personal information by:

developing attributes/extensions into existing information systems that allow attributes (individual columns) to be marked as "personally identifiable"
developing triggers/stored procedures that log every access and modification of that information
developing triggers/stored procedures that check/verify proper access to that data
developing automated batch queries/scripts/jobs that remove stale personally identifiable information (data that hasn't been created, read, or updated for x amount of time) from a database
adopting the "Convention over Configuration" approach to automatically apply this to certain types of records (e.g. Customer, User, Patient)
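As a rough illustration of the marking and staleness ideas, here is a sketch done at the application layer rather than with database triggers, so it is a stand-in for the scheme above and not the scheme itself. PersonallyIdentifiableAttribute, Customer, PrivacyScanner, and the LastAccessedUtc column are all hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical marker for columns that hold personal data.
[AttributeUsage(AttributeTargets.Property)]
public sealed class PersonallyIdentifiableAttribute : Attribute { }

// Hypothetical record type; by convention, Customer-like types get scanned.
public class Customer
{
    public int Id { get; set; }
    [PersonallyIdentifiable] public string Name { get; set; }
    [PersonallyIdentifiable] public string Email { get; set; }
    public DateTime LastAccessedUtc { get; set; }
}

public static class PrivacyScanner
{
    // Lists the marked properties so logging or cleanup jobs know what to watch.
    public static List<string> PersonalColumns(Type recordType)
    {
        return recordType.GetProperties()
            .Where(p => p.IsDefined(typeof(PersonallyIdentifiableAttribute), false))
            .Select(p => p.Name)
            .OrderBy(name => name)
            .ToList();
    }

    // A record is stale when it has not been touched within maxAge.
    public static bool IsStale(DateTime lastAccessedUtc, DateTime nowUtc, TimeSpan maxAge)
    {
        return nowUtc - lastAccessedUtc > maxAge;
    }
}
```

A batch job could call PersonalColumns once per record type and blank those columns on every row where IsStale returns true.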

Wednesday, October 27, 2010

The Law of Demeter

Upon reading the excellent Head First Design Patterns the first time, I didn't quite understand the Law of Demeter. Their example with Thermometers and Weatherstation seemed contrived. It seemed like unnecessary indirection (though Computer Scientists do say the key to solving most computer science problems is to add another layer of indirection). But today, I found myself working on a problem where it kind of makes sense.

I have this method in my Service Layer:


public bool SaveCycle(Cycle request)
{
    bool returny;
    Cycle cycleToUpdate = cycleRepository.Get(request.Id) ?? request;
    PopulateCycleFromForm(cycleToUpdate, request);
    if (cycleToUpdate.IsValid())
    {
        cycleRepository.SaveOrUpdate(cycleToUpdate);
        returny = true;
        if (cycleToUpdate.CyclePartitions.Count == 0) // new cycles have no partitions associated. create default
        {
            CreateDefaultCyclePartitions(cycleToUpdate);
        }
    }
    else
    {
        returny = false;
    }
    return returny;
}

and Cycle is a Domain Object in my object model. What made the Law of Demeter ring a bell was thinking about:


if (cycleToUpdate.CyclePartitions.Count == 0) // new cycles have no partitions associated. create default
{
    CreateDefaultCyclePartitions(cycleToUpdate);
}

I am storing CyclePartitions as a list of objects inside of Cycle in my domain layer. I need to create some default partitions on new cycle creation, so it seems reasonable to check the count of the collection.

But what if, 6 months down the line, I change the type of the CyclePartitions collection I'm using in my Cycle Domain Object to some other collection type? Maybe it exposes a Size property instead of a Count property. Then, I would have to go back and change the Service Layer as well, since I've subtly introduced a dependency here.

But, if I write this in my Service Layer:


if (cycleToUpdate.GetNumberOfCyclePartitions() == 0) // new cycles have no partitions associated. create default
{
    CreateDefaultCyclePartitions(cycleToUpdate);
}

Then I am free to do whatever I want inside of my Cycle Domain Object, as the Service Layer only cares about the API.
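On the domain side, that might look something like the sketch below. This is a stripped-down stand-in, not my real Cycle class: the private list, the empty CyclePartition class, and AddCyclePartition are assumptions made so the example compiles on its own.

```csharp
using System.Collections.Generic;

// Placeholder for the real partition type.
public class CyclePartition { }

public class Cycle
{
    // The collection type is now a private detail; swapping it out
    // only touches this class, not the Service Layer.
    private readonly List<CyclePartition> cyclePartitions = new List<CyclePartition>();

    public int GetNumberOfCyclePartitions()
    {
        return cyclePartitions.Count;
    }

    // Hypothetical helper so callers never reach into the collection directly.
    public void AddCyclePartition(CyclePartition partition)
    {
        cyclePartitions.Add(partition);
    }
}
```

If the list later becomes some other collection with a Size property, only the body of GetNumberOfCyclePartitions changes.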

Monday, October 11, 2010

Security in Learning Systems

The security considerations of the intelligent tutoring system component of an interactive learning system are important, but not one of the critical pieces of the application. Security in a tutorial environment is not as critical as security in online credit card transactions or encrypted communications for national defense, for example. At the same time, a student may be rightfully upset if the details of their tutoring session are disclosed without their permission. An educator would also be annoyed if students were able to modify the results of their sessions to "game the system" by spoofing learning outcomes.

[hypothesis]
Within the continuum of the "ilities", security for an ITS will not be as important as usability, adaptability, maintainability, and performance. It may be as important as scalability, reliability, and testability. It falls within the third quartile of issues for an ITS.

[Idea. Create a survey with all of the "ilities" listed and send out to ITS domain experts]

Common security attacks are spoofing, denial of service, and direct applications (such as worms, viruses, Trojan horses, and logic bombs).

Spoofing is one of the types of attacks educators are most familiar with. It occurs when one party masquerades as another for the purposes of subverting security. For example, a student, Bill, asks another student, Tom, to take a test for him. For another example, suppose teacher Mary attempts to log on to the administrative portion of the system, Tom notices her password, and later pretends to be Mary.

Spoofing can lead to incorrect interpretation of results, or it can give unauthorized users access to sensitive information. The spoofing concerns in an ITS are as follows:

Cheating -- student spoofs as another student in order to fool the system
Man-in-the-middle -- student intercepts interactions between student and system or teacher and system
Bullying -- student spoofs as teacher or system in order to fool another student
Penetration -- student spoofs as teacher in order to view/modify their own or other students' data
Destruction -- assailant spoofs as content author in order to load invalid knowledge into the knowledge base

In order to prevent spoofing, modern software applications use a variety of identity confirmation techniques. These can range from hardware and software based solutions to biometrics. These techniques are useful, but by themselves may not provide enough protection if the communication channel is not secure.
[make diagram]
The security of the communication channel can be strengthened by the use of secure standards, such as SSL. Additionally, the integrity of content can be verified through the use of checksums.
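A minimal sketch of the checksum idea, assuming SHA-256 as the hash and hypothetical helper names (ContentIntegrity, ComputeChecksum, Verify). Note that a bare hash only catches changes made by someone who cannot also replace the stored checksum; resisting deliberate tampering would need a keyed hash such as HMAC.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ContentIntegrity
{
    // Returns the SHA-256 hash of the content as lowercase hex.
    public static string ComputeChecksum(string content)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(content));
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // True when the content still matches the checksum recorded earlier.
    public static bool Verify(string content, string expectedChecksum)
    {
        return string.Equals(ComputeChecksum(content), expectedChecksum,
            StringComparison.OrdinalIgnoreCase);
    }
}
```

The knowledge base would store the checksum when a content author loads a lesson and call Verify before serving it to a student.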

Denial of Service

Tuesday, September 28, 2010

Cross Browser Differences: File uploads

I've been working on an issue with file uploads not working properly when being passed to an ASP.NET MVC Controller in IE8. This code works fine in Firefox and Chrome, but strangeness happens in IE:


public FileUploadJsonResult ExcelImport()
{
    FileUploadJsonResult result;
    HttpPostedFileBase hpf = Request.Files[0] as HttpPostedFileBase;
    if (hpf.ContentLength == 0)
        return new FileUploadJsonResult { Data = new { message = "File contained no data" } };
    // Insert a timestamp before the extension; this assumes hpf.FileName is a bare file name.
    String timeStampedFile = Path.GetFileName(hpf.FileName).Insert(hpf.FileName.IndexOf('.'),"_"+DateTime.Now.ToFileTimeUtc());
    string savedFileName = Path.Combine(AppDomain.CurrentDomain.BaseDirectory + "budgetImports", timeStampedFile);
    hpf.SaveAs(savedFileName);
    try
    {
        result = ProcessFile(savedFileName, Request["budgetId"]) as FileUploadJsonResult;
    }
    catch (ArgumentException e)
    {
        this.Response.StatusCode = 500;
        this.Response.StatusDescription = System.Net.HttpStatusCode.BadRequest.ToString();
        Response.Write(e.Message);
        // Build the result directly; Json(...) returns a plain JsonResult, so casting it with "as" would yield null.
        result = new FileUploadJsonResult { Data = new { message = e.Message, stackTrace = e.StackTrace } };
    }
    return result;
}

Turns out, IE8 was sending the whole file path to the controller, as opposed to just the file name. This seems to contradict this post.

I'm working within a company intranet, so it may be a local policy. I'm also solving a fairly complicated use case using the jQuery Form plugin to do an AJAX file upload, which may be a factor. But if you run into errors with AJAX file uploads seemingly working in FF and Chrome but mysteriously failing in IE, this may be why.
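One way to guard against the full-path case is to strip everything up to the last path separator before doing any index arithmetic on the name. The sketch below is a standalone helper, not the controller code; UploadFileNames, StripClientPath, and AddTimestamp are hypothetical names. It splits on both slash styles by hand so the result does not depend on which OS the server runs on.

```csharp
using System;

public static class UploadFileNames
{
    // Drops any client-side directory part, e.g. "C:\docs\budget.xlsx" -> "budget.xlsx".
    public static string StripClientPath(string clientFileName)
    {
        int lastSeparator = clientFileName.LastIndexOfAny(new[] { '\\', '/' });
        return clientFileName.Substring(lastSeparator + 1);
    }

    // Inserts "_<file time>" before the extension, or at the end if there is none.
    public static string AddTimestamp(string clientFileName, DateTime utcNow)
    {
        string name = StripClientPath(clientFileName);
        int dot = name.LastIndexOf('.');
        string stamp = "_" + utcNow.ToFileTimeUtc();
        return dot < 0 ? name + stamp : name.Insert(dot, stamp);
    }
}
```

Because the dot index is taken from the stripped name rather than from the raw FileName, the Insert position stays inside the string regardless of what the browser sends.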

Thursday, September 23, 2010

Software engineering, law, and politics, rev 0.1

Modern political and legal systems have been relatively slow to adopt new technologies, considering the enormity of opportunity. Potential benefits include simplifications to process and workflow that would allow such systems to function more efficiently, convenience for end users, convenience for administrators, auditing and tracking capabilities, and more agile adaptation to citizens' needs. [insert sources here]

Imagine, if you will, that we could automatically parse the text of existing laws and find contradictions. These could be temporal discrepancies (e.g. a law proposed in 2010 contradicts a law passed in 2002, and this is found automatically before the new law has a chance to pass, instead of after a lawsuit must be filed, etc.) [find historical source], scope discrepancies (a city law violates state or federal law) [find historical source], or even logical discrepancies [find historical source] (law A implies behavior zeta is permissible but not behavior gamma, law B strengthens law A while adding clause mu, law C strikes B, keeps mu, but somehow allows gamma).

Furthermore, such a system could proactively help "prune" legal systems by finding laws that "don't make sense", e.g. the stereotypical donkey in a bathtub [find historical source] and other archaic or arcane statutes. It could then put those measures up for vote, and citizens could proactively "remove the dross" to make legal systems leaner, easier to approach, and more receptive to the wants of the populace.

Such a system may seem far-fetched. It's not. Although the level of natural language processing necessary to evaluate some of the more semantic questions may be a little way off, we can do many of these things with tools available today. The existence of search engines such as Google shows that although a computer can't really understand what the user is thinking, it can be leveraged as a tool to do very powerful things.

But do our political and legal systems need this type of fixing? Are current processes good enough? Most people would say no [insert sources here]. But are the potential benefits worth the potential costs?

Many people fear the proactive application of technology in the tasks related to the public space. For good reason. At its worst, technology could be used to create a 1984-esque dystopian society that allows the ruling elite to fully monitor and collect information about everyone, which would enable elimination of dissidents and suppression of freedom. This would not be hard to do. With the rise of biometrics, RFID, ubiquitous monitoring via omnipresent internet access and web cameras, etc., we already have the technology. [insert sources] Is it worth pursuing digitization and technological integration, when the costs could be so catastrophic?

As freedom and security are competing aims, societies that value both must make these decisions carefully. One could argue, however, that bloated systems rife with inefficiency and unnecessary complexity threaten both freedom and security. Processes that are unwieldy lead to more systemic failures. We have already seen this in software.

When software engineering first became a topic of discussion, in the 1960s [cite paper that I forgot], its main concern was the abysmal failure rate of software projects. [insert number that I forgot] Too many projects suffered from budget or scheduling overruns, if not outright failure, and better approaches were sought. Through the years, various models have been proposed.

What became the classical model is now known as "The Waterfall Model." [cite paper] In the waterfall model, a software project is partitioned into distinct phases: Requirements elicitation, Design, Implementation, Testing, Deployment, Maintenance. Each phase is to be completed fully up front, so that the software process always "flows down." Heavy documentation and record keeping is relied upon to transfer information between phases, and once a phase is complete it is to be "set in stone." It is ironic that this became the standard model, as the original author's intent was to argue against such a process. [cite source] In fact, the author argued, a better approach involved iterative refinements, since it is usually impossible to get everything required up front in any but the smallest and simplest projects. Time has proven that criticism correct. In the 1970s/1980s, software engineering saw a move towards a Spiral model [cite paper], where each phase fed into the next and then looped back around at the end, iteratively, to develop quality software. This alone wasn't enough, and as the complexity of the software problem has increased the models have as well, moving to the Rational Unified Process [cite paper] and most recently Agile methodologies. [cite Agile Manifesto]

The key progression has been to "cut down red tape" in terms of heavy, unwieldy paperwork and processes that seek to "do it all in one go" and instead instill values of productivity and adaptation. It can be argued that some camps have gone too far with Agile, sacrificing discipline in the form of basic documentation and creating chaos [cite anti-Agile source], and software projects are nowhere near perfect [cite source], but Agile is clearly the direction the industry is moving in and studies suggest it is improving things. [cite source that shows improvement thanks to Agile]
This is not surprising, either. Given the rise of the internet and the growing inter-connectivity, being able to respond quickly is more important than ever before. This need will only increase with time.

One can argue that the United States' political and legal system currently follows a "waterfall" type of model. The current legal process is arduous, whether you're a defendant going to a court date to schedule another court date or a politician trying to pass an initiative. An explosion of structure has resulted in a morass of paperwork and processes that are not nimble enough to react to changing social conditions [cite source]. These processes are rife with opportunities for failure and corruption. [strengthen this a little more]

Furthermore, the products of "waterfall" style development tended to be large, monolithic pieces of software. The legislative equivalent is easy to find: gigantic, complex, arcane laws are not the exception, they are the norm. For example, see the PATRIOT Act [cite source], the Stimulus package [cite source], the Health Care Reform Act [cite source], etc. The average citizen will not read a thousand-page bill with hundreds of provisions, which are often written in difficult-to-decipher language (by design). These bills themselves are often littered with extraneous provisions that have nothing to do with the original intent of the bill, labeled "riders" or "pork." For examples, see the SAFE Port Act -- which on an unrelated note crippled online gambling -- [cite source] and the recently failed Defense appropriations bill that included provisions to end "Don't Ask, Don't Tell" and the DREAM Act. [cite source]

This drives a software engineer crazy. Knowing basic concepts of how to represent things (data structures) and how to get things done (algorithms), software engineers then learn the principles of effective systems design. An experienced engineer could describe the average current bill as one with low cohesion [describe cohesion] and tight coupling [describe coupling]. The process by which lawmakers create, inspect, and debate bills could be described as inefficient resource allocation that leads to deadlock [describe deadlock] and starvation [describe starvation].

But how did the system get this way? A good historical background can easily answer this question: by design and by the tension of conflicting forces.

The Federalist Papers [cite source] and other assorted musings from the Founding Fathers, primarily the works of Madison and Jefferson, reveal a distrust of strong governmental structures. Checks and Balances were established for the very purpose of governmental inefficiency, with the thought being that an inefficient government that does nothing is still better than an efficient one that tyrannizes its people. [cite source] This is not surprising, given their historical context and rationale for the Revolution, and can still be argued to be true given the even greater capacity for tyranny in the modern context.

The Jeffersonian argument favors a small, limited federal government that holds the rights of its citizens in such esteem that it does little itself, delegates responsibility for most actions to the states, and presides over a limited and fairly specific set of cross-cutting concerns like international trade and military mobilization. [cite sources. Possible paper idea: system architectural lessons learned from a historical analysis of governmental structures] Ironic, then, that Jefferson himself signed the Louisiana Purchase -- the largest expansion of federal power and land mass in American history [http://americanhistory.about.com/od/thomasjefferson/a/tj_lapurchase.htm].

This may be due to influences from a competing system design, the Hamiltonian argument. Hamilton, seeing the potential for US expansion and growth, favored a stronger centralized federal government that handled a larger set of cross-cutting concerns, including banking and the cultivation of industry, in order to establish and grow the government's capabilities. A Hamiltonian style of design necessarily requires greater bureaucracy in order to do the "book-keeping" that handles the interconnected details of a more powerful system. Consequently he argued for an expansion of federal power.

These two ideals conflict, and the system has oscillated between them over the years, which has eroded its cohesion. Combined with an original emphasis on balanced power, this makes it unsurprising that the system is difficult to modify, difficult to maintain, and difficult to evolve.

But as we've established, it is possible, advantageous, and ultimately necessary for systems to adapt and be modified in order to meet changing requirements. In the software world, we call this refactoring. [explain refactoring. Source] Systems that do not evolve are crushed by what Grady Booch terms "inertia" [source Podcast "On Architecture"] and "code rot."

The system must evolve if it is to survive. But it must do so in carefully measured steps that follow logically and can be tested for equivalent fulfillment of requirements, if it is not to fall apart. Software can help meet these needs, but in order to do so it must ensure the fundamental pillars of the underlying political system:

Participation -- Citizens must have the ability to make their voices heard
Collaboration -- Citizens must be able to unite and act in concert
Security -- Citizens must be protected from misuses of power, or attacks on the system from external forces
Conflict Resolution -- Citizens must be able to disagree lawfully without fear of reprisal
Reliability -- Citizens must be able to trust in the process

Transparency -- Citizens must be able to trust the accuracy of results via verification

[Ted talk http://www.youtube.com/TED#p/u/1/izddjAp_N4I]

[brainstorm more. Try to find supporting examples]

Luckily, software can do this. In fact, in many application domains -- such as defense, avionics, and medical applications -- it makes guarantees similar to these and stronger. Consequently, it is not a question of whether software-intensive systems can fulfill these requirements, but rather how they are to be composed.

Take the issue of voting, for example. Many electronic voting systems have been proposed, implemented, and analyzed since the 1960s, and this application area has received even more attention with the rise of the internet. [http://en.wikipedia.org/wiki/Electronic_voting] The reason is simple: voter turnout is relatively low in local, state, and national elections. [cite source] It is not nearly as low on online surveys and Facebook polls [cite source]. The most common complaint about the voting process as it stands today, whether it be in person at the polls or via mail, is that it is inconvenient and easy to forget. Some would argue the civic duty is virtuous because it is not convenient and requires energy, but that is a philosophical/design discussion that doesn't mirror the reality of the general public's dissatisfaction that Washington isn't "hearing our voices." [cite source]

A system that lets constituents respond to pressing issues with votes in near real time, accurately, securely, and reliably, could dramatically change the way discussions are framed for politicians. Such a system could be sufficiently easy and convenient for a voter to use, especially in these days of ubiquitous connectivity via web interfaces and mobile devices. This system could increase participation and involvement and offer both law-makers and citizens never-before-seen guarantees of authenticity, accuracy, and transparency.

People are nervous about implementing such a system for one major reason: security. There are many security considerations for such a system, including:
-- Anonymity: how can the system guarantee the safety of the voter from reprisal while logging the voter's vote for both tally and content?
-- Robustness: how can the system handle usage without failing so votes aren't "lost", aka denial of service [explain denial-of-service]?
-- Security: how can the system secure itself against man-in-the-middle attacks [explain man in the middle], spoofing [explain spoofing], and replay attacks [explain playback attacks]?
-- Integrity: how can the system ensure logical constraints such as no voter voting twice, all votes being equal, all voters being authorized and authenticated [explain authorization and authentication]?
-- Correctness: how can the system ensure that the correct vote is cast? How can a voter verify their vote without anyone else being able to? How can an auditor tell that a person has voted, without being able to see what the vote was?
[list more. Read a few papers]
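
To make a couple of these constraints concrete, here is a toy sketch in Python of the Integrity and Correctness items: one vote per registered voter, an auditor-visible record of who has voted, and a receipt the voter can check later. Everything in it (the `ToyBallotBox` class, its method names, the salted-hash receipt) is made up for illustration and is not taken from any real voting system. It also ignores the hard parts listed above, such as network attacks and authentication, and a receipt that proves how you voted creates its own coercion problem, so treat it as a thinking aid only.

```python
import hashlib
import secrets
from collections import Counter


def commit(choice: str, nonce: str) -> str:
    """Salted hash of a ballot; reveals nothing without the nonce."""
    return hashlib.sha256(f"{nonce}:{choice}".encode("utf-8")).hexdigest()


class ToyBallotBox:
    """Hypothetical model that stores *who voted* apart from *what was voted*."""

    def __init__(self, registered_voters):
        self.registered = set(registered_voters)
        self.voted = set()        # visible to auditors: identities only
        self.commitments = set()  # published: hashes only, no identities
        self.choices = []         # ballots, stored without voter ids

    def cast(self, voter_id: str, choice: str) -> str:
        """Record one ballot and return a secret receipt (the nonce)."""
        if voter_id not in self.registered:
            raise PermissionError("voter is not registered")
        if voter_id in self.voted:
            raise ValueError("voter has already voted")
        nonce = secrets.token_hex(16)
        self.voted.add(voter_id)
        self.commitments.add(commit(choice, nonce))
        self.choices.append(choice)
        return nonce

    def has_voted(self, voter_id: str) -> bool:
        """Auditor's question: did this person vote? (Not: how?)"""
        return voter_id in self.voted

    def receipt_is_recorded(self, choice: str, nonce: str) -> bool:
        """Voter's question: is my ballot, as I cast it, in the record?"""
        return commit(choice, nonce) in self.commitments

    def tally(self) -> Counter:
        return Counter(self.choices)


box = ToyBallotBox(["alice", "bob"])
receipt = box.cast("alice", "yes")
print(box.has_voted("alice"))                   # True
print(box.receipt_is_recorded("yes", receipt))  # True
print(box.receipt_is_recorded("no", receipt))   # False
```

A second `box.cast("alice", ...)` raises `ValueError`, which is the "no voter voting twice" rule. Note that a real system could not keep ballots in an ordered list next to a timestamped voter log, since the ordering alone could link the two.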

...This seems like a hard problem space. In many ways, it is. Yet it's not impossible. Many of these same constraints are faced every day in the processing of credit card transactions, biomedical data, entertainment media [see DRM], and other types of data. The consequences could be catastrophic if such a system were implemented incorrectly, but because the system is inherently a public good there is arguably a higher probability of it being correct [cite bug count analysis in terms of number and fixes of open source versus proprietary software of comparable size and functionality].

Let us propose, at a high level, the implementation of such a system.

Since the system is a public good and must be verifiable such that it inspires the confidence of its citizens, an almost immediate initial requirement is that the system must be open source. Rather than having to rely on "unbiased mediators" (who may or may not be unbiased), citizens should be able to download the software and verify its operation themselves. They should be able to verify that there is no logic that compromises their identity, discards their vote, or incorrectly tabulates the results. Luckily, confirming that the copy being inspected is the same code that was published can be done fairly simply through the use of cryptographic hashing.
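
As a minimal sketch of the hashing step, assuming the publisher posts a SHA-256 digest alongside each release: a citizen hashes the copy they downloaded and compares the result to the posted digest. The function names and the sample bytes below are invented for illustration. A match only shows that the copy is identical to what was published; reading the code to judge what it does is still a separate job.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_published_digest(artifact: bytes, published_hex: str) -> bool:
    """True if the artifact hashes to the digest that was published."""
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_hex(artifact), published_hex.strip().lower())


# Hypothetical release contents; in practice these bytes would be read
# from the downloaded file.
release = b"tally(votes) -> count each ballot exactly once\n"
published = sha256_hex(release)  # stands in for the digest posted by the publisher

# Any change to the bytes, however small, produces a different digest.
tampered = release.replace(b"exactly once", b"at most once")

print(matches_published_digest(release, published))   # True
print(matches_published_digest(tampered, published))  # False
```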

The software can be even more transparent and secure with good logging and audit output that echoes every line of code as it's about to be run, such that anyone who can read the language can analyze it. If the software is written in a good Domain Specific Language (which itself has a published implementation) then the code output can be readable to almost anyone.

In order to secure voter identity, a public/private key mechanism could be used with a small-scale TPM stored on a person's driver's license. Perhaps integrate RFID. Or steganography with a 3D barcode. [elaborate]

Confidentiality can be provided by creating an SSL tunnel between a citizen's computer and a web server, and anonymity perhaps by using a Tor-style anonymous network to connect to the Front Controller. The Front Controller holds the public keys of registered citizens so that it can authenticate and verify a voter before allowing access to a back-end data structure. When a vote is actually cast, that back end generates a new public/private key pair for the citizen, used solely for correctness verification purposes. [elaborate more]

With such an architecture, we can provide all of the benefits of current voting systems while creating an even more transparent and auditable process.

[write strong conclusion]

Thursday, July 8, 2010

Pair Programming Coding Horror

Yesterday I had the pleasant experience of pair programming to "learn the code" from a lead developer. I use the term "pleasant" loosely. It's pretty hard when you're the intern and you can see that the things your "experienced mentor" does are wrong. This becomes especially trying when you try to call best practices to his attention and he offers excuses for personal coding laziness such as:

"the customer wants it this way" (chances are he doesn't. the customer usually wants something that works, and doesn't care about the implementation details.)

"this works" (yeah, but how robust is it? secure? maintainable? modifiable?)

"I don't know what this does and I don't wanna deal with it." (when you're the lead developer on a project, this is a bad mentality.)

At one point I wanted to scream out "DON'T REPEAT YOURSELF!" and "DON'T LIVE WITH BROKEN WINDOWS!", but my good sense got the better of me. Still, I felt like William Shakespeare learning to write from John Grisham. At points I wanted to yell out "You knave! You scoundrel!"

Wednesday, May 26, 2010

The horror, oh the horror

I thought I'd seen horrible code before...then I had to figure out what happened that caused this working monstrosity to stop working when the site it was on was ported from Drupal 5 to Drupal 6. I think I'm going to lose a few hairs on this one.

Sunday, May 16, 2010

Groovy: duck-typing not always the best choice?

Working through Grails In Action chapters 3 & 4, I noticed something. The integration tests written to test domain classes and controllers use Groovy defs, presumably for simplicity.

The test cases look a lot like this:

void testFirstSaveEver() {
   def user = new User(userId:'joe', password:'secret')
   assertNotNull user.save()
   assertNotNull user.id

   def foundUser = User.get(user.id)
   assertEquals 'joe', foundUser.userId
}

I'm not going to go too deeply into the whole dynamic versus static typing debate, though I do have some verbose opinions on the matter. I was particularly intrigued with Groovy -- after learning Python and PHP and coming from a C++/Java/.NET background -- because it seems to sit at the sweet spot between static and dynamic typing. Being able to do both is a powerful thing.

For one, the type can serve as a form of documentation. Anyone looking at the code can tell what the original developer is expecting to receive when the type is declared, which may at times be easier/clearer/more telling than evaluating the expression on the right.

Additionally, I would expect that declaring the type up front would enable the Groovy compiler and runtime to do some fancy things. Interpreted languages with dynamic typing get a bad rap because they're slow. This is for a reason, as duck-typing resolution is not a free operation. I figure that if you can help the process along by stating what the type is, when you know what the type should be, it should speed up execution.

Curious, I decided to start writing the tests presented in the book, and tests that had statically typed variables instead of def. Like so:


void testFollowing(){
   def before = System.currentTimeMillis()
   def glen = new User(userId:'glen', password:'password').save()
   def peter = new User(userId:'peter', password:'password').save()
   def sven = new User(userId:'sven', password:'password').save()

   glen.addToFollowing(peter)
   glen.addToFollowing(sven)
   assertEquals 2, glen.following.size()
   sven.addToFollowing(peter)
   assertEquals 1, sven.following.size()
   def after = System.currentTimeMillis()
   println "User following using duck typing: " + (after-before)
}

void testTypedFollowing(){
   long before = System.currentTimeMillis()
   User glen = new User(userId:'glen', password:'password').save()
   User peter = new User(userId:'peter', password:'password').save()
   User sven = new User(userId:'sven', password:'password').save()

   glen.addToFollowing(peter)
   glen.addToFollowing(sven)
   assertEquals 2, glen.following.size()
   sven.addToFollowing(peter)
   assertEquals 1, sven.following.size()
   long after = System.currentTimeMillis()
   println "User following using static typing: " + (after-before)
}

Through the end of chapter 4, I've written quite a few of these tests. The output is shown below

Unit Test Results (all tests)

Class	Test	Time(s)
PostIntegrationTests	testFirstPost	0.188
PostIntegrationTests	testAccessingPosts	0.078
PostIntegrationTests	testPostWithTags	0.069
PostIntegrationTests	testTypedPostWithTags	0.050
QueryIntegrationTests	testBasicDynamicFinders	0.248
QueryIntegrationTests	testTypedBasicDynamicFinders	0.051
QueryIntegrationTests	testQueryByExample	0.052
QueryIntegrationTests	testTypedQueryByExample	0.040
TagIntegrationTests	testSomething	0.016
UserIntegrationTests	testBlankUserName	1.518
UserIntegrationTests	testFollowing	0.055
UserIntegrationTests	testTypedFollowing	0.033
UserIntegrationTests	testFirstSaveEver	0.022
UserIntegrationTests	testSaveAndUpdate	0.022
UserIntegrationTests	testSaveThenDelete	0.101
UserIntegrationTests	testEvilSave	0.046
UserIntegrationTests	testSaveEvilCorrected	0.029

In every comparison of typed versus def, the typed version was faster. To verify, I also looked at what I printed out to the console:

--Output from testBasicDynamicFinders--
Duck typing testing dynamic finders: 230}
--Output from testTypedBasicDynamicFinders--
Typed testing dynamic finders: 41}
--Output from testQueryByExample--
Duck typing testing query by example: 42}
--Output from testTypedQueryByExample--
Static typing testing query by example: 30}

I guess that's kind of pointless. I just verified that JUnit can tell time. :) The point being, however, that there seems to be strong evidence that using the type when possible dramatically improves performance. The testBasicDynamicFinders method saw an almost 5x reduction in execution time. This seems to completely contradict John Wilson's assertion that "Knowing the type of a parameter makes the call slower!" Of course, that was 4 years ago, but it looks like there was a good reason for blackdrag to investigate!

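The question becomes: are there times when declaring the type is slower than def? Are these results valid, or is my system just nuts? One caveat I can see: each number is a single run, and in the listing above every def test appears before its typed counterpart, so JVM warm-up (class loading, JIT compilation) may account for part of the gap. If the results do hold up, should *hint hint* these portions of the book be rewritten in the next edition to spread best practice?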

The Importance of Learning The Right Way

Related to my earlier thought on Test-Driven Development and why I struggle implementing it, I think it's important to learn how to code The Right Way.

What does that mean?

In the beginning, as young programming babies, we do a lot of guessing and checking. We write loops that go out of bounds, don't really understand the difference between myClass->property and myClass.property in C++, etc. etc. As time progresses we discover tools and methodologies that make us more productive, like testing and design patterns. But integrating these new techniques into our workflow is difficult. Oftentimes, for the sake of brevity, clarity, or lack of understanding, I fall back to my old approaches.

You can't teach an old dog new tricks

I'm not cynical enough to believe this is true, but as a life-long student and an academician this is an interesting topic.

I want to build high quality software. Software that not only meets requirements, but is comprehensible, maintainable, adaptable, reliable, secure, etc etc. But it's difficult to integrate best practices that one learns in a tutorial session with daily programming life without heavy repetition and rigorous use of those skills. Until using those skills becomes part of our pattern, we'll still find ourselves not writing tests or writing scriptlets.

The best way to avoid this integration challenge-- which one can analogize as changing the infrastructure of an application in the maintenance step in the software development life-cycle of the Software Engineering application of our brains-- is to start from a good foundation. Learn best practices as you're learning to program-- analogous to spending extra time in the Design phase, when refactoring costs are cheaper. Yet educating programmers in this way proves to be difficult for a variety of reasons.

First, for those programmers who are self-taught, it is difficult to know where to go for "best practice" information. Googling for code snippets reveals a variety of tutorials written by developers of varying levels of expertise. Most of the code in these tutorials is not written to a production-quality standard. How many web-development tutorials properly sanitize their user inputs? How many tutorials use doubles for currency when an exact decimal type or library would be the better choice? etc. etc.
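
Both of those tutorial sins fit in a few lines. The sketch below is my own illustration in Python using only the standard library (`decimal` and an in-memory `sqlite3` database). The table, the helper names, and the hostile input are all made up, and parameter binding is just one common way of handling untrusted input, not the whole story of sanitization.

```python
import sqlite3
from decimal import Decimal

# Binary floating point cannot represent most decimal fractions exactly,
# so sums of money drift. Decimal built from strings stays exact.
print(0.1 + 0.2 == 0.3)                                      # False
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True


def find_user_unsafe(conn, name):
    # What many tutorials do: paste user input straight into the SQL text.
    return conn.execute(
        f"SELECT name FROM users WHERE name = '{name}'"
    ).fetchall()


def find_user(conn, name):
    # Parameter binding: the driver passes the input as data, never as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("joe",), ("glen",)])

hostile = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, hostile)))  # 2 (every row leaks)
print(find_user(conn, hostile))              # []
print(find_user(conn, "joe"))                # [('joe',)]
```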

Secondly, for programmers who learn to code in high school/college (like me), most introductory programming instructors are not well-versed in best practices. This is why, in many places, programming instruction remains pretty much the same today as it was in the 1970s. The programming languages have changed, the tools, etc etc...but essentially, the pedagogical methodology is the same. Learn about data types, variables, control flow statements, iteration, arrays, maybe do some OOP at the end.

I think these are mistakes. Best practices should be integrated throughout. The best time to pick up good habits is when you're learning.

One can argue, of course, that there is a sweet spot. Putting too much detail about bounds checking and security considerations in an example about arrays may obscure the point of the example. You don't want to digress into unrelated topics, too much code is confusing, etc. I would counter that argument by placing responsibility for a good explanation of the example on the educator. Line by line, point by point.

A good programmer may be dumb and lazy. A good educator can't be.

Grails in Action -- Initial thoughts

I've been working through Manning's excellent Grails in Action book to strengthen my skills in Grails. I've been away from heavy-duty programming for a while, and maintaining poorly made sites in Drupal has made my development skills rusty. So far (partway through Chapter 2), I'm happy with the results.

I like this book so much for a few reasons.

Good humor -- The Quote Of The Day application in chapter 1 starts it off right, and sprinkled references to Chuck Norris keep the material fresh.
You build a really cool app -- I'm not all the way through yet, but it looks like Hubbub is essentially Twitter built on Grails. Not a Twitter client. Twitter itself. That's a pretty sweet app to build as a learning tool.
Development the right way -- I've been learning about modern practices in web development like Test-Driven Development for a little while...yet when it comes to building new software, I often find myself falling back into my usual patterns of "just code and see what happens", sprinkling System.out.println() everywhere. I hypothesize that this is because this is the way I learned to code, in high school and in college. Old habits are hard to break. This book promises to devote proper time to testing in Chapter 7, which at first seems a little late. But then even in chapter 2, on domain modeling, it has you write integration tests for your domain classes right after you write them.

As I continue to work through the book, I'll reflect on my thoughts.

Software Meditations