Wednesday, October 27, 2010

The Law of Demeter

Upon reading the excellent Head First Design Patterns for the first time, I didn't quite understand the Law of Demeter. Their example with Thermometers and WeatherStation seemed contrived. It seemed like unnecessary indirection (though computer scientists do say the key to solving most computer science problems is to add another layer of indirection). But today I found myself working on a problem where it kind of makes sense.

I have this method in my Service Layer:

public bool SaveCycle(Cycle request)
{
    bool returny;
    Cycle cycleToUpdate = cycleRepository.Get(request.Id) ?? request;
    PopulateCycleFromForm(cycleToUpdate, request);
    if (cycleToUpdate.IsValid())
    {
        cycleRepository.SaveOrUpdate(cycleToUpdate);
        returny = true;
        if (cycleToUpdate.CyclePartitions.Count == 0) // new cycles have no partitions associated. create default
        {
            CreateDefaultCyclePartitions(cycleToUpdate);
        }
    }
    else
    {
        returny = false;
    }
    return returny;
}

and Cycle is a Domain Object in my object model. What made the Law of Demeter ring a bell was thinking about:

if (cycleToUpdate.CyclePartitions.Count == 0) // new cycles have no partitions associated. create default
{
    CreateDefaultCyclePartitions(cycleToUpdate);
}

I am storing CyclePartitions as a list of objects inside of Cycle in my domain layer. I need to create some default partitions on new cycle creation, so it seems reasonable to check the count of the collection.

But what if, 6 months down the line, I change the type of the CyclePartitions collection I'm using in my Cycle Domain Object to some other collection type? Maybe it exposes a Size property instead of a Count property. Then, I would have to go back and change the Service Layer as well, since I've subtly introduced a dependency here.

But, if I write this in my Service Layer:

if (cycleToUpdate.GetNumberOfCyclePartitions() == 0) // new cycles have no partitions associated. create default
{
    CreateDefaultCyclePartitions(cycleToUpdate);
}

Then I am free to do whatever I want inside of my Cycle Domain Object, as the Service Layer only cares about the API.
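For what it's worth, the inside of Cycle might end up looking something like this (the backing List&lt;CyclePartition&gt; is just the collection I happen to be using today; the whole point is that it could change later without anyone downstream noticing):

using System.Collections.Generic;

public class Cycle
{
    // Implementation detail: callers never see this collection or its type.
    private readonly IList<CyclePartition> cyclePartitions = new List<CyclePartition>();

    public int GetNumberOfCyclePartitions()
    {
        // If the backing collection later exposes Size instead of Count,
        // only this line changes; the Service Layer keeps calling the same API.
        return cyclePartitions.Count;
    }

    public void AddCyclePartition(CyclePartition partition)
    {
        cyclePartitions.Add(partition);
    }
}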

Monday, October 11, 2010

Security in Learning Systems

The security considerations of the intelligent tutoring system component of an interactive learning system are important, but not one of the critical pieces of the application. Security in a tutorial environment is not as critical as security in online credit card transactions or encrypted communications for national defense, for example. At the same time, a student may be rightfully upset if the details of their tutoring session are disclosed without their permission. An educator would also be annoyed if students were able to modify the results of their sessions to "game the system" by spoofing learning outcomes.

[hypothesis]
Within the continuum of the "ilities", security for an ITS will not be as important as usability, adaptability, maintainability, and performance. It may be as important as scalability, reliability, and testability. It falls within the third quartile of issues for an ITS.

[Idea. Create a survey with all of the "ilities" listed and send out to ITS domain experts]

Common security attacks include spoofing, denial of service, and malicious applications such as worms, viruses, Trojan horses, and logic bombs.

Spoofing is one of the types of attacks educators are most familiar with. It occurs when one party masquerades as another for the purpose of subverting security. For example, a student Bill might ask another student Tom to take a test for him. Or suppose teacher Mary attempts to log on to the administrative portion of the system, Tom notices her password, and later pretends to be Mary.

Spoofing can lead to incorrect interpretation of results, or it can give unauthorized users access to sensitive information. The spoofing concerns in an ITS are as follows:

Cheating -- student spoofs as another student in order to fool the system
Man-in-the-middle -- student intercepts interactions between student and system or teacher and system
Bullying -- student spoofs as teacher or system in order to fool another student
Penetration -- student spoofs as teacher in order to view/modify their own or other students' data
Destruction -- assailant spoofs as content author in order to load invalid knowledge into the knowledge base

In order to prevent spoofing, modern software applications use a variety of identity confirmation techniques. These can range from hardware and software based solutions to biometrics. These techniques are useful, but by themselves may not provide enough protection if the communication channel is not secure.
[make diagram]
The security of the communication channel can be strengthened by the use of secure standards, such as SSL. Additionally, the integrity of content can be verified through the use of checksums.
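As a rough sketch of the checksum idea (the SHA-256 choice and the string-based content here are assumptions for illustration, not a prescription for any particular ITS):

using System;
using System.Security.Cryptography;
using System.Text;

public static class ContentIntegrity
{
    // Compute a checksum over a piece of tutoring content before it is published.
    public static string Checksum(string content)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(content));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // The receiving end recomputes the checksum and compares it to the published value;
    // a mismatch means the content was corrupted or tampered with in transit.
    public static bool Verify(string content, string publishedChecksum)
    {
        return string.Equals(Checksum(content), publishedChecksum, StringComparison.OrdinalIgnoreCase);
    }
}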

Denial of Service

Tuesday, September 28, 2010

Cross Browser Differences: File uploads

I've been working on an issue with file uploads not working properly when being passed to an ASP.NET MVC Controller in IE8. This code works fine in Firefox and Chrome, but strangeness happens in IE:


public FileUploadJsonResult ExcelImport()
{
    FileUploadJsonResult result = new FileUploadJsonResult();
    HttpPostedFileBase hpf = Request.Files[0] as HttpPostedFileBase;
    if (hpf.ContentLength == 0)
        return new FileUploadJsonResult { Data = new { message = "File contained no data" } };

    String timeStampedFile = Path.GetFileName(hpf.FileName).Insert(hpf.FileName.IndexOf('.'), "_" + DateTime.Now.ToFileTimeUtc());
    string savedFileName = Path.Combine(AppDomain.CurrentDomain.BaseDirectory + "budgetImports", timeStampedFile);
    hpf.SaveAs(savedFileName);
    try
    {
        result = ProcessFile(savedFileName, Request["budgetId"]) as FileUploadJsonResult;
    }
    catch (ArgumentException e)
    {
        this.Response.StatusCode = 500;
        this.Response.StatusDescription = System.Net.HttpStatusCode.BadRequest.ToString();
        Response.Write(e.Message);
        result = Json(new { message = e.Message, stackTrace = e.StackTrace }) as FileUploadJsonResult;
    }
    return result;
}


Turns out, IE8 was sending the whole file path to the controller, as opposed to just the file name. This seems to contradict this post.

I'm working within a company intranet, so it may be a local policy. I'm also solving a fairly complicated use case using the jQuery Form plugin to do an AJAX file upload, which may be a factor. But if you run into errors with AJAX file uploads seemingly working in FF and Chrome but mysteriously failing in IE, this may be why.
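The workaround I'd suggest (only tested in my setup, so treat it as a sketch) is to strip the path first and do all of the index math on the stripped name, so IE's full path can't push Insert() out of range:

// IE8 may report "C:\Documents and Settings\user\Desktop\budget.xls",
// while Firefox/Chrome report just "budget.xls". Strip the path first,
// then compute the insert position from the stripped name only.
string fileNameOnly = Path.GetFileName(hpf.FileName);
string timeStampedFile = fileNameOnly.Insert(
    fileNameOnly.LastIndexOf('.'),   // assumes the upload has an extension, as the original code did
    "_" + DateTime.Now.ToFileTimeUtc());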

Thursday, September 23, 2010

Software engineering, law, and politics, rev 0.1

Modern political and legal systems have been relatively slow to adopt new technologies, considering the enormous opportunity. Potential benefits include simplifications to process and workflow that would allow such systems to function more efficiently, convenience for end users and administrators, auditing and tracking capabilities, and more agile adaptation to citizens' needs. [insert sources here]

Imagine, if you will, that we could automatically parse the text of existing laws and find contradictions. These could be temporal discrepancies (e.g. a law proposed in 2010 contradicts a law passed in 2002, found automatically before the law has a chance to pass or a lawsuit must be filed) [find historical source], scope discrepancies (a city law violates state or federal law) [find historical source], or even logical discrepancies [find historical source] (law A implies behavior zeta is permissible but not behavior gamma, law B strengthens law A while adding clause mu, law C strikes B, keeps mu, but somehow allows gamma).

Furthermore, such a system could proactively help "prune" legal systems by finding laws that "don't make sense", e.g. the stereotypical donkey in a bathtub [find historical source] and other archaic or arcane statutes. It could then put those measures up for vote, and citizens could proactively "remove the dross" to make legal systems leaner, easier to approach, and more receptive to the wants of the populace.

Such a system may seem far-fetched. It's not. Although the level of natural language processing necessary to evaluate some of the more semantic questions may be a little way off, we can already do many of these things with tools that exist today. The existence of search engines such as Google shows that although a computer can't really understand what the user is thinking, it can be leveraged as a tool to do very powerful things.

But do our political and legal systems need this type of fixing? Are current processes good enough? Most people would say they are not [insert sources here]. But are the potential benefits worth the potential costs?

Many people fear the proactive application of technology to tasks in the public space. For good reason. At its worst, technology could be used to create a 1984-esque dystopian society that allows the ruling elite to fully monitor and collect information about everyone, which would enable elimination of dissidents and suppression of freedom. This would not be hard to do. With the rise of biometrics, RFID, and ubiquitous monitoring via omnipresent internet access and web cameras, we already have the technology. [insert sources] Is it worth pursuing digitization and technological integration, when the costs could be so catastrophic?

As freedom and security are competing aims, societies that value both must make these decisions carefully. One could argue, however, that bloated systems rife with inefficiency and unnecessary complexity threaten both freedom and security. Processes that are unwieldy lead to more systemic failures. We have already seen this in software.

When software engineering first became a topic of discussion in the 1960s [cite paper that I forgot], its main concern was the abysmal failure rate of software projects. [insert number that I forgot] Too many projects suffered from budget or scheduling overruns, if not outright failure, and better approaches were sought. Through the years, various models have been proposed.

What became the classical model is now known as "The Waterfall Model." [cite paper] In the waterfall model, a software project is partitioned into distinct phases: Requirements elicitation, Design, Implementation, Testing, Deployment, and Maintenance. Each phase is to be completed fully up front, so that the software process always "flows down." Heavy documentation and record keeping is relied upon to transfer information between phases, and once a phase is complete it is to be "set in stone." It is ironic that this became the standard model, as the original author's intent was to argue against such a process. [cite source] In fact, the author argued, a better approach involves iterative refinement, since it is usually impossible to get every requirement right up front in any but the smallest and simplest projects. Time has proven that criticism correct. In the 1970s and 1980s, software engineering saw a move toward the Spiral model [cite paper], where each phase feeds into the next and then loops back around, iteratively, to develop quality software. This alone wasn't enough, and as the complexity of the software problem has increased, the models have as well, moving to the Rational Unified Process [cite paper] and most recently Agile methodologies. [cite Agile Manifesto]

The key progression has been to "cut down red tape" in terms of heavy, unwieldy paperwork and processes that seek to "do it all in one go," and instead instill values of productivity and adaptation. It can be argued that some camps have gone too far with Agile, sacrificing discipline in the form of basic documentation and creating chaos [cite anti-Agile source], and software projects are nowhere near perfect [cite source], but Agile is clearly the direction the industry is moving in and studies suggest it is improving things. [cite source that shows improvement thanks to Agile]
This is not surprising, either. Given the rise of the internet and the growing inter-connectivity, being able to respond quickly is more important than ever before. This need will only increase with time.

One can argue that the United States' political and legal system currently follows a "waterfall" type of model. The current legal process is arduous, whether you're a defendant going to a court date to schedule another court date or a politician trying to pass an initiative. An explosion of structure has resulted in a morass of paperwork and processes that are not nimble enough to react to changing social conditions [cite source]. These processes are rife with opportunities for failure and corruption. [strengthen this a little more]

Furthermore, the products of "waterfall" style development tended to be large, monolithic pieces of software. Similarly, gigantic, complex, arcane laws are not only common, they are prevalent. For example, see the PATRIOT Act [cite source], the Stimulus package [cite source], the Health Care Reform act [cite source], etc. The average citizen will not read a thousand-page bill with hundreds of provisions, often written in difficult-to-decipher language (by design). These bills are often littered with extraneous provisions that have nothing to do with the original intent of the bill, labeled "riders" or "pork." For examples, see the Safe Port Act (which, on an unrelated note, crippled online gambling) [cite source] and the recently failed Defense appropriations bill that included provisions to end "Don't Ask, Don't Tell" and the DREAM Act. [cite source]

This drives a software engineer crazy. Knowing basic concepts of how to represent things (data structures) and how to get things done (algorithms), software engineers then learn the principles of effective systems design. An experienced engineer could describe the average current bill as one with low cohesion [describe cohesion] and tight coupling [describe coupling]. The process by which lawmakers create, inspect, and debate bills could be described as inefficient resource allocation that leads to deadlock [describe deadlock] and starvation [describe starvation].

But how did the system get this way? A good historical background can easily answer this question: by design and by the tension of conflicting forces.

The Federalist Papers [cite source] and other assorted musings from the Founding Fathers, primarily the works of Madison and Jefferson, reveal a distrust of strong governmental structures. Checks and balances were established for the very purpose of governmental inefficiency, with the thought being that an inefficient government that does nothing is still better than an efficient one that tyrannizes its people. [cite source] This is not surprising, given their historical context and rationale for the Revolution, and can still be argued to be true given the even greater capacity for tyranny in the modern context.

The Jeffersonian argument favors a small, limited federal governmental structure that holds the rights of its citizens in such esteem that it does little itself, delegates responsibility to the states for most actions, and presides over a limited and fairly specific set of cross-cutting concerns like international trade and military mobilization. [cite sources. Possible paper idea: system architectural lessons learned from a historical analysis of governmental structures] Ironic, then, that Jefferson himself signed the Louisiana Purchase, the largest expansion of federal power and land mass in American history [http://americanhistory.about.com/od/thomasjefferson/a/tj_lapurchase.htm].

This may be due to influences from a competing system design: the Hamiltonian argument. Hamilton, seeing the potential for US expansion and growth, favored a stronger, more centralized federal government that handled a larger set of cross-cutting concerns, including banking and the cultivation of industry, in order to establish and grow the government's capabilities. A Hamiltonian style of design necessarily requires greater bureaucracy in order to do the "book-keeping" that handles the interconnected details of a more powerful system. Consequently, he argued for an expansion of federal power.

The conflict between these two ideals has led to a lack of cohesion within the system, which has oscillated in different directions over the years. Combined with the original emphasis on balanced power, this makes it unsurprising that the system is difficult to modify, difficult to maintain, and difficult to evolve.

But as we've established, it is possible, advantageous, and ultimately necessary for systems to adapt and be modified in order to meet changing requirements. In the software world, we call this refactoring. [explain refactoring. Source] Systems that do not evolve become crushed by what Grady Booch terms "inertia" [source Podcast "On Architecture"] and "code rot."

The system must evolve if it is to survive. But it must do so in carefully measured steps that follow logically and can be tested for equivalent fulfillment of requirements, if it is not to fall apart. Software can help meet these needs, but in order to do so it must ensure the fundamental pillars of the underlying political system:

Participation -- Citizens must have the ability to make their voices heard
Collaboration -- Citizens must be able to unite and act in concert
Security -- Citizens must be protected from misuses of power, or attacks on the system from external forces
Conflict Resolution -- Citizens must be able to disagree lawfully without fear of reprisal
Reliability -- Citizens must be able to trust in the process
Transparency -- Citizens must be able to trust the accuracy of results via verification

[brainstorm more. Try to find supporting examples]

Luckily, software can do this. In fact, in many application domains -- such as defense, avionics, and medical applications -- it makes guarantees similar to these and stronger. Consequently, it is not a question of whether software-intensive systems can fulfill these requirements, but rather how they are to be composed.

Take the issue of voting, for example. Many electronic voting systems have been proposed, implemented, and analyzed since the 1960s, and this application area has received even more attention with the rise of the internet. [http://en.wikipedia.org/wiki/Electronic_voting] The reason is simple: voter turnout is relatively low in local, state, and national elections. [cite source] It is not nearly as low on online surveys and Facebook polls [cite source]. The most common complaint about the voting process as it stands today, whether in person at the polls or by mail, is inconvenience and the ease of forgetting. Some would argue the civic duty is virtuous precisely because it is not convenient and requires energy, but that is a philosophical/design discussion that doesn't address the reality of the general public's dissatisfaction that Washington isn't "hearing our voices." [cite source]

A system that could enable more accurate, secure, and reliable near real-time responses, in the form of votes on pressing issues from constituents to politicians, could dramatically change the way discussions are framed. Such a system could be sufficiently easy and convenient for a voter to use, especially in these days of ubiquitous connectivity via web interfaces and mobile devices. This system could increase participation and involvement and offer both lawmakers and citizens never-before-seen guarantees of authenticity, accuracy, and transparency.

People are nervous about implementing such a system for one major reason: security. There are many security considerations for such a system, including:
-- Anonymity: how can the system guarantee the safety of the voter from reprisal while logging the voter's vote for both tally and content?
-- Robustness: how can the system handle usage without failing, so votes aren't "lost" (a.k.a. denial of service) [explain denial-of-service]?
-- Security: how can the system secure itself against man-in-the-middle attacks [explain man in the middle], spoofing [explain spoofing], and playback [explain playback attacks]?
-- Integrity: how can the system ensure logical constraints such as no voter voting twice, all votes being equal, and all voters being authorized and authenticated [explain authorization and authentication]?
-- Correctness: how can the system ensure that the correct vote is cast? How can a voter verify their vote without anyone else being able to? How can an auditor tell that a person has voted, without being able to see what the vote was?
[list more. Read a few papers]

...This seems like a hard problem space. In many ways, it is. Yet it's not impossible. Many of these same constraints are faced every day in the processing of credit card transactions, biomedical data, entertainment sources [see DRM], and other types of data. The potential failings of such a system could be catastrophic if implemented incorrectly, but because the system is inherently a public good, there is actually a higher probability of it being correct [cite bug count analysis in terms of number and fixes of open source versus proprietary software of comparable size and functionality].

Let us propose, at a high level, the implementation of such a system.

Since the system is a public good and must be verifiable such that it inspires the confidence of its citizens, an almost immediate initial requirement is that the system must be open source. Rather than having to rely on "unbiased mediators" (who may or may not be unbiased), citizens should be able to download the software and verify its operation themselves. They should be able to verify that there is no logic that compromises their identity, discards their vote, or incorrectly tabulates the results. Luckily, this can be done fairly simply through the use of cryptographic hashing.

The software can be made even more transparent and secure with good logging and audit output that echoes every line of code as it is about to be run, such that anyone who can read the language can analyze it. If the software is written in a good Domain Specific Language (whose implementation is itself published), then the output can be readable to almost anyone.

In order to secure voter identity, a public/private key mechanism could be used with a small-scale TPM stored on a person's driver's license. Perhaps integrate RFID, or steganography with a 3D barcode. [elaborate]
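To make the public/private key idea a bit more concrete, here is a rough C# sketch of signing and verifying a ballot. The BallotSigner class and the RSA/SHA-1 choices are purely illustrative assumptions on my part; a real system would choose its algorithms and key storage far more carefully.

using System.Security.Cryptography;
using System.Text;

public static class BallotSigner
{
    // The voter signs the ballot text with the private key held on their card/TPM.
    public static byte[] Sign(string ballot, RSAParameters privateKey)
    {
        using (var rsa = new RSACryptoServiceProvider())
        {
            rsa.ImportParameters(privateKey);
            return rsa.SignData(Encoding.UTF8.GetBytes(ballot), new SHA1CryptoServiceProvider());
        }
    }

    // Anyone holding the voter's registered public key can confirm the ballot
    // is authentic and has not been altered in transit.
    public static bool Verify(string ballot, byte[] signature, RSAParameters publicKey)
    {
        using (var rsa = new RSACryptoServiceProvider())
        {
            rsa.ImportParameters(publicKey);
            return rsa.VerifyData(Encoding.UTF8.GetBytes(ballot), new SHA1CryptoServiceProvider(), signature);
        }
    }
}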

Anonymity and confidentiality can be assured by creating an SSL tunnel between a citizen's computer and a web server, perhaps using a Tor-style anonymous network to reach a Front Controller that holds the public keys of registered citizens. The Front Controller can then provide authentication and verification before allowing access to a back-end store that generates a new public/private key pair for the citizen when a vote is actually cast, used solely for correctness verification purposes. [elaborate more]

With such an architecture, we can provide all of the benefits of current voting systems while creating an even more transparent and auditable process.

[write strong conclusion]

Thursday, July 8, 2010

Pair Programming Coding Horror

Yesterday I had the pleasant experience of pair programming to "learn the code" from a lead developer. I use the term "pleasant" loosely. It's pretty hard when you're the intern and you can see that the things your "experienced mentor" does are wrong. It becomes especially trying when you try to call best practices to his attention and he makes excuses for personal coding laziness such as:


  • "the customer wants it this way" (chances are he doesn't. the customer usually wants something that works, and doesn't care about the implementation detaills.)

  • "this works" (yeah, but how robust is it? secure? maintainable? modifiable?)

  • "I don't know what this does and I don't wanna deal with it." (when you're the lead developer on a project, this is a bad mentality.)



At one point I wanted to scream out "DON'T REPEAT YOURSELF!" and "DON'T LIVE WITH BROKEN WINDOWS!", but my good sense got the best of me. Still, I felt like William Shakespeare learning to write from John Grisham. At points I wanted to yell out "You knave! You scoundrel!"

Wednesday, May 26, 2010

The horror, oh the horror

I thought I'd seen horrible code before... then I had to figure out what caused this working monstrosity to stop working when the site it lived on was ported from Drupal 5 to Drupal 6. I think I'm going to lose a few hairs on this one.

Sunday, May 16, 2010

Groovy: duck-typing not always the best choice?

Working through Grails In Action chapters 3 & 4, I noticed something. The integration tests written to test domain classes and controllers use Groovy defs, presumably for simplicity.

The test cases look a lot like this:

void testFirstSaveEver() {
    def user = new User(userId:'joe', password:'secret')
    assertNotNull user.save()
    assertNotNull user.id
    def foundUser = User.get(user.id)
    assertEquals 'joe', foundUser.userId
}

I'm not going to go too deeply into the whole dynamic versus static typing debate, though I do have some verbose opinions on the matter. I was particularly intrigued with Groovy-- after learning Python and PHP and coming from a C++/Java/.NET background-- because it seems to be at the sweet spot of statically typed and dynamically typed. Being able to do both is a powerful thing.

For one, the type can serve as a form of documentation. Anyone looking at the code can tell what the original developer is expecting to receive when the type is declared, which may at times be easier/clearer/more telling than evaluating the expression on the right.

Additionally, I would expect that declaring the type at compile time would enable the Groovy interpreter to do some fancy things. Interpreted languages with dynamic typing get a bad rap because they're slow. This is for a reason, as duck typing resolution is not a free operation. I figure that if you can help the process along by telling the interpreter what the type is, when you know what the type should be, it should speed up the interpretation.

Curious, I decided to write the tests presented in the book alongside versions that used statically typed variables instead of def. Like so:



void testFollowing() {
    def before = System.currentTimeMillis()
    def glen = new User(userId:'glen', password:'password').save()
    def peter = new User(userId:'peter', password:'password').save()
    def sven = new User(userId:'sven', password:'password').save()

    glen.addToFollowing(peter)
    glen.addToFollowing(sven)
    assertEquals 2, glen.following.size()
    sven.addToFollowing(peter)
    assertEquals 1, sven.following.size()
    def after = System.currentTimeMillis()
    println "User following using duck typing: " + (after - before)
}

void testTypedFollowing() {
    long before = System.currentTimeMillis()
    User glen = new User(userId:'glen', password:'password').save()
    User peter = new User(userId:'peter', password:'password').save()
    User sven = new User(userId:'sven', password:'password').save()

    glen.addToFollowing(peter)
    glen.addToFollowing(sven)
    assertEquals 2, glen.following.size()
    sven.addToFollowing(peter)
    assertEquals 1, sven.following.size()
    long after = System.currentTimeMillis()
    println "User following using static typing: " + (after - before)
}

Through the end of chapter 4, I've written quite a few of these tests. The output is shown below:


Unit Test Results (designed for use with JUnit and Ant) -- All Tests

Name                                                  Time(s)
PostIntegrationTests.testFirstPost                    0.188
PostIntegrationTests.testAccessingPosts               0.078
PostIntegrationTests.testPostWithTags                 0.069
PostIntegrationTests.testTypedPostWithTags            0.050
QueryIntegrationTests.testBasicDynamicFinders         0.248
QueryIntegrationTests.testTypedBasicDynamicFinders    0.051
QueryIntegrationTests.testQueryByExample              0.052
QueryIntegrationTests.testTypedQueryByExample         0.040
TagIntegrationTests.testSomething                     0.016
UserIntegrationTests.testBlankUserName                1.518
UserIntegrationTests.testFollowing                    0.055
UserIntegrationTests.testTypedFollowing               0.033
UserIntegrationTests.testFirstSaveEver                0.022
UserIntegrationTests.testSaveAndUpdate                0.022
UserIntegrationTests.testSaveThenDelete               0.101
UserIntegrationTests.testEvilSave                     0.046
UserIntegrationTests.testSaveEvilCorrected            0.029


In every comparison of typed versus def, the typed version has been faster. To verify, I also looked at what I printed out to the console:



--Output from testBasicDynamicFinders--
Duck typing testing dynamic finders: 230
--Output from testTypedBasicDynamicFinders--
Typed testing dynamic finders: 41
--Output from testQueryByExample--
Duck typing testing query by example: 42
--Output from testTypedQueryByExample--
Static typing testing query by example: 30

I guess that's kind of pointless. I just verified that JUnit tells time. :) The point being, however, that there seems to be strong evidence that using the type when possible dramatically improves performance. The testBasicDynamicFinders method saw an almost 5x reduction in execution time. This seems to completely contradict John Wilson's assertion that "Knowing the type of a parameter makes the call slower!" Of course, that was 4 years ago, but it looks like there was a good reason for blackdrag to investigate!


The question becomes: are there times when declaring the type is slower than def? Are these results valid, or is my system just nuts? If they are, should *hint hint* these portions of the book be rewritten in the next edition to spread best practice?