Software Meditations
Saturday, April 21, 2018
On preferring the use of getters over direct field access
You often find advice that says you should prefer getters over directly using a field. Fowler terms this SelfEncapsulation.
I've sometimes thought this is a bit abstract and a case of over-engineering...but today I found a good reason why it's necessary.
Consider the following code:
It turns out this NPE happens because of the implementation of java.io.File#isInvalid.
Because it reads this.path directly instead of calling getPath(), even providing a canned answer on an external mock doesn't help: a mock typically never runs File's constructor, so its path field is still null. And since #isInvalid is final, it can't be stubbed or overridden either (although, according to the Javadoc, its logic is rudimentary and they probably intend to evolve it).
Encapsulating even your use of private fields behind getters provides unexpected hooks for testability and maintainability. 
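Here's a minimal sketch of the difference, assuming two made-up classes (DirectAccessResource and SelfEncapsulatedResource are hypothetical stand-ins, not the JDK's File). Both have a final isInvalid(); the only difference is whether it reads the field or calls the getter. A hand-rolled stub that overrides getPath() plays the role of the mock.

```java
// Hypothetical classes for illustration; neither is the JDK's java.io.File.
class DirectAccessResource {
    private final String path;

    DirectAccessResource(String path) {
        this.path = path;
    }

    String getPath() {
        return path;
    }

    // Reads the field directly, so overriding getPath() cannot change the outcome.
    final boolean isInvalid() {
        return this.path.indexOf('\0') >= 0;
    }
}

class SelfEncapsulatedResource {
    private final String path;

    SelfEncapsulatedResource(String path) {
        this.path = path;
    }

    String getPath() {
        return path;
    }

    // Goes through the getter, which gives a test double a seam to hook into.
    final boolean isInvalid() {
        return getPath().indexOf('\0') >= 0;
    }
}

class SelfEncapsulationDemo {
    // A stub with a canned getPath() still hits the null field and throws.
    static boolean directAccessThrows() {
        DirectAccessResource stub = new DirectAccessResource(null) {
            @Override
            String getPath() {
                return "/tmp/canned";
            }
        };
        try {
            stub.isInvalid();
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // The same kind of stub works once isInvalid() calls the getter.
    static boolean selfEncapsulatedIsValid() {
        SelfEncapsulatedResource stub = new SelfEncapsulatedResource(null) {
            @Override
            String getPath() {
                return "/tmp/canned";
            }
        };
        return !stub.isInvalid();
    }

    public static void main(String[] args) {
        System.out.println("direct field access throws NPE: " + directAccessThrows());
        System.out.println("getter-based check accepts the canned path: " + selfEncapsulatedIsValid());
    }
}
```

The final keyword stays on both methods on purpose: the getter is the only hook left, and it's enough.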
 
Tuesday, February 20, 2018
Using ECMAScript 6 modules
This may be obvious to everyone else, but today I found myself constructing a toy HTML/JS example and wanting to use the new hotness of ECMAScript 6 Modules.
I constructed the following page.
This refused to work. I kept getting
ReferenceError: greetCustomer is not defined
Reading the docs a little more, I decided to change it. I changed my script tag to type="module" and removed the onready function bit. No joy.
I tried importing directly, using an example I'd seen. Now I got:
SyntaxError: import not found: getUserPreferredLanguage
I finally got things working with the following
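It seems like importing modules via the destructuring syntax by default is not the way to go. The braces in an import are named-import syntax rather than destructuring, so they only resolve names the module explicitly exports.
Sunday, March 27, 2016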
Basic Data Analysis: How "RealClearPolitics" obscures data and may be too incompetent to exist
I was trolling around the Internet when someone linked me to the RealClearPolitics frequency count of the Democratic Popular Vote. Here's the data as it was being presented, as I would not be surprised if it were to change:
People were using this "data" to support the assertion that Hillary Clinton has "more of the popular vote" than Bernie Sanders. That really seems to defy intuition, given the record crowds Bernie draws to his speeches. But really. Look at that data for the last 3 rows. Does it make any sense at all?
2016 Democratic Popular Vote
| State | Date | Clinton | Sanders | Spread |
| RCP Total | - | 8,924,920 | 6,398,420 | Clinton +2,526,500 |
| Iowa | February 1 | | | |
| New Hampshire | February 9 | 95,252 | 151,584 | Sanders +56,332 |
| Nevada | February 20 | | | |
| South Carolina | February 27 | 271,514 | 95,977 | Clinton +175,537 |
| Alabama | March 1 | 309,928 | 76,399 | Clinton +233,529 |
| American Samoa | March 1 | | | |
| Arkansas | March 1 | 144,580 | 64,868 | Clinton +79,712 |
| Colorado | March 1 | 49,314 | 72,115 | Sanders +22,801 |
| Democrats Abroad | March 1-8 | | | |
| Georgia | March 1 | 543,008 | 214,332 | Clinton +328,676 |
| Massachusetts | March 1 | 603,784 | 586,716 | Clinton +17,068 |
| Minnesota | March 1 | 73,510 | 118,135 | Sanders +44,625 |
| Oklahoma | March 1 | 139,338 | 174,054 | Sanders +34,716 |
| Tennessee | March 1 | 245,304 | 120,333 | Clinton +124,971 |
| Texas | March 1 | 935,080 | 475,561 | Clinton +459,519 |
| Vermont | March 1 | 18,335 | 115,863 | Sanders +97,528 |
| Virginia | March 1 | 503,358 | 275,507 | Clinton +227,851 |
| Louisiana | March 5 | 221,615 | 72,240 | Clinton +149,375 |
| Nebraska | March 5 | 14,340 | 19,120 | Sanders +4,780 |
| Kansas | March 5 | 12,593 | 26,450 | Sanders +13,857 |
| Maine | March 6 | | | |
| Mississippi | March 8 | 182,447 | 36,348 | Clinton +146,099 |
| Michigan | March 8 | 576,795 | 595,222 | Sanders +18,427 |
| Northern Marianas | March 12 | | | |
| Florida | March 15 | 1,097,400 | 566,603 | Clinton +530,797 |
| Illinois | March 15 | 1,007,382 | 971,555 | Clinton +35,827 |
| Missouri | March 15 | 310,602 | 309,071 | Clinton +1,531 |
| North Carolina | March 15 | 616,383 | 460,316 | Clinton +156,067 |
| Ohio | March 15 | 679,266 | 513,549 | Clinton +165,717 |
| Arizona | March 22 | 235,697 | 163,400 | Clinton +72,297 |
| Idaho | March 22 | 5,065 | 18,640 | Sanders +13,575 |
| Utah | March 22 | 15,666 | 61,333 | Sanders +45,667 |
| Alaska | March 26 | 99 | 440 | Sanders +341 |
| Hawaii | March 26 | 10,125 | 23,530 | Sanders +13,405 |
| Washington | March 26 | 7,140 | 19,159 | Sanders +12,019 |
Only about 500 people showed up to vote in AK? Really?
But its population is about 750K. HI has a population of 1.3M, and WA has a population of 7M.
Yet HI had more voters than WA?
Take WA: Even if you assume only 30% of the population is eligible to vote (that's a lot of kids and foreigners!), only 50% of those people are Democrats (which would be low for The Left Coast), and only 25% of Democrats show up for the primaries (because people in WA don't care about politics... FALSE), that would be... 262K people.
What WA *actually* reported are _delegate numbers_. In one precinct in Washington, 149 people showed up. 8 delegates were awarded. That means the ratio of actual participants to reported delegates was roughly 18:1.
If this scale alone were appropriate, the actual popular vote would be about 464K.
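For anyone who wants to check the napkin math, here's a small sketch. The class and method names are mine and purely illustrative; the inputs are the figures quoted above plus the Washington row of the table. Assumption to flag: I scale by the rounded 18:1 ratio, which with the table's Washington counts gives about 473K, the same ballpark as the 464K figure above and nowhere near the roughly 26K the chart implies.

```java
// Illustrative only: the class and method names are hypothetical.
class CaucusEstimate {
    // Population x share eligible x share Democratic x share who turn out.
    static long plausibleTurnout(long population, double eligible,
                                 double democrats, double turnout) {
        return Math.round(population * eligible * democrats * turnout);
    }

    // Whole participants per delegate in the single precinct quoted above.
    static long participantsPerDelegate(long participants, long delegates) {
        return participants / delegates; // integer division: 149 / 8 -> 18
    }

    // Scale reported delegate counts back up to an estimated head count.
    static long estimatedVoters(long reportedDelegates, long ratio) {
        return reportedDelegates * ratio;
    }

    public static void main(String[] args) {
        long floor = plausibleTurnout(7_000_000L, 0.30, 0.50, 0.25);
        long ratio = participantsPerDelegate(149, 8);
        long reported = 7_140 + 19_159; // Clinton + Sanders, Washington row
        System.out.println("Pessimistic turnout estimate: " + floor);
        System.out.println("Participants per delegate:    " + ratio);
        System.out.println("Reported 'votes' in the chart: " + reported);
        System.out.println("Scaled-up estimate:            " + estimatedVoters(reported, ratio));
    }
}
```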
This chart is completely invalid. It CLEARLY has a BIG CATEGORICAL ERROR. It compares apples and oranges. It takes only a little common sense to realize this.
I can only attribute such a dramatic failure of Data Analysis 101, one that a high school student in a reasonable education system should be able to catch, to either maliciousness or complete and utter incompetence. If I were a "tinfoil hat" wearing person, I would say this data is purposefully being misrepresented so it can be misconstrued...
But I'm a charitable person. So instead, I'll assume it's complete and utter incompetence by "RealClearPolitics" that obscures more than it helps. I might be wild and say this kind of dramatic failure should cause us to immediately discount all of their data as lacking basic data sense, but maybe this particular table is "just the intern."
But isn't it funny how fundamental errors of basic analysis in these types of "canonical data sources" can spread misinformation like a plague throughout the Internet?
Don't forget to question the data, and the methods used to collect and analyze it.
Friday, October 2, 2015
Java Concurrency in Practice: Oh Shit.
Holy fucking shit. Reading Java Concurrency in Practice has made me realize that probably every fucking program I’ve ever written is wrong, the world is constantly on fire, and you can’t have people writing reasonable applications with these barbaric and primitive tools.
Rich Hickey was a motherfucking genius. Functional programming languages and immutability almost everywhere are the only answers to any of this. Motherfucking semaphores and locks and synchronized blocks: are you kidding me? We HAVE to evolve to Actors, Dataflows, Communicating Sequential Processes, and STM. Trying to use even a Java 7 ExecutorService with a ConcurrentHashMap sanely is like being a fucking surgeon in the 1800s trying to cure a patient’s “hysteria” and “bad humors” with fucking leeches and bone saws.
This is too fucking hard! You cannot give a 22 year old kid who barely knows his elbow from a blow job access to an ExecutorService and a random Java class he wrote in school and say “Go to town, scrappy!” Kid’s gonna have no fucking clue about the Java Memory Model, properly using synchronized blocks for guarding invariant impacts by independent updates to immutable local state, or any of the dozens of abstruse and mundane nuances that go into managing the shared cross-product combinatorial explosion of state space involved in doing any seemingly simple, trivial operation in a concurrent environment!
I love this Brian Goetz, I really do. He’s like a cheery little droid while my spaceship’s on fire, hurtling towards a black hole, happily telling me that if I just design my classes this way and use special annotations and volatile variables that way and I am really careful every time I put some Object into a Map AND get some Object out of it AND that when I return it I publish some kind of contract that my method’s consumer is supposed to follow when accessing that Object’s state, THEN I can be effectively safe in a multi-threaded concurrent environment—without even understanding that all these preconditions mean GAME OVER, MAN! The jig is up! The whole thing is fucked!!! You can’t even do…while do is not try, if you catch my drift, because I forgot to call incrementAndGet() on some AtomicLong and now I’m deadlocked.
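To make the rant a little more concrete, here's a minimal sketch of how much ceremony one shared counter takes. WordTally is a made-up class, and the pool size and timeout are arbitrary assumptions; this isn't an example from the book. The naive "get, add one, put back" version loses updates under contention, so the sketch leans on putIfAbsent() plus AtomicLong.incrementAndGet() instead.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class for illustration only.
class WordTally {
    private final ConcurrentHashMap<String, AtomicLong> counts = new ConcurrentHashMap<>();

    // putIfAbsent returns the existing counter if another thread won the race,
    // so every thread ends up incrementing the same AtomicLong.
    void record(String word) {
        AtomicLong fresh = new AtomicLong();
        AtomicLong existing = counts.putIfAbsent(word, fresh);
        (existing != null ? existing : fresh).incrementAndGet();
    }

    long count(String word) {
        AtomicLong counter = counts.get(word);
        return counter == null ? 0L : counter.get();
    }

    // Runs `tasks` jobs on a small pool, each recording the same word `perTask` times.
    static long tallyConcurrently(int tasks, int perTask) {
        WordTally tally = new WordTally();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < tasks; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perTask; i++) {
                    tally.record("leeches");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return tally.count("leeches");
    }

    public static void main(String[] args) {
        System.out.println("Recorded: " + tallyConcurrently(8, 1_000));
    }
}
```

That's three concurrency utilities and a shutdown protocol just to count to eight thousand without dropping anything.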
Tuesday, March 10, 2015
Good OOP == Good Dependency Management
Object-Oriented Programming is all about dependency management. Dependency management is crucial for building evolving systems that survive and adapt to contextual changes.
It's a paradigm that scales. Even the new "container" based virtualization technologies such as Docker basically treat the "Operating System" as an Object. Even JavaScript's "Revealing Module Pattern" is just making a namespace of bound functions... At some point, when you're passing the same 3 parameters to 5 functions, you start thinking, "Wait. What if I just passed those parameters to 1 function that called those 5?" and you're on your way to OOP.
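A tiny sketch of that moment, with made-up names (ShippingFunctions and Address are hypothetical, and I've used three functions instead of five to keep it short). The "1 function" that takes the shared parameters turns out to be a constructor.

```java
// Before: every function drags the same three parameters around.
class ShippingFunctions {
    static String label(String name, String street, String city) {
        return name + "\n" + street + "\n" + city;
    }

    static String summary(String name, String street, String city) {
        return name + " (" + city + ")";
    }

    static boolean isComplete(String name, String street, String city) {
        return !name.isEmpty() && !street.isEmpty() && !city.isEmpty();
    }
}

// After: the parameters are bound once, and the functions become methods.
class Address {
    private final String name;
    private final String street;
    private final String city;

    Address(String name, String street, String city) {
        this.name = name;
        this.street = street;
        this.city = city;
    }

    String label() {
        return name + "\n" + street + "\n" + city;
    }

    String summary() {
        return name + " (" + city + ")";
    }

    boolean isComplete() {
        return !name.isEmpty() && !street.isEmpty() && !city.isEmpty();
    }

    public static void main(String[] args) {
        Address address = new Address("Ada", "1 Main St", "Springfield");
        System.out.println(address.label());
        System.out.println(address.summary());
    }
}
```

Callers now depend on one thing, Address, instead of on three loose values that must always travel together.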
Good OOP is about seeing the relationships between things and figuring out how things actually come together. Your software is a jigsaw puzzle, a Lego set. The principles of good software construction all come down to good dependency management.
When you see the right relationships, you minimize the dependency between unrelated things, form logical partitions, and find traversals through the object graph that you can completely replace at sub-graphs without any loss of generality.
When you don't, you create bloated, bureaucratic, hard to read and understand systems that don't work. Hopefully at least they fail in obvious ways. It's far more difficult to fix when failures aren't obvious.
Thursday, January 10, 2013
Why I like the ternary operator
My friend Mike (over at CodeAwesome) and I have a long standing good-natured debate about the nature of the ternary operator.
For the uninitiated, the ternary operator is basically syntax sugar in the C family of languages (and Java, since Java is Just A Fancy C++ VM) that allows you to inline if statements.
Basically:
var foo;
if (getGodlyGlobal(me.margaret)) {
    foo = "baz";
} else {
    foo = "wtf";
}
Becomes:
var foo = getGodlyGlobal(me.margaret) ? "baz" : "wtf";
To me, this feels concise and elegant. But to Mike, it seems to go dangerously in the direction of Perl line noise. I'm sympathetic.
Robert Martin brings up the excellent point, in his tour de force Clean Code, that we should Avoid Mental Mapping. He uses this concept specifically with regard to variable/function nomenclature, but it has larger implications in software engineering as well. This concept jibes well with the research literature on Cognitive Load Theory in psychology.
Essentially, the more "noise" we create for our brains by having to map compressed pieces of information to larger meaning, the more difficult it is to understand what's going on. This may be why, to many, elegant models for statistics/quantum dynamics just look like "alphabet soup." Too much information compressed into a tiny space requires a lot of outside context in order to make heads or tails of.
Here, I have to know the syntax of the ternary operator, specifically that I replace the if with "?" and the else with ":". This mapping doesn't feel too complex, but any time I have to stop and stare quizzically at a piece of code, I have an opportunity to misunderstand or waste time I could be using to develop new features.
Some languages, such as CoffeeScript, elegantly handle this issue by allowing inline if/else in the evaluation of expressions. That seems reasonable. It obviously biases a programming language (with roots based in math) towards English, but then again, it's not a lot of English. If we embrace Donald Knuth's Literate Programming and try to make our code as expressive as possible, the presence of words is helpful.
Typically I use the ternary operator in a situation like the preceding, where I would just set a variable. But I just experienced another interesting use case, which is probably in line with the thoughts of the Anti-If campaign. Consider the following.
https://gist.github.com/4507897#file-buildcontext-java
This method doesn't look too complicated. It's only 13 lines, though we can clearly notice that despite being named "buildContext" it's only doing anything related to a context in the last line (another method call). The rest is actually reading values from a RequestContext. A problem with naming? Sure. But also take a look at lines 40-47.
My first instinct, on looking at this, was to refactor that into a function. It's a fairly self-contained closure that's essentially assigning a variable. Looking at it further, we can see both blocks are setting the same attribute on the request, with a different value in each branch.
Aha. A violation of DRY. There's one argument.
More interestingly, there's the cognitive notion of branching. Because I have a branch in logic here, I have to look in both places to determine what will happen. Ultimately, the same thing will happen, just slightly differently. But it's easy to imagine this changing, isn't it?
What if a 2 am programming emergency were to come through on a recognition problem? Maybe, if something is recognized, it should be logged. 3 months down the line, someone decides unrecognized pieces should be logged as well, but recognized pieces should calculate a value. In 6 months, the whole RequestContext is replaced with some other logic.
In that kind of code churn, it's very possible to misplace a line or two. Especially something like setting the "pageIdent" attribute in one branch, but not in the other. Perhaps the other branch was refactored into a method, then the method was tweaked, and somewhere along the way the line got lost.
Branches in logic, by definition, attract Change. They invite bifurcation and inevitably confusion. One could argue that this problem could easily be removed by moving the request.setAttribute() outside of the branches...but that's the point. Exactly the point!
In programming, since Programming is Life, it's easy for things to fall into the wrong scope. Some naive developer placed duplication in those scopes without understanding that an implicit invariant was in place: no matter which branch of the if is taken, the "pageIdent" attribute should be set on the request. It can be easy to lose the forest for the trees, especially if you're not following Uncle Bob's First Rule of Functions and have hundreds of lines and multiple nested blocks.
Blocks are cognitive magnets for confusion.
Now consider this snippet:
https://gist.github.com/4507897#file-improvedbuildcontext-java
Straightforward. One can argue I cheated a little bit by refactoring the RequestContext parsing and validation into a superclass method, but even if we add those lines in, we see we've eliminated a branch in logic. Now it's perfectly clear: we're setting parameters on the request, and one particular parameter has two possible values. If we want to change what happens in each branch, the logical thing to do is to extract that line into a method and expand it into its full if form. But in so doing, we've created an isolation point where such change can be processed easily without influencing surrounding logic. By eliminating a branch, we've manifested our invariant more clearly.
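For readers who can't pull up the gists, here's a rough sketch of the before and after shapes. To be clear, this is not the code from the gists: FakeRequest, ContextBuilder, the recognized flag, and the attribute values are all made up for illustration. Only the "pageIdent" attribute name and the setAttribute() call come from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a servlet-style request object.
class FakeRequest {
    private final Map<String, Object> attributes = new HashMap<>();

    void setAttribute(String name, Object value) {
        attributes.put(name, value);
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }
}

// Hypothetical builder showing the two shapes side by side.
class ContextBuilder {
    // Before: both branches repeat the same call, and the invariant
    // ("pageIdent is always set") is only implied.
    static void withBranches(FakeRequest request, boolean recognized) {
        if (recognized) {
            request.setAttribute("pageIdent", "recognized");
        } else {
            request.setAttribute("pageIdent", "anonymous");
        }
    }

    // After: one call, so the attribute cannot be dropped from one branch;
    // only the value varies.
    static void withTernary(FakeRequest request, boolean recognized) {
        request.setAttribute("pageIdent", recognized ? "recognized" : "anonymous");
    }

    public static void main(String[] args) {
        FakeRequest request = new FakeRequest();
        withTernary(request, true);
        System.out.println(request.getAttribute("pageIdent"));
    }
}
```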
Monday, December 31, 2012
Questions from the Mailbag: On Static Imports
"Something I noticed today while I was looking over the Mockito API...they suggest importing the library statically. So I did, and I noticed JUnit was imported statically as well in Eclipse, but other classes/libraries in my Unit Test class were not. What I'm asking then is why do the static import, why does it seem to be important in testing, and when might one want to do it otherwise, if at all?"
Excellent question!
Static imports are basically used in Java to allow C/C++ style programming. I fucking hate them. Essentially, they are a feature that was added because programmers are lazy and hate to type. Rather than referencing the class a static method is pulled from, with a static import you can save yourself a few characters.
The argument against them is that they pollute your namespace and are not explicit. 
Polluting your namespace means that, if your class declares a method with the same name as a statically imported method, your method shadows the imported one. Calls then silently go to your method, or fail to compile when the arguments only fit the imported version. This may not be a problem, depending on the class in question. For example, org.junit.Assert's assertEquals() is unlikely to appear in many places. But if you were writing the class OpinionatedCalculator, you may have a problem.
The bigger problem here is that namespace collision is obviously more likely with classes that have more methods. If I statically import something like org.apache.commons.lang.StringUtils, I'm more vulnerable to running into problems with a common word like "contains()".
One could argue this is because StringUtils is too large: in theory a static import on a small class, with relatively few methods (say 3) is "relatively safe."
I say fuck that theory. It's not that much more work to type Assert.assertEquals() than assertEquals(), and it saves me from having to look up at the top of the file to figure out where some magical global method comes from. In that sense, I say it's much more "explicit": The code is obvious the second you look at it. 
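Here's a small sketch of the collision, using java.lang.Math so it runs without JUnit on the classpath. The max() method I've bolted onto OpinionatedCalculator and its silly return value are made up for illustration.

```java
import static java.lang.Math.max;

// Hypothetical class: its own max() shadows the statically imported Math.max.
class OpinionatedCalculator {
    // Declared here, so every unqualified max(...) in this class resolves to it.
    static int max(int a, int b) {
        return 42; // this calculator has opinions
    }

    // Looks like it calls the import at the top of the file. It does not.
    static int biggest(int a, int b) {
        return max(a, b);
    }

    // Qualified call: no scrolling to the imports to learn where it comes from.
    static int explicitBiggest(int a, int b) {
        return Math.max(a, b);
    }

    public static void main(String[] args) {
        System.out.println("unqualified max(1, 2): " + biggest(1, 2));
        System.out.println("Math.max(1, 2):        " + explicitBiggest(1, 2));
    }
}
```

The static import compiles without complaint and then does nothing, which is exactly the kind of surprise the qualified call avoids.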
You know I like my code like I like my porn:
Explicit is Better Than Implicit.
http://www.python.org/dev/peps/pep-0020/
That said, if you browse through enough of my code on github, you'll find instances where I statically import JUnit. No one's perfect. Sometimes I flip on the opinion based on how lazy I feel like being that day, because I have the option to. Hence why I say it's a bad language feature, and I shouldn't have the option to.
Of course, opponents of that claim would say that code is longer and hence uglier with extra noise words. If the class name is unnecessary and it's clear that the static import makes sense, favor brevity. But if there's one thing you should know about me by now, it's that I like to go on...and on...and on.
Fuck static imports.
