Sunday, January 30, 2011

Why Political Informatics is necessary, and the value of open standards

On 2/7/2011, Maricopa County will hold an online tax lien auction at http://www.bidmaricopa.com. The auction generates money for the county by giving private investors a chance to buy relatively low-risk investments. Arizona has passed laws requiring that all of this information, including details on the properties, be made available via "public notices."

So they've gone ahead and published the data in a newspaper, and also "posted it on the web."

I use quotation marks because their definition of "posting on the web" is very loose:

http://www.publicnoticeads.com/AZFRAME/search/view.asp?T=PN&id=14\1182011_15476677.HTM

Utterly repugnant. The "disclaimer" at the top reads: "NOTE: Some notices are extracted from PDF files and may be difficult to read."

...which is some jerky's way of playing Cover Your Ass because they couldn't be bothered to list the information in a way that might make it usable to the public. This follows the letter of the law, but certainly not the intent. In order to use this "posting", it is up to individuals to read through a wall of text and try to figure out how many records are in there, what all the information corresponds to, and how to use that information to make an informed decision.

"Oh, you want some basic questions answered on the different types of items available? Here's a 500 page dump of garbage that has it all in there somewhere. In theory. Have fun."

Appalling.

Luckily, as a computer scientist, I'm able to read the header and recognize something:

"THE INFORMATION FOR EACH PARCEL ADVERTISED HEREIN IS PRINTED IN THE FOLLOWING ORDER: ONE UP SEQUENCE NUMBER, WHICH WILL BE USED AS AN IDENTIFIER FOR THE PURPOSE OF ADMINISTERING THE SALE; TAX AREA CODE; PARCEL NUMBER; OWNER; SEC., TWN., AND RNG. OR LOT, BLK AND TRACT (BETWEEN ASTERISKS); DESCRIPTION: VALUE OF REAL ESTATE; VALUE OF IMPROVEMENTS; VALUE OF PERSONAL PROPERTY; AND AMOUNT OF TAXES DUE BEFORE INTEREST AND FEES."

This is a structured document. Had this information been stored in an open and well-established semantic format like the Extensible Markup Language (XML), it would be trivial to automatically extract the information from it (we call this "parsing"), using a standard tool. In fact, we could even transform the data (using XSLT) into a web page that displayed it in an easy to read and usable format.
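To show how little work that would be, here is a minimal sketch using Python's standard XML parser. The county publishes no such file, so the element names (`parcel`, `owner`, `taxes_due` and so on) and every value below are invented for illustration; only the idea of one element per field listed in the header comes from the notice.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical document: the element names and values are invented
# for illustration and do not come from the county.
SAMPLE_XML = """
<parcels>
  <parcel seq="1">
    <tax_area_code>000100</tax_area_code>
    <parcel_number>100-01-001</parcel_number>
    <owner>EXAMPLE HOLDINGS LLC</owner>
    <taxes_due>1523.47</taxes_due>
  </parcel>
  <parcel seq="2">
    <tax_area_code>000100</tax_area_code>
    <parcel_number>100-01-002</parcel_number>
    <owner>DOE JANE</owner>
    <taxes_due>412.00</taxes_due>
  </parcel>
</parcels>
"""


def parse_parcels(xml_text):
    """Turn each <parcel> element into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for parcel in root.iter("parcel"):
        records.append({
            "seq": int(parcel.get("seq")),
            "tax_area_code": parcel.findtext("tax_area_code"),
            "parcel_number": parcel.findtext("parcel_number"),
            "owner": parcel.findtext("owner"),
            # Decimal avoids floating-point rounding on money amounts.
            "taxes_due": Decimal(parcel.findtext("taxes_due")),
        })
    return records


if __name__ == "__main__":
    for record in parse_parcels(SAMPLE_XML):
        print(record["parcel_number"], record["owner"], record["taxes_due"])
```

The parser does all the structural work; the only code I have to write is the part that says which fields I care about.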

Instead, some yokel did a Select All->Copy/Paste on a nicely formatted PDF file (which apparently they don't want to release to the public) and dumped some textual diarrhea on a page for some poor saps to sift through. One can imagine the common person's approach would be to go line by line and try to manually extract the data, which would be time- and effort-intensive.

It's more than just presentation. If this data were stored in a nicely structured format, we could query it to ask intelligent questions. What if I wanted to find all the plots with liens under $2000? All the plots with liens that have been paid before? All the plots owned by investors/companies versus private individuals? I can't do any of that with the data in this format.
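With structured records, two of those questions become a few lines each. This is only a sketch: the records are made up, the field names are my own, and the company check is a crude keyword guess on the owner name, not anything the county provides. The "paid before" question would need payment history, which the notice does not include, so it is left out.

```python
from decimal import Decimal

# Hypothetical records; parcel numbers, owners and amounts are invented.
PARCELS = [
    {"parcel_number": "100-01-001", "owner": "EXAMPLE HOLDINGS LLC",
     "taxes_due": Decimal("1523.47")},
    {"parcel_number": "100-01-002", "owner": "DOE JANE",
     "taxes_due": Decimal("2890.10")},
    {"parcel_number": "100-01-003", "owner": "DOE JOHN",
     "taxes_due": Decimal("412.00")},
]

# Assumed heuristic: words that usually indicate a business or trust owner.
COMPANY_MARKERS = {"LLC", "INC", "CORP", "TRUST", "LP"}


def liens_under(parcels, limit):
    """Return the parcels whose taxes due are strictly below `limit`."""
    return [p for p in parcels if p["taxes_due"] < limit]


def looks_like_company(owner):
    """Guess whether an owner name belongs to a company rather than a person."""
    words = owner.upper().replace(".", "").replace(",", "").split()
    return any(word in COMPANY_MARKERS for word in words)


if __name__ == "__main__":
    cheap = liens_under(PARCELS, Decimal("2000"))
    print([p["parcel_number"] for p in cheap])
    print([p["owner"] for p in PARCELS if looks_like_company(p["owner"])])
```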

All is not lost. As a computer scientist and informatician, I know that there are tools for information extraction from text that can help me. The most powerful one is called a Regular Expression, which I can use to perform targeted pattern matching on a block of text. If I figure out the right regular expression, I can parse the text and output it in a more readable format.
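As a rough sketch of the idea, here is a regular expression that follows the field order given in the notice's header. It assumes a much cleaner layout than the real dump has: one record per line, fields separated by whitespace, the legal description between asterisks, and a parcel number shaped like 100-01-001. The sample lines are invented, not copied from the notice, so the real expression would need to be tuned against the actual text.

```python
import re
from decimal import Decimal

# Field order follows the notice header; the exact layout is an assumption.
RECORD = re.compile(
    r"""
    ^(?P<seq>\d+)\s+                      # one-up sequence number
    (?P<tax_area>\d+)\s+                  # tax area code
    (?P<parcel>\d{3}-\d{2}-\d{3}[A-Z]?)\s+  # parcel number (assumed shape)
    (?P<owner>.+?)\s+                     # owner name, as short as possible
    \*(?P<legal>[^*]*)\*\s+               # legal description between asterisks
    (?P<real_estate>[\d,]+)\s+            # value of real estate
    (?P<improvements>[\d,]+)\s+           # value of improvements
    (?P<personal>[\d,]+)\s+               # value of personal property
    (?P<taxes_due>[\d,]+\.\d{2})$         # taxes due before interest and fees
    """,
    re.VERBOSE | re.MULTILINE,
)

# Invented sample lines in the assumed layout.
SAMPLE_TEXT = """
1 000100 100-01-001 EXAMPLE HOLDINGS LLC *LOT 1 BLK 2 EXAMPLE TRACT* 12000 3400 0 1523.47
2 000100 100-01-002 DOE JANE *SEC 3 TWN 1N RNG 2E* 8,500 0 0 412.00
"""


def to_number(text):
    """Strip thousands separators and convert to Decimal."""
    return Decimal(text.replace(",", ""))


def parse_records(text):
    """Return one dictionary per line that matches the assumed layout."""
    records = []
    for match in RECORD.finditer(text):
        row = match.groupdict()
        row["seq"] = int(row["seq"])
        for key in ("real_estate", "improvements", "personal", "taxes_due"):
            row[key] = to_number(row[key])
        records.append(row)
    return records


if __name__ == "__main__":
    for row in parse_records(SAMPLE_TEXT):
        print(row["seq"], row["parcel"], row["owner"], row["taxes_due"])
```

Lines that do not fit the pattern are silently skipped here, so a real run should count the matches and compare against the number of records the notice claims to contain.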

Figuring out such an expression is non-trivial. My annoyance is increased by realizing that this step will essentially duplicate work: I must take data that was nicely structured somewhere, then callously dumped in a big pile of blah, and reconstruct the structure. This duplication of effort wastes time and energy.

At least Yavapai County provides the PDF: http://www.publicnoticeads.com/notices/AZ/22/pdfs/YC%20TaxSaleAdvList2011final.pdf

...and that still sucks. While a PDF may be easier on the eyes, it is much harder to work with for automated information extraction than other formats. So answering questions about the data still requires a human to read through it, which is an inefficient process.

If they really cared about making this information "publicly accessible", they would store it in a structured and easily readable format on their websites for common users. They would also provide web service endpoints that could be used by external programs to gather the data and query it.

"Oh, that sounds expensive. Hard. Time consuming. Given AZ's budget problems-- where they're so strapped for cash they choose to let transplant patients die while they fund governmental initiatives to build bridges for endangered squirrels-- they simply don't have the budget to do that."

No. It's not. Furthermore, whatever small amount of resources would be necessary to accomplish such a task would be more than made up for by savings in the time and effort it takes to prepare these documents, fax them to newspapers, and make countless duplicates of information that should live in one place. This is an investment that actually improves efficiency and costs less in the long run.

Besides, last I checked there's this small school called Arizona State University (and the University of Arizona, plus many community colleges) with Computer Science and Computer Information Systems students. If the costs were so prohibitive, perhaps they could approach the universities as a resource and try to have students give them FREE WORK.

Instead, they're busy cutting already low education spending even further. Taking money away from the places that could help us all. Is that a sign that "all government is inefficient", or just really poor leadership?

While we ponder the answer to that question, I'm going to write a web app.

Saturday, January 1, 2011

Video Games offer the most effective form of storytelling and learning

Thesis: Video games offer the most effective form of storytelling in the modern experience. This is because video games offer interactivity and sensory immersion across the three senses that carry the most information (sight, hearing, touch), and because the time spent playing a typical video game is longer than the time spent with other media (movies, short novels).

Examples of video games that can teach
- Dance Dance Revolution, Xbox Kinect games
- Brain Age

...elaborate more later