Life After JSON, Part 1

XML. Simply saying that term can elicit a telling reaction from people. Some will roll their eyes. Some will wait for you to say something meaningful. Some will put their headphones back on and ignore the guy who sounds like an enterprise consultant. JSON is where it’s at. It’s new. It’s cool. Why even bother with stupid old XML? JSON is just plain better. Right?

Wrong. But that’s not to say that it’s worse, it’s just not flat-out better, and I’m a little concerned that it’s become the de facto data format because it’s really not very good at that role. I think its success is a historical accident, and the lessons to be learned here are not why JSON is so popular, but why XML failed.

Without going into too much detail, XML came out of the need to express more complex relationships in data than formats like CSV allowed. We had OOP steamrolling everything else in the business world, HTML was out in the wild showing that this “tag” idea had promise, and computers and I/O were finally able to handle this kind of data at a cost that didn’t require accounting for every byte of space.

In the beginning, it worked great. It broke through many of the headaches of other formats, it was easy to write, even without fancy tools, and nobody was doing anything fancy enough in it that you had to spend a ton of time learning how to use it. After that, though, things got a little, actually a lot … messy. User njl at Hacker News sums it up very nicely:

I remember teaching classes in ’99 where I needed to explain XML. I could do it in five minutes — it’s like HTML, but you get to make up tags that are case-sensitive. Every start tag needs to have an end tag, and attributes need to be quoted. Put this magic tag at the start, so parsers know it’s XML. Here’s the special case for a tag that closes itself.
Then what happened? Validation, the gigantic stack of useless WS-* protocols, XSLT, SOAP, horrific parser interfaces, and a whole slew of enterprise-y technologies that cause more problems than they solve. Like CORBA, Ada, and every other “enterprise” technology that was specified first and implemented later, all of the XML add-ons are nice in theory but completely suck to use.

So we have a case here of a technology being so flexible and powerful that everyone started using it and “improving” it. Recruiters started asking us questions like “can you program in XML?” and telling us things like “they used to do C++ but now they’re trying to use a lot more XML.” We had insanely complex documents that were never in step with the schemas they supposedly used. We had big piles of namespaces nobody understood, all in the interest of standards and reusable code.

Where the pain was really felt was in the increasing complexity of the libraries that were supposed to make XML easy to use, even transparent. There was essentially an inexhaustible list of features that a library “needed” to support, and thus none of them were ever actually complete, or stable. Fifteen years later there are still problems with XML parsing, library conflicts, and weird characters that are legal here but not there.

So, long before anyone heard of JSON, XML had a bad rep. While the core idea of it was sound, it ultimately failed to achieve its primary goal of being a universal data interchange format that could scale from the simplest document to the most complex. People were eager to find a solution that delivered the value XML promised without the headaches it brought. In fact, they still are, as nothing has really come close yet, including JSON, but perhaps we can build on the lessons of both XML and JSON to create the real next big thing.

XML ended up being used for everything from service message requests and responses to configuration files to local data stores. It was used for files that were a few hundred bytes to many gigabytes. It was used for data meant to represent a single model to a set of models to a stream of simple models that had no relation to each other. The fact that you could use it for all of these things was the sexy lure that drew everyone to it. Not only could you use it for all of these things but it was actually designed to be used for all of these things. Wow!

Ultimately, I think this was the reason for its downfall, and JSON’s unsuitability for doing everything has, not surprisingly, been the reason for its ascendance on its home turf (JavaScript calling remote services). Does it really make sense for my application configuration file to use the same format as my data caches and my web services? Even now it’s hard for me to call it a bad idea, because the value of having one format to parse, one set of rules to keep in mind, one set of expectations I don’t need to cover in my documentation is real. But XML and JSON have each disproven that promise in different ways, and we have to accept that.
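
To make the tradeoff concrete, here’s the same trivial record in both formats, read with Python’s standard library (the data is made up for illustration). JSON maps straight onto basic types, while XML hands back a tree of elements where everything is text:

import json
import xml.etree.ElementTree as ET

json_doc = '{"server": {"host": "example.com", "port": 8080}}'
xml_doc = "<server><host>example.com</host><port>8080</port></server>"

# JSON parses directly into dicts, lists, strings, and numbers.
config = json.loads(json_doc)
print(config["server"]["port"])    # 8080, already an int

# XML yields elements whose text must be interpreted by the caller.
root = ET.fromstring(xml_doc)
print(int(root.findtext("port")))  # 8080, but only after an explicit cast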

The problem on the horizon is that as JSON becomes the go-to format for new developers, the people who have been told all along that XML is bad and horrible and “harmful” are using JSON for everything. We’re heading towards a situation where we have replaced a language that failed to do what it was designed to do with a language that is failing to do things it was never designed to do, which I think would actually be worse. Even if it’s not worse, it’s certainly still bad.

If we accept the collective verdict that XML is not the answer, or at least not an answer we like, and my prediction that JSON isn’t either, what is? And why is nobody working on it? Or are they?

To be continued…

Books as Clutter

Like the culture at large, I’m moving from physical media to digital. I’m slowly getting rid of almost all paper documents via my scanner. I haven’t bought a CD in years. I’ve never been a collector of movies. I haven’t had a roll of film developed since the early 90s. Our printer usually isn’t even hooked up, and when it is, it’s to sign and scan a contract or something, as I haven’t found a great replacement for that yet.

Beyond the digital conversion, I don’t even bother with much physical media. Files are backed up to online services like S3 or Rackspace via Jungledisk. I have some 1TB external drives for peace of mind and Verizon’s inevitable billing errors, but I never burn anything to DVD or CD.

I kept all of my old CDs, because I wasn’t comfortable with throwing out full-quality versions of something. There is FLAC, but at the time I switched a few years ago I wasn’t happy with the FLAC-encoding tools, so I went with 256k VBR MP3 files and figured I’d re-encode them someday and then be able to toss the discs. This argument doesn’t make a ton of sense given that I now pay for degraded copies of new music, but that’s a little psychology I’ll put off analyzing.

Books, however, are tough for me. I’ve been using a Kindle for a while, and love it, as do most who have one. I look at my bookshelf and think “this doesn’t really need to be here”. I’ve tossed a number of books, but then I pick up an old Choose Your Own Adventure book and the innumerable hours I spent reading and re-reading them come back to me. The thought of throwing it away is unsettling.

I will probably never even read these books again. I’m not sure if my potential future kids would bother with them, but the sentimentality runs too deep. So I keep them, and even my minimalist girlfriend probably understands. It’s not like I have thousands of them; there are probably 100 books I can’t toss.

Some books I keep because technology just hasn’t caught up to them yet: cookbooks, picture books, and so on. These will probably go eventually; I end up tossing a few each time I look through the shelves. I also keep the sentimental books that are signed by the author or were gifts. I don’t see a real clutter issue there either, and again, I don’t have too many of these.

The quandary comes with new books. These books have no sentimental value yet, and may never have any, and I think that’s part of the trouble. The part of me that wants to move into the future, to be more mobile, more organized, more free of physical possessions has not yet found a decisive victory over the nerd who did a book drive for his Eagle project and spent those precious half-days of school lost in the annex at the Ames Free Library.

I’ve always looked at books as some strange kind of investment. I spend $10 on a book and read it. I can read it again years later, or give it to someone else; there’s some residual value (not monetary; selling books is hardly worth the trouble, IMO). But now I click “buy it now” and I still realize what is arguably the real value of the book, yet I have nothing to put on my shelf and page through from time to time. Nothing to jog my memory when I see it, or spark a conversation when someone else sees it. Nothing I can hand to a friend and say “you really need to read this”.

I’m not even that concerned with Amazon going away or revoking my access to these books; while that would be unfortunate, I could always buy them again somewhere else. Physical books can be destroyed or stolen too, probably even more easily than e-books. Nor am I too concerned with the privacy issues, although I do recognize that lack of concern is a privilege not everyone has. The threat of an oppressive government “burning” or suppressing a book is real as well, but I think computers are so numerous now that this knowledge will find a way to survive. One small hard drive can hold literally millions of books; I’m sure at least one copy will survive.

Unfortunately this isn’t a very constructive post, as I don’t think there is an actual solution to this. I guess this is more of a eulogy. It’s a problem faced by most generations that see the things they grew up valuing being devalued, and it stings for someone who, if you asked him anywhere from age 5 to 15, probably would have said the most valuable/important thing he owned was his book collection.

To switch or not to switch, part 3

This entry is part 3 of 3 in the series To switch or not to switch

Continued from Part 2, we’re down to 41 languages that you could potentially build a modern, database-driven application with. I’d like to knock another 10-12 off the list before I get into trying the languages themselves, but I’m running out of black-and-white rules to do it with.

Concurrency is a big part of scaling, but it’s a tough characteristic to pin down. Any JVM-based language is going to have access to a great threading model, but people have been able to scale languages with poor or difficult threading, like Python, to decent volumes as well, so clearly threads are not a requirement.

I spend much of my time writing web software, so robust support of that, like Java’s Servlet specification and third party additions like Spring MVC, is nice to have as well. Ruby and Python have made many of their gains based on the strengths of their web frameworks. However, an otherwise good language could add a web framework relatively easily, so again it’s not a requirement.

The way a language handles text is important when working with users, databases, web services, etc., but this can be addressed with libraries. Object orientation can be nice too, as can other models. Interpreted vs. compiled, machine code vs. bytecode, exceptions, static typing, dynamic typing: the list goes on with details that matter, but none so much that I can’t live without them. I certainly prefer static typing, checked exceptions, painless threading, garbage collection, and run-anywhere bytecode executed in a virtual machine, but I know smart people who can make valid arguments against every one of those, and maybe I just haven’t given the alternatives enough of a chance.

What is really important to me about a language is its community, leadership, and, for lack of a better term, motive. For this reason I’m going to knock the Microsoft-led CLI languages off the list. I know that CLI is a standard, and that the open source implementation is not beholden to it, but I’m simply not going to program in the Microsoft ecosystem. I think they just don’t care about open source and independent development. This eliminates:

  • C#
  • Boo
  • Cobra
  • F#

It’s kind of a shame because F# looks interesting and is something I’d like to tinker with some day. I also think C# is actually a pretty good language, and, while it initially seemed like Java Jr., it evolved to have some powerful features, more so than Java in some ways. Unfortunately, even with Mono in the picture, it’s still a Microsoft product to me.

It should definitely be noted that I’m not exactly happy about Oracle owning Java now either. Due to Java’s inertia, I haven’t seen any Oracle decisions affect me yet, and I’ve probably got a number of years before I have to deal with that if and when they decide to do something bad. They could kill it entirely and people would still be using it for a long time.

Along similar lines, Objective-C’s fate seems to be tied closely with the whims of Apple. I don’t see any interest in using it for non-Apple OS projects, so it is bumped off the list as well.

  • Objective-C

Also some of these languages have such a small community or slow pace of development that they don’t seem worthy of investing in:

  • ALGOL 68
  • Clean
  • Dylan
  • GameMonkey Script
  • Mirah
  • Unicon

Three of these languages are active, but their communities (and therefore the languages) are solving very different problems: Processing with its graphics and visualizations, Scratch with its introductory/training focus, and Vala with its close ties to GNOME applications.

  • Processing
  • Scratch
  • Vala

This leaves 27 languages moving on to part 4, most of which I’m definitely looking forward to learning more about:

  • Ada
  • Clojure
  • Common Lisp
  • D
  • Eiffel
  • Erlang
  • Factor
  • Falcon
  • Fantom
  • Go
  • Groovy
  • Haskell
  • Io
  • Java
  • JavaScript
  • Lua
  • MUMPS
  • Pike
  • Pure
  • Objective Caml
  • Python
  • Ruby
  • Scala
  • Scheme
  • Squeak
  • Tea
  • Tcl

Too much RAM?

I’ve been spec’ing out a new desktop and am intrigued by something I’m not sure I’ve seen before. It’s actually possible to get a computer with too much RAM now. 32GB is not really high-end, or expensive, and yet I can’t really think of any consumer/business application that would use it all. Programs like Photoshop, video editors, or scientific/engineering software have pretty much infinite appetites, so for those it would really be a question of diminishing returns.

As for me, I could literally put an entire Windows 7 virtual machine on a RAM disk* with a ton of code and some decent-sized databases, give that VM 4-8GB of RAM, and still have physical memory to spare. That said, I’m certainly going to get (at least) 32 because it’s a cheap upgrade from 8 or 16, so if I find a use for it, I’ll report back.

Making claims about the future can haunt you forever, even if you didn’t actually say it, so I’m not saying that we’ll never need 32GB, or that there will ever actually be such a thing as “too much RAM”, but I’m curious how long it will be until we need that much.

*You might think that the RAM disk is dead with the availability of SSD, but RAM is still dramatically (50-100x) faster than even high-end SSDs like Fusion-io, thanks to the limitations of the SATA or PCI bus.

To switch or not to switch, part 2

This entry is part 2 of 3 in the series To switch or not to switch

Continued from Part 1, I’m looking at candidates to replace my current main language: Java. I’m not actually eager to get rid of Java, I still really like it and enjoy working in it, but I need to convince myself that it’s the right choice for me to continue to invest unpaid time in, or I need to find something that is. My career is, optimistically, 1/3 over; I’ve got 20-30 years left tops, and while I’d actually bet that Java programmers will be needed in 30 years, I’m not sure how much interesting stuff will be happening.

So, let’s add a couple more rules to winnow the set.

Rule 3: It must have some kind of network database support.
Almost everything I do involves a database at some point. The volumes of data and network architectures we deal with today rule out simple file I/O, or even local-only databases. I did not look especially hard for the answer to this question; in my opinion, if it isn’t easy to find, it isn’t sufficient. Technically, I could write/port my own driver, but if nobody else has done it, I have to suspect that the users are solving very different problems than I am. This eliminates:

  • Agena
  • ATS
  • BASIC
  • BETA
  • Diesel
  • E
  • FORTH
  • Icon
  • Ioke
  • Logo
  • Maple
  • MiniD
  • Miranda
  • Modula-3
  • Nu
  • Reia
  • Sather
  • SQL
  • Self
  • SPARK
  • Squirrel
  • Timber

Rule 4: This is a tricky one, but I’m going to say it must not have “fallen from grace”. This is essentially the state that Java is entering, depending on who you ask. It’s perfectly functional, and widely used, but it’s had its day and isn’t hip anymore. This doesn’t exclude languages that are just old, but were never at the top, like Eiffel, but I don’t see any reason to abandon Java and go with COBOL.

  • C
  • C++
  • COBOL
  • Fortran
  • Pascal
  • Perl
  • PHP
  • Visual Basic .NET

Now, some of those languages, like C, are still very popular, and important. You could even say that they are continuing to get better and stay modern. C will probably outlive most of these languages, as none of them are strong candidates to rewrite Linux in yet. My argument is that nobody is really using C to solve new problems in new ways. This leaves 41 languages that are active, capable of doing at least basic database operations, and have not entered decline.

  • Ada
  • ALGOL 68
  • Boo
  • C#
  • Clean
  • Clojure
  • Cobra
  • Common Lisp
  • D
  • Dylan
  • Eiffel
  • Erlang
  • F#
  • Factor
  • Falcon
  • Fantom
  • GameMonkey Script
  • Go
  • Groovy
  • Haskell
  • Io
  • Java
  • JavaScript
  • Lua
  • Mirah
  • MUMPS
  • Objective Caml
  • Objective-C
  • Pike
  • Processing
  • Pure
  • Python
  • Ruby
  • Scala
  • Scheme
  • Scratch
  • Squeak
  • Tcl
  • Tea
  • Unicon
  • Vala

affinity.txt

SEO sucks. It’s a fun little game to play for a while, but at the end of the day almost everyone loses. The searchers lose because they can’t find the best stuff any more. The search engines lose because their searchers see worse results and are less happy. Legitimate creators and businesses lose because they have traffic siphoned off by spammers and scrapers. They’re forced to waste brainpower and money on this ridiculous game that, from where I’m standing, they’re losing.

I’m not pretending that there was some golden age without spammers; they’ve always been there, and they always will be. Originally we had straight-up content matches, then keywords. Search quality was approaching unusability before Google came on the scene. They did a good job: PageRank put a trust network in the mix, and it worked great for a while. Eventually that well was poisoned too. They’ve done many things since, and I bet many of them helped. The new site blacklist is a good step, but certainly a tricky one to use. The rise of Bing is actually a good thing; I think Google is in a much better position to take risks and change things when they’re at 70 or 80% of the market as opposed to 95%.

So let’s stop talking about SEO as a black art. Let’s stop making our sites worse to prolong a losing battle. Let’s take that energy and put it into something else. Put your content out there in the best format you can. Forget code-to-content ratios and maximizing internal link structures that don’t benefit your users.

Instead, let’s think of ways that we can help explicitly affect results. Here’s one: think robots.txt meets social graph meets PageRank. site.com lists other sites in its /affinity.txt file and defines some coarse relationship. Something like this:

widgetfactory.com/affinity.txt

www2.widgetfactory.com self
*.widgetwiki.org follow
*.widgetassociation.net follow

So we’ve got the “self” tag that basically says “this is another version, or a related version, of me.” If www2.widgetfactory.com/affinity.txt has a “widgetfactory.com self” entry, you’ve got yourself a verified relationship. We’ve also got something like a follow tag that says “we like these sites and think they’re valuable, you should go there (and follow links from us).” It’s basically a vote. Unidirectional votes are useful for quality, and mutual votes are a big clue about transferring trust.
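
The “self” handshake is also cheap to check. Here’s a rough sketch in Python of what a crawler-side verifier could look like, assuming the one-entry-per-line format above (the function names are mine, and a real version would also have to handle wildcard patterns like *.widgetwiki.org):

from urllib.request import urlopen

def affinity_entries(domain):
    # Fetch domain/affinity.txt and parse it into (site, tag) pairs.
    with urlopen("http://%s/affinity.txt" % domain) as response:
        text = response.read().decode("utf-8", errors="replace")
    return [(parts[0], parts[1])
            for parts in (line.split() for line in text.splitlines())
            if len(parts) == 2]

def verified_self(a, b):
    # A "self" relationship counts only if both sides claim it.
    return ((b, "self") in affinity_entries(a) and
            (a, "self") in affinity_entries(b))

# verified_self("widgetfactory.com", "www2.widgetfactory.com")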

How many votes do you get? No idea; I don’t see any reason to limit it. I think those things will work themselves out. I don’t see regular sites managing thousands of links in there; I think they’ll just link to enough to make it meaningful. Or maybe there is some hard limit, so there’s less of a guessing game on how to pick who goes there. 100 per site? 1000?

Now you might think “this is just PageRank”, but I think it actually is different. First, it’s much easier to spot foul play. There are far fewer domains than pages, so the graph is much smaller and easier to traverse. Junk sites are going to stick out like a sore thumb. A spammer can’t really use this channel by making artificial networks of trust, because it would be so easy to kill them en masse. It’s also difficult to mask or overwhelm the way link farms do.

Does this hurt the little guy? Not any more than spammers, I think. I think even though the expression of the data is simple, the interpretation of it can be very complex. If you’ve got a little blog that just blathers on about computer stuff and doesn’t even have any ads on it, you may not need much trust to rise up on specific content searches. If you’ve got a site with 3 million pages that look an awful lot like wikipedia pages, and your only votes are from other similar sites, and your whole cell has no links from the larger graph, well, maybe, despite your flawless, compact markup and impeccable word variances, you’re not really adding much value, are you?

I’m not saying this particular idea solves all of our problems; heck, I’m sure there are some problems with it as described. But I think approaches like this will not only affect the quality of our search results, they will ultimately affect the quality of the web overall.

When will Google buy VMWare?

Google’s Chromebooks are starting to go mass-market. For those that don’t know, these are essentially laptops that only have a web browser on them. No Windows, no OS X or Linux. To many people, this seems ludicrous. You need apps, right? You need data?

The truth is, the majority of people already only use browser “apps”, which we used to call “websites”. Google has been leading the charge on this by pushing the envelope on in-browser apps with Google Docs and the Chrome App Store. There are other players too: Apple is training people to buy apps and not worry about having to reinstall them when their hardware fails. DropBox is training people to sync everything. Amazon is training people to have virtual CD shelves. Steam is training people to have virtual game libraries. Citrix and LogMeIn are training people to work on remote desktops, and so on.

These are all coming together to get people to the point where we basically go back to dumb terminals. Your computer is nothing more than a local node on the network. That, however, is not the interesting part, people have been saying that for years.

The interesting part, to me, is that it’s not actually going to be the typical early adopters going there first. My girlfriend’s computer literally has nothing on it. She only uses browser apps and iTunes, which is connected to our NAS where her photos are also stored. She uses GMail, Facebook, Google Docs, etc. With the exception of syncing her iPod, which I have to assume someone will figure out how to do in the ChromeOS ecosystem, I’m not sure she would really notice any difference.

Now my computers are a long way from there. I’ve got development environments, SQL servers, mail servers, all sorts of infrastructure set up. I could certainly move to a remote desktop or a remote terminal on a server, but the change would be much more disruptive and not without some costs.

Along a different path, we’ve seen a long progression of advances in virtualization. I actually do most of my work in VMs now, for a number of reasons, but one of which is that I’m not dependent on a particular piece of hardware. If my laptop is destroyed or stolen, I’m back up to speed very quickly. The only thing I’d need to do is install VMWare, plug in my drive, and I’m good to go.

I think these two paths are going to meet up soon. I think ChromeOS is a way to get the low-demand computer users on board. If Google buys VMWare, they can come at it from the other end as well. I think VMs will get leaner while browsers get more robust, and we’ll end up with a hybrid of the two. A lightweight OS that is heavily network/app/web based? I wonder where Google would get one of those? Oh, right, they already did.

To switch or not to switch, Part 1

This entry is part 1 of 3 in the series To switch or not to switch

I was listening to a talk the other day and the speaker derisively mentioned “those people who are happy writing Java for the rest of their lives”, and I thought “Am I one of those?” and then I thought “is that a bad thing?”. As part of my “question everything” journey, I decided that it was time, after 10+ years, to have Java report for inspection and force it to defend its title.

I should make it clear that I am not a language geek or collector. I generally disagree with “use the right language for the right problem”; I prefer “use the right language for most of your problems”. So far, Java has been that for me. Some things I do in Java are more easily done in other languages, but not so much easier that it outweighs the headaches of heterogeneous codebases. If something is really difficult, or impossible, in your main language, bring something else in, but keep it simple. I also think it’s fine to have more than one main language; a client of mine is currently transitioning off C#, keeping Java, and adding Python. What they don’t have is random parts of their infrastructure done in Erlang or Perl or Tcl because that’s what someone wanted to use that day.

I could make this task easier and just look at the “marketable” skills out there, which is a small subset. While I think it’s unlikely that there is some forgotten language just waiting for its moment, it’s certainly possible I could find a neat one that’s fun to play with. Languages like Ruby and Python spent years in the wild before people could find jobs using them. So I’m going to look at literally every single language I can find and put them through a series of tests. If you find a language I haven’t mentioned, let me know and it will be given the same chance as the rest.

Round 1:

The point of this round is to identify languages that have any potential for being useful to me.

Qualifying Criteria

Rule 1. It must be “active”.
This is admittedly a subjective term, but we’ll see how it goes. Simula is clearly not active, while Processing clearly is, with a release only weeks ago.
Rule 2. It must compile and run on modern consumer hardware and operating systems.
This means, at minimum, it works on at least one modern flavor of Linux, because I will want this to run on a server somewhere, and I don’t want a Windows or OS X server, or worse, something obscure. For bonus points, it will also work on Windows 7 and/or OS X.

So, that’s it for now. There are no requirements for web frameworks or lambdas or preference for static versus dynamic typing, I think those elements will play out in later rounds.

  • Ada
  • Agena
  • ALGOL 68
  • ATS
  • BASIC
  • BETA
  • Boo
  • C
  • C#
  • C++
  • Clean
  • Clojure
  • COBOL
  • Cobra
  • Common Lisp
  • D
  • Diesel
  • Dylan
  • E
  • Eiffel
  • Erlang
  • F#
  • Factor
  • Falcon
  • Fantom
  • FORTH
  • Fortran
  • GameMonkey Script
  • Go
  • Groovy
  • Haskell
  • Icon
  • Io
  • Ioke
  • Java
  • JavaScript
  • Logo
  • Lua
  • Maple
  • MiniD
  • Mirah
  • Miranda
  • Modula-3
  • MUMPS
  • Nu
  • Objective Caml
  • Objective-C
  • Pascal
  • Perl
  • PHP
  • Pike
  • Processing
  • Pure
  • Python
  • Reia
  • Ruby
  • Sather
  • Scala
  • Scheme
  • Scratch
  • Self
  • SPARK
  • SQL
  • Squeak
  • Squirrel
  • Tcl
  • Tea
  • Timber
  • Unicon
  • Vala
  • Visual Basic .NET

This list is actually a LOT longer than I expected, and yes, there actually is a modern version of ALGOL 68. Stay tuned for part 2.

The Ultimate Music App

There’s an ever-growing number of online music services out there, but none of them have really nailed it for me. Here’s my list of demands:

  • Instant Purchase – Simple one or two click purchase, which adds it to my portfolio. Downloading from one place and uploading to another is dumb.
  • Standard format/no DRM – This is why subscription-based services won’t work.
  • Automatic Download/Sync – As seamless as DropBox, maybe even with a few rules (per playlist, etc).
  • Smart Playlists – The only reason I use iTunes is that I can set up playlists with dynamic criteria, like “stuff I like that I haven’t heard in 2 weeks”. This entails tracking what I listen to and being able to rate stuff (a rough sketch of such a rule follows this list).
  • Upload My Own – No reason for me to have to buy things again. I’m fine with paying a small extra fee for this, but I should also be able to work that off by buying new stuff. Amazon hosts stuff I’ve bought from them for free, but charges me for uploads, so in the long run they could actually end up costing me more. They should give me a 50MB bonus per album to upload other files.
  • Mobile – My phone is my music player now, I should be able to stream/sync/download from it as well as my computer.
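
To show how simple the smart-playlist rule above really is, here’s a rough sketch of “stuff I like that I haven’t heard in 2 weeks” as a filter over play history (the track fields are hypothetical, not any real service’s API):

from datetime import datetime, timedelta

def smart_playlist(tracks, min_rating=4, stale_after=timedelta(weeks=2)):
    # "Stuff I like that I haven't heard in 2 weeks."
    cutoff = datetime.now() - stale_after
    return [t for t in tracks
            if t["rating"] >= min_rating and t["last_played"] < cutoff]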

Bonus Features

  • API – Let me have another program talk to your service to do things like recommendations and missing tracks.
  • Podcasts – This doesn’t necessarily have to be done in-service, if the API allowed uploads someone else could do it, but it seems pretty trivial to add on if all of the above things are in place.

Don’t Really Care

  • Sharing – Nice to have but I’d be fine with a service I can’t share. I’d prefer the option to sign into more than one account at a time.

Software that isn’t afraid to ask questions

An area where user-focused software has gotten better in the past 10 years or so is being aware of, and protective of, the context in which users are operating. Things like autocomplete and instant validation are expected behaviors now. Another area where software is really picking up steam is analytics: understanding behaviors. You see lightweight versions of this creeping into consumer software with things like Mint.com and the graphs in Thunderbird, but most of the cool stuff is happening on a large scale in Hadoop clusters and hedge funds, because that’s where the money is right now.

But where software has not been making advancements is in being proactively helpful, using that context awareness as well as those analytics. If that phrase puts you in a Clippy-induced rage, my apologies, but I think this is an area where software needs to go. I think Clippy failed because it was interfering with creative input. We’ve since learned that when a user wants to tell you something, you want to expedite that, not interfere. Google’s famed homepage doesn’t tell you what, or how, to search. It’s adapted to work with what people want to tell it.

I’m talking about software that gets involved in things computers are good at, like managing information, and gets involved in the process the way a helpful person would. We’ve done some of this in simple, mechanical ways. Mint.com will tell me when I’ve blown my beef jerky budget, and Thunderbird will remind me to attach a file if the word “attached” appears in my email. I think this is a teeny-tiny preview of where things will go.

Let’s say you get a strange new job helping people manage their schedules. You get assigned a client. What’s the first thing you do, after introducing yourself? You don’t sit there and watch them, or ask them to fill out a calendar and promise to remind them when things are due. No, you ask questions. And not the questions a computer would currently ask, but questions like “what’s the most important thing you do every day?”. Once you’ve gotten a few answers, you start making specific suggestions like “Do you think you could do this task on the weekends instead of before work?”.

Now, we’re a long way from software fooling people into thinking it cares about them or understands their quirks, but we’re also not even trying to do the simple stuff. When I enter an appointment on Google Calendar, it has some fields I can put data in, but it makes no attempt to understand what I’m doing. It doesn’t try to notice that it’s a doctor’s appointment in Boston at 9am, that I’m coming from an hour away during rush hour, and that maybe that 15-minute reminder isn’t really going to do much. It would be more helpful if it asked a question like “are you having blood drawn?”, because if I am, it can then remind me the night before that I shouldn’t eat dinner. It can look at traffic that morning and tell me that maybe I should leave even earlier because there’s an accident. It can put something on my todo list for two weeks from now to see if the results are in. All from asking one easy question.

Now, a programmer who got a spec with a feature like this would probably be speechless. The complexity and heuristics involved are enormous. It would probably get pared down to “put doctor icon on appointment if the word doctor appears in title”. Lame, but that’s a start, right? I think this behavior is going to be attacked on many fronts, from “dumb” rules like that, to fancy techniques that haven’t even been invented yet.
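
Even that lame version is a start, and it’s only a few lines, which is why I think the “dumb” rules will come first. A sketch, with the keywords and follow-up questions invented for illustration:

FOLLOW_UPS = {
    "doctor": "Are you having blood drawn?",
    "flight": "Do you have a ride to the airport?",
}

def questions_for(title):
    # The "dumb" version: trigger a follow-up question on a keyword match.
    title = title.lower()
    return [q for keyword, q in FOLLOW_UPS.items() if keyword in title]

# questions_for("Doctor's appointment, Boston, 9am")
# -> ["Are you having blood drawn?"]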

I’ve started experimenting with this technique to manage the list of ideas/tasks I have. In order to see how it might work, I’ve actually forbidden myself from even using a GUI. It’s all command-line prompts, because I basically want it to ask me questions rather than accept my commands. There’s not much to it right now: it basically picks an item off the list and says “Do you want to do this?”, and I have to answer it (or skip it, which is valid data too). I can say it’s already done, or that I can’t do it because something else needs to happen first, or that I just don’t want to do it today.

If it’s having trouble deciding what option to show me, it will show two of them and say “Which of these is more important?”. Again, I’m not re-ordering a list or assigning priorities, I’m answering simple questions. More importantly, I’m only answering questions that have a direct impact on how the program helps me. None of this is artificial intelligence or fancy math or data structures, the code is actually pretty tedious so far, but even after a few hours, it actually feels helpful, almost personable.
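
Stripped way down, the skeleton of the loop looks something like this (a toy sketch of the idea, with the real bookkeeping of answers and history left out):

import random

def ask(prompt):
    return input(prompt).strip().lower()

def review(tasks):
    # Ask simple questions instead of taking commands; every answer is data.
    while tasks:
        if len(tasks) > 1 and random.random() < 0.25:
            a, b = random.sample(tasks, 2)
            ask("Which of these is more important: '%s' or '%s'? " % (a, b))
            continue  # a real version would record the preference
        task = random.choice(tasks)
        answer = ask("Do you want to do '%s'? (yes/no/done/skip) " % task)
        if answer in ("yes", "done"):
            tasks.remove(task)  # done, or committed to doing it today
        # "no" and "skip" would be logged as useful signals too

review(["file taxes", "fix the backup script", "call the dentist"])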

If you know of any examples of software that actually tries to help in meaningful ways, even if it fails at it, let me know!