affinity.txt

SEO sucks. It’s a fun little game to play for a while, but at the end of the day almost everyone loses. The searchers lose because they can’t find the best stuff any more. The search engines lose because their searchers see worse results and are less happy. Legitimate creators and businesses lose because they have traffic siphoned off by spammers and scrapers. They’re forced to waste brainpower and money on this ridiculous game that, from where I’m standing, they’re losing.

I’m not pretending that there was some golden age without spammers; they’ve always been there, and they always will be. Originally we had straight-up content matches, then keywords. Search quality was approaching unusability before Google came on the scene. They did a good job: PageRank put a trust network in the mix, and it worked great for a while. Eventually that well was poisoned too. They’ve done many things since, and I bet many of them helped. The new site blacklist is a good step, but certainly a tricky one to use. The rise of Bing is actually a good thing; I think Google is in a much better position to take risks and change things when they’re at 70 or 80% of the market as opposed to 95%.

So let’s stop talking about SEO as a black art. Let’s stop making our sites worse to prolong a losing battle. Let’s take that energy and put it into something else. Put your content out there in the best format you can. Forget code-to-content ratios and maximizing internal link structures that don’t benefit your users.

Instead, let’s think of ways that we can help explicitly affect results. Here’s one: think robots.txt meets social graph meets PageRank. site.com lists other sites in its /affinity.txt file, and defines a coarse relationship with each. Something like this:

widgetfactory.com/affinity.txt

www2.widgetfactory.com self
*.widgetwiki.org follow
*.widgetassociation.net follow

So we’ve got the “self” tag that basically says “this is another version, or a related version, of me.” If www2.widgetfactory.com/affinity.txt has a “widgetfactory.com self” entry, you’ve got yourself a verified relationship. We’ve also got something like a follow tag that says “we like these sites and think they’re valuable, you should go there (and follow links from us).” It’s basically a vote. Unidirectional votes are useful for quality, and mutual votes are a big clue about transferring trust.
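To make the format a bit more concrete, here is a minimal Python sketch of how a crawler might read these files and check the two kinds of relationship. Only the “self” and “follow” tags come from the description above; the one-entry-per-line layout, the shell-style wildcards, the “#” comments, and all the function names are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Only "self" and "follow" come from the post; everything else here
# (comment syntax, wildcard rules, function names) is an assumption.
KNOWN_TAGS = {"self", "follow"}


def parse_affinity(text):
    """Parse an affinity.txt body into (host_pattern, tag) pairs.

    Assumed format: one "<host pattern> <tag>" entry per line; blank lines
    are skipped and "#" starts a comment.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2 or parts[1].lower() not in KNOWN_TAGS:
            continue  # ignore malformed lines and unknown tags instead of failing
        entries.append((parts[0].lower(), parts[1].lower()))
    return entries


def lists(entries, host, tag):
    """True if some entry with this tag matches host.

    Patterns use shell-style wildcards, so "*.widgetwiki.org" matches
    "en.widgetwiki.org" but not the bare "widgetwiki.org".
    """
    host = host.lower()
    return any(t == tag and fnmatchcase(host, pattern) for pattern, t in entries)


def verified_self(host_a, entries_a, host_b, entries_b):
    """A "self" claim only counts when each site names the other."""
    return lists(entries_a, host_b, "self") and lists(entries_b, host_a, "self")


def mutual_follow(host_a, entries_a, host_b, entries_b):
    """Both sites vote for each other: the stronger trust signal."""
    return lists(entries_a, host_b, "follow") and lists(entries_b, host_a, "follow")


factory = parse_affinity("""
www2.widgetfactory.com self
*.widgetwiki.org follow
*.widgetassociation.net follow
""")
www2 = parse_affinity("widgetfactory.com self")

print(verified_self("widgetfactory.com", factory, "www2.widgetfactory.com", www2))  # True
```

Requiring both files to name each other is what makes the “self” relationship verifiable: anyone can list widgetfactory.com as “self”, but they can’t make widgetfactory.com list them back.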

How many votes do you get? No idea; I don’t see any reason to limit it. I think those things will work themselves out. I don’t see regular sites managing thousands of entries in there; I think they’ll just list enough to make it meaningful. Or maybe there should be some hard limit, so there’s less of a guessing game about whom to include. 100 per site? 1000?

Now you might think, “this is just PageRank,” but I think it actually is different. First, it’s much easier to spot foul play. There are far fewer domains than pages, so the graph is much smaller and easier to traverse. Junk sites are going to stick out like a sore thumb. A spammer can’t really use this channel by making artificial networks of trust, because it would be so easy to kill them en masse. It’s also much harder to mask or overwhelm the signal the way link farms do.

Does this hurt the little guy? Not any more than spammers already do, I think. Even though the expression of the data is simple, the interpretation of it can be very complex. If you’ve got a little blog that just blathers on about computer stuff and doesn’t even have any ads on it, you may not need much trust to rise up on specific content searches. If you’ve got a site with 3 million pages that look an awful lot like wikipedia pages, and your only votes are from other similar sites, and your whole cell has no links from the larger graph, well, maybe, despite your flawless, compact markup and impeccable word variances, you’re not really adding much value, are you?
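One way to read “your whole cell has no links from the larger graph” is as a reachability question: start from some sites the engine already trusts, follow their votes outward, and see who never gets reached. Here is a rough Python sketch of that idea; the seed set, the domain names, and the function names are all hypothetical, and this is just one possible interpretation of the votes, not a worked-out ranking scheme.

```python
from collections import deque


def reachable_from_seeds(votes, seeds):
    """Breadth-first walk over follow votes starting from trusted seeds.

    votes maps each domain to the set of domains it votes for.
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        domain = queue.popleft()
        for target in votes.get(domain, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def isolated_sites(votes, seeds):
    """Domains in the graph that no chain of votes from the seeds reaches."""
    everyone = set(votes)
    for targets in votes.values():
        everyone.update(targets)
    return everyone - reachable_from_seeds(votes, seeds)


# Hypothetical graph: two wiki clones vote for each other (and for a big
# site), but nothing trusted ever votes for them.
votes = {
    "bigsite.example": {"smallblog.example"},
    "smallblog.example": {"bigsite.example"},
    "wikiclone1.example": {"wikiclone2.example", "bigsite.example"},
    "wikiclone2.example": {"wikiclone1.example"},
}

print(sorted(isolated_sites(votes, {"bigsite.example"})))
# ['wikiclone1.example', 'wikiclone2.example']
```

Voting out to a popular site doesn’t help the clones here; only inbound votes from the trusted part of the graph do. A real engine would presumably weight and decay trust along each hop rather than use a yes/no reachability test.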

I’m not saying this particular idea solves all of our problems (heck, I’m sure there are some problems with it as described), but I think approaches like this will not only affect the quality of our search results, they will ultimately affect the quality of the web overall.

When will Google buy VMWare?

Google’s Chromebooks are starting to go mass-market. For those that don’t know, these are essentially laptops that only have a web browser on them. No Windows, no OS X or Linux. To many people, this seems ludicrous. You need apps, right? You need data?

The truth is, the majority of people already only use browser “apps”, which we used to call “websites”. Google has been leading the charge on this by pushing the envelope on in-browser apps with Google Docs and the Chrome App Store. There are other players too: Apple is training people to buy apps and not worry about having to reinstall them when their hardware fails. DropBox is training people to sync everything. Amazon is training people to have virtual CD shelves. Steam is training people to have virtual game libraries. Citrix and LogMeIn are training people to work on remote desktops, and so on.

These are all coming together to get people to the point where we basically go back to dumb terminals. Your computer is nothing more than a local node on the network. That, however, is not the interesting part; people have been saying that for years.

The interesting part, to me, is that it’s not actually going to be the typical early adopters going there first. My girlfriend’s computer literally has nothing on it. She only uses browser apps and iTunes, which is connected to our NAS where her photos are also stored. She uses GMail, Facebook, Google Docs, etc. With the exception of syncing her iPod, which I have to assume someone will figure out how to do in the ChromeOS ecosystem, I’m not sure she would really notice any difference.

Now my computer(s) are a long way from there. I’ve got development environments, SQL servers, mail servers, all sorts of infrastructure set up. I could certainly move to a remote desktop or a remote terminal on a server, but the change would be much more disruptive and not without some costs.

Along a different path, we’ve seen a long progression of advances in virtualization. I actually do most of my work in VMs now, for a number of reasons, one of which is that I’m not dependent on a particular piece of hardware. If my laptop is destroyed or stolen, I’m back up to speed very quickly. The only thing I’d need to do is install VMWare, plug in my drive, and I’m good to go.

I think these two paths are going to meet up soon. I think ChromeOS is a way to get the low-demand computer users on board. If Google buys VMWare, they can come at it from the other end as well. I think VMs will get leaner while browsers get more robust, and we’ll end up with a hybrid of the two. A lightweight OS that is heavily network/app/web based? I wonder where Google would get one of those? Oh, right, they already did.

Why don’t websites have credits?

Engineers of any discipline are largely an anonymous bunch. You don’t know who designed the fuel pump in your car; I’d even wager it would be extremely difficult for you to find out if you wanted to. You don’t know who wrote the code for the OS X Dock or Windows Start bar or who wrote the Like button on Facebook. These people made decisions that affect you deeply every day, and you have no idea who they are.

The most interesting part of this is that those people are OK with it. If you ask them (myself included) they will tell you that it doesn’t matter, that what really matters is the quality of the work and the enjoyment you had doing it. Unfortunately, I think we’re wrong.

Should they?

I can’t seem to come up with a good framework for figuring out who wants credit, never mind who deserves it. If you so much as make a photocopy during the production of a movie, you’re probably in the credits with some highfalutin title like “First deputy assistant duplication specialist”. Music credits are tied to royalties and managed very closely. Most authors wouldn’t think about publishing something anonymously, nor would artists or sculptors. Artists always sign their work.

This is not even strictly a software issue. Video games list credits, often in the box and at the end of the game, and they even have an IMDB-like site. Nor is it an “arts & entertainment” issue; any credible scientific paper will cite other works and acknowledge contributions. Patents have names on them, even when assigned to a company.

A few software packages have listed credits. If I remember correctly, Microsoft did it on old versions of Word and Excel, and Adobe had it on old versions of Photoshop and Illustrator. I’m curious why those were removed, or at least hidden. “The Social Network” had something about Saverin being removed from and re-added to the “masthead” of Facebook (although I don’t know what or where that is).

So it would seem that we might be in the minority here, perhaps due to convention rather than any specific reason. And if there’s one thing that bugs an engineer, it’s deviating from standards with no good reason.

So let’s do it.

Why do it?

  • Pride in your work – Sure, there is some pride in doing a good job anonymously, but wouldn’t you be just a little more motivated or happy with your name on it?
  • Being a stakeholder – We’ve all done projects we didn’t believe in, and consoled ourselves with the fact that “it’s not my project”. Well, now it is.
  • Reputation – We’ve got our resumes, but credits will verify them.
  • Honesty/Transparency – There is no good reason to withhold this information, so it should be out there.
  • All that money they spent on school – Show your parents your name on a website and watch them smile.

So who gets listed?

I think the short answer here is, everyone. Movies do it, why not websites? It could be just a big list of names, or something more detailed with contributions, dates, whatever makes sense. Let’s just start throwing some names up there, and let the de facto standards evolve on their own.

If you know of any major sites that do this well, put it in the comments. Similarly, if you can think of a good reason why this shouldn’t happen, I’d love to hear about it.

Fire the user experience designer

This post makes a case for having a specialized “user experience designer”. The author argues that usability and interaction design are too complicated to be handled by someone responsible for other tasks. This is false.

If you are on a team responsible for a website or something similar, EVERYONE on your team should understand usability and interaction design. It’s not a special skill, it’s a core competency, like communication skills and ethics. The real experts out there are rare, and I mean “you’ll probably never even meet one” rare. Most people who specialize in it are just washed-up designers or coders.

You need your designers thinking about how people will interact with your program, or you’re going to end up with brochureware. You need your programmers thinking about it or you’re going to end up with a clumsy UI. You need your QA people to think about it or you’re going to end up with spotty test plans. You need your managers thinking about it to understand what’s important. You need your salespeople thinking about it to compare against your hapless competition.

Having someone responsible for it is a bad idea because not only are they probably going to suck at it, it’s just going to make everyone else lazy.

Register My Login to Join Your Account

One of the details that can be tough to keep track of with a large or fast-moving website is language consistency. Of course, to be consistent, you need to decide what to use. I did an audit of the most popular English-language sites (as determined by Alexa and Compete), to see how three key phrases were being used. These were:

Login/Log In/Sign in – The action of authorizing your account.
My/Your – My Movies, Your Account, etc.
Join/Sign Up/Register/Create – Creating a new account.

Here is the raw data; some analysis follows.

Site                   Log in        My/Your  Sign up
adultfriendfinder.com  login         my       join
aim.com                sign in       my       join/get
amazon.com             sign in       your     start
aol.com                sign in       my       sign up
bankofamerica.com      sign in       your*    enroll
blogger.com            sign in       my       create
craigslist.com         login         N/A      sign up
deviantart.com         login         N/A      become/join
ebay.com               sign in       my       register
facebook.com           login         my       sign up
flickr.com             sign in       your     create
fotolog.com            log in/login  my       join
friendster.com         log in        my       sign up
go.com (espn)          sign in       my       register
google.com             sign in       my       create
hi5.com                log in        my       join
imageshack.us          login         my       signup
imdb.com               login         my       register
live.com               sign in       my       sign up
mininova.com           login         my       register
msn.com                sign in       my       sign up
myspace.com            login         my       sign up
neopets.com            login         my       sign up
photobucket.com        log in        my       join
pogo.com               sign in       my       register
rapidshare.com         login         my       join
store.apple.com        login*        N/A      create/set up
veoh.com               log in        my       register
walmart.com            sign in       my       create
wikipedia.org          log in                 create
wordpress.com          login         my       sign up
yahoo.com              sign in       my       sign up
youporn.com            login         my       register*
youtube.com            log in        my       sign up

* Inconsistent

“My” is the clear winner over “Your”, with 27 mys, 3 yours, and 2 that avoid using possessive pronouns.

“Login” (counting both “login” and “log in”) takes the edge over “Sign In”, 20-14. “Sign In”, however, seems to be more popular with the biggest of the big sites, like Yahoo, Microsoft’s sites, and Google. I’d say this is a tossup, and I have a feeling that in a few years sign in will come to dominate. Of those using login, 13 use “login”, and 7 use “log in”, with the space.

There’s a wide spread of choices for signing up, with “sign up” the most common at 12 sites. 7 used join, 7 used register, 6 used create (an account), 1 used start, and 1 used enroll. This is not an independent choice, however: “sign up” is often seen where “log in” is used, and sites that use “sign in” tend to use something like “register”. AOL, Microsoft, and Yahoo use “sign in/sign up”. I suspect that some people think using such similar phrases would be confusing, and I agree, despite the appeal of the general consistency.
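The log-in and sign-up tallies above can be reproduced from the table with a short Python script. The rows are transcribed from the table; the normalization (dropping the “*” marker and folding combined entries such as “join/get” or “create/set up” into a single phrase) is my assumption about how the counts were made.

```python
from collections import Counter

# (site, log-in phrase, sign-up phrase), transcribed from the table above.
ROWS = [
    ("adultfriendfinder.com", "login", "join"),
    ("aim.com", "sign in", "join/get"),
    ("amazon.com", "sign in", "start"),
    ("aol.com", "sign in", "sign up"),
    ("bankofamerica.com", "sign in", "enroll"),
    ("blogger.com", "sign in", "create"),
    ("craigslist.com", "login", "sign up"),
    ("deviantart.com", "login", "become/join"),
    ("ebay.com", "sign in", "register"),
    ("facebook.com", "login", "sign up"),
    ("flickr.com", "sign in", "create"),
    ("fotolog.com", "log in/login", "join"),
    ("friendster.com", "log in", "sign up"),
    ("go.com (espn)", "sign in", "register"),
    ("google.com", "sign in", "create"),
    ("hi5.com", "log in", "join"),
    ("imageshack.us", "login", "signup"),
    ("imdb.com", "login", "register"),
    ("live.com", "sign in", "sign up"),
    ("mininova.com", "login", "register"),
    ("msn.com", "sign in", "sign up"),
    ("myspace.com", "login", "sign up"),
    ("neopets.com", "login", "sign up"),
    ("photobucket.com", "log in", "join"),
    ("pogo.com", "sign in", "register"),
    ("rapidshare.com", "login", "join"),
    ("store.apple.com", "login*", "create/set up"),
    ("veoh.com", "log in", "register"),
    ("walmart.com", "sign in", "create"),
    ("wikipedia.org", "log in", "create"),
    ("wordpress.com", "login", "sign up"),
    ("yahoo.com", "sign in", "sign up"),
    ("youporn.com", "login", "register*"),
    ("youtube.com", "log in", "sign up"),
]

# Assumed normalization: fold combined or variant spellings into the
# phrase they appear to have been counted under.
ALIASES = {
    "log in/login": "log in",
    "join/get": "join",
    "become/join": "join",
    "create/set up": "create",
    "signup": "sign up",
}


def normalize(phrase):
    """Strip the '*' (inconsistent) marker, then apply the alias table."""
    phrase = phrase.rstrip("*")
    return ALIASES.get(phrase, phrase)


login_counts = Counter(normalize(login) for _, login, _ in ROWS)
signup_counts = Counter(normalize(signup) for _, _, signup in ROWS)

print(login_counts.most_common())
print(signup_counts.most_common())
```

With those assumptions the totals come out to 14 “sign in” against 20 “login”/“log in” (13 and 7), and 12/7/7/6/1/1 for sign up, join, register, create, start, and enroll.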

My preference is to use “my”, “log in”, and “sign up”. “Join” seems ambiguous, “register” seems bureaucratic and expensive, and “create an account” just feels a little dorky.

Dishonorable Mention: The Apple Store, supposed paragon of usability and attention to detail, is the worst offender on this list in terms of mixing and matching the terms, often on the same page. They also fail miserably on one major point: there’s no logout button!

Alexa

I’ve spent most of my career working with and for large companies and clients. I watched the 90’s .com boom mostly from the outside, and only recently got involved with the startup ecosystem. So far, I’m finding it exciting and appealing, but there’s one aspect that flat-out confuses me.

Alexa is, in brief, spyware. It’s not sketchy like Gator or any of the other horrible things nerds have been cleaning off their mom’s computers for years, but it basically tells a server which websites you visit. In return for this, you can view some meta-information about the site you’re on, find related sites, etc. It’s been around for a while, and is now part of Amazon’s otherwise benevolent kingdom.

I have never, ever seen anyone use it. Anyone who is tech-savvy with a modicum of concern for privacy wouldn’t even think about installing it. I’ve even experimentally explained it to some non-techy people in neutral terms, and gotten negative responses.

The thing that confuses me is that it seems to be practically gospel in the startup/VC community, especially among those who sit in the bleachers and offer commentary about the sector. This is largely because it’s the only way to compare traffic between two sites, but its results are worse than inaccurate: they are misleading, and dangerous.

I have available to me (not shareable, sorry) the traffic information for two sites. Site A has a mainstream audience and no revenue. Site B has a somewhat more technically-oriented audience, and is barely profitable. Site B gets twice as many visitors (and far more traffic) as Site A. Site A’s metrics on Alexa are twice those of Site B. I don’t even think a margin of error this vast has a name in statistics.
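In numbers: write V for each site’s real visitor count and \hat{V} for the corresponding Alexa figure (the notation is mine). The two ratios point in opposite directions, so Alexa’s estimate of how the sites compare is off by a factor of four:

```latex
V_B = 2V_A, \qquad \hat{V}_A = 2\hat{V}_B
\quad\Longrightarrow\quad
\frac{\hat{V}_A / \hat{V}_B}{V_A / V_B} = \frac{2}{1/2} = 4
```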

So I guess what I’m trying to say here is don’t use Alexa “until something better comes along” or “just for another viewpoint”. Just don’t use it at all.