affinity.txt

SEO sucks. It’s a fun little game to play for a while, but at the end of the day almost everyone loses. The searchers lose because they can’t find the best stuff any more. The search engines lose because their searchers see worse results and are less happy. Legitimate creators and businesses lose because they have traffic siphoned off by spammers and scrapers. They’re forced to waste brainpower and money on this ridiculous game that, from where I’m standing, they’re losing.

I’m not pretending that there was some golden age without spammers; they’ve always been there, and they always will be. Originally we had straight-up content matches, then keywords. Search quality was approaching unusability before Google came on the scene. They did a good job: PageRank put a trust network in the mix, and it worked great for a while. Eventually that well was poisoned too. They’ve done many things since, and I bet many of them helped. The new site blacklist is a good step, but certainly a tricky one to use. The rise of Bing is actually a good thing; I think Google is in a much better position to take risks and change things when they’re at 70 or 80% of the market as opposed to 95%.

So let’s stop talking about SEO as a black art. Let’s stop making our sites worse to prolong a losing battle. Let’s take that energy and put it into something else. Put your content out there in the best format you can. Forget code-to-content ratios and maximizing internal link structures that don’t benefit your users.

Instead, let’s think of ways that we can explicitly influence results. Here’s one: think robots.txt meets social graph meets PageRank. site.com lists other sites in its /affinity.txt file, and defines some coarse relationship for each. Something like this:

widgetfactory.com/affinity.txt

www2.widgetfactory.com self
*.widgetwiki.org follow
*.widgetassociation.net follow

So we’ve got the “self” tag that basically says “this is another version, or a related version, of me.” If www2.widgetfactory.com/affinity.txt has a “widgetfactory.com self” entry, you’ve got yourself a verified relationship. We’ve also got something like a “follow” tag that says “we like these sites and think they’re valuable, you should go there (and follow links from us).” It’s basically a vote. Unidirectional votes are useful for quality, and mutual votes are a big clue about transferring trust.
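
To make that a little more concrete, here’s a rough sketch in Python of what checking a verified “self” relationship might look like. The parsing rules, function names, and toy data are all placeholders for illustration, not any kind of real spec:

def parse_affinity(text):
    """Turn affinity.txt lines into (pattern, relation) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, relation = line.split()
        entries.append((pattern, relation))
    return entries

def has_entry(entries, domain, relation):
    """True if some pattern in the entries matches the domain with that relation."""
    for pattern, rel in entries:
        if rel != relation:
            continue
        if pattern == domain:
            return True
        if pattern.startswith("*.") and domain.endswith(pattern[1:]):
            return True
    return False

# Toy data standing in for files fetched from each domain.
files = {
    "widgetfactory.com": "www2.widgetfactory.com self\n*.widgetwiki.org follow",
    "www2.widgetfactory.com": "widgetfactory.com self",
}

a = parse_affinity(files["widgetfactory.com"])
b = parse_affinity(files["www2.widgetfactory.com"])

# A "self" claim only counts as verified when both sides make it.
verified = (has_entry(a, "www2.widgetfactory.com", "self")
            and has_entry(b, "widgetfactory.com", "self"))
print(verified)  # True

The point is that the check is cheap: two small text files, two lookups, and you know whether both sides agree.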

How many votes do you get? No idea; I don’t see any reason to limit it. I think those things will work themselves out. I don’t see regular sites managing thousands of links in there; they’ll just list enough to make it meaningful. Or maybe there is some hard limit, so there’s less of a guessing game about who gets to go there. 100 per site? 1000?

Now you might think “this is just PageRank,” but I think it actually is different. First, it’s much easier to spot foul play. There are far fewer domains than pages, so the graph is much smaller and easier to traverse. Junk sites are going to stick out like a sore thumb. A spammer can’t really game this channel by building artificial networks of trust, because it would be so easy to kill them en masse. It’s also hard to mask or overwhelm this signal the way link farms do with regular links.

Does this hurt the little guy? Not any more than spammers already do, I think. Even though the expression of the data is simple, the interpretation of it can be very complex. If you’ve got a little blog that just blathers on about computer stuff and doesn’t even have any ads on it, you may not need much trust to rise up on specific content searches. If you’ve got a site with 3 million pages that look an awful lot like Wikipedia pages, and your only votes are from other similar sites, and your whole cell has no links from the larger graph, well, maybe, despite your flawless, compact markup and impeccable word variances, you’re not really adding much value, are you?
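
Just to sketch how simple that kind of interpretation could be, here’s a toy Python example: start from a small set of trusted seed domains, walk the follow votes, and flag anything the walk never reaches. The domain names and the seed set are invented, and a real engine would obviously do something far more sophisticated, but the shape of the computation is the point:

from collections import deque

# Follow votes between domains, as they might be collected from affinity.txt
# files. All names here are made up for the example.
follows = {
    "widgetfactory.com": ["widgetwiki.org", "widgetassociation.net"],
    "widgetwiki.org": ["widgetfactory.com"],
    "widgetassociation.net": [],
    # A little cell of scraper sites that only vote for each other.
    "scraper-a.example": ["scraper-b.example"],
    "scraper-b.example": ["scraper-a.example"],
}

def reachable(seeds, graph):
    """Every domain you can reach from the seed set by walking follow votes."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        domain = queue.popleft()
        for neighbor in graph.get(domain, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

trusted = reachable({"widgetfactory.com"}, follows)
suspicious = sorted(set(follows) - trusted)
print(suspicious)  # ['scraper-a.example', 'scraper-b.example']

Because the graph is domains rather than pages, even a brute-force pass like this stays small.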

I’m not saying this particular idea solves all of our problems. Heck, I’m sure there are some problems with it as described. But I think approaches like this will not only affect the quality of our search results, they will ultimately affect the quality of the web overall.