affinity.txt

SEO sucks. It’s a fun little game to play for a while, but at the end of the day almost everyone loses. The searchers lose because they can’t find the best stuff any more. The search engines lose because their searchers see worse results and are less happy. Legitimate creators and businesses lose because they have traffic siphoned off by spammers and scrapers. They’re forced to waste brainpower and money on this ridiculous game that, from where I’m standing, they’re losing.

I’m not pretending that there was some golden age without spammers; they’ve always been there, and they always will be. Originally we had straight-up content matches, then keywords. Search quality was approaching unusability before Google came on the scene. They did a good job: PageRank put a trust network in the mix, and it worked great for a while. Eventually that well was poisoned too. They’ve done many things since, and I bet many of them helped. The new site blacklist is a good step but certainly a tricky one to use. The rise of Bing is actually a good thing; I think Google is in a much better position to take risks and change things when they’re at 70 or 80% of the market as opposed to 95%.

So let’s stop talking about SEO as a black art. Let’s stop making our sites worse to prolong a losing battle. Let’s take that energy and put it into something else. Put your content out there in the best format you can. Forget code-to-content ratios and maximizing internal link structures that don’t benefit your users.

Instead, let’s think of ways that we can help explicitly affect results. Here’s one: Think robots.txt meets social graph meets PageRank. site.com lists other sites in its /affinity.txt file, and defines some coarse relationship. Something like this:

widgetfactory.com/affinity.txt

www2.widgetfactory.com self
*.widgetwiki.org follow
*.widgetassociation.net follow

So we’ve got the “self” tag that basically says “this is another version, or a related version, of me.” If www2.widgetfactory.com/affinity.txt has a “widgetfactory.com self” entry, you’ve got yourself a verified relationship. We’ve also got something like a follow tag that says “we like these sites and think they’re valuable, you should go there (and follow links from us).” It’s basically a vote. Unidirectional votes are useful for quality, and mutual votes are a big clue about transferring trust.
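Here is a rough sketch of how a crawler might read these files and check for a verified “self” relationship. Nothing here is a spec: the function names (parse_affinity, declares, is_verified_self) are made up for illustration, the two-column whitespace-separated format is assumed from the example above, and treating “*.” as a shell-style wildcard is my guess at what the pattern should mean.

```python
from fnmatch import fnmatchcase

# Assumption: only the two relations mentioned in the post exist.
RELATIONS = {"self", "follow"}


def parse_affinity(text):
    """Parse affinity.txt content into a list of (host_pattern, relation) pairs."""
    entries = []
    for raw in text.splitlines():
        parts = raw.strip().lower().split()
        # Skip blank lines and anything that is not "<host pattern> <relation>".
        if len(parts) != 2 or parts[1] not in RELATIONS:
            continue
        entries.append((parts[0], parts[1]))
    return entries


def declares(entries, host, relation):
    """True if any entry with this relation has a pattern matching the host."""
    host = host.lower()
    return any(
        rel == relation and fnmatchcase(host, pattern)
        for pattern, rel in entries
    )


def is_verified_self(files, a, b):
    """True only if a and b each list the other as "self".

    `files` maps a hostname to the text of its affinity.txt (hypothetical
    input; a real crawler would fetch these over HTTP).
    """
    a_entries = parse_affinity(files.get(a, ""))
    b_entries = parse_affinity(files.get(b, ""))
    return declares(a_entries, b, "self") and declares(b_entries, a, "self")
```

Note that under this reading “*.widgetwiki.org” matches docs.widgetwiki.org but not the bare widgetwiki.org, so a real format would have to decide whether the wildcard covers the apex domain too.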

How many votes do you get? No idea, I don’t see any reason to limit it. I think those things will work themselves out. I don’t see regular sites managing thousands of links in there; I think they’d just link to enough to make it meaningful. Or maybe there is some hard limit, so there’s less of a guessing game on how to pick who goes there. 100 per site? 1000?

Now you might think, “this is just PageRank,” but I think it actually is different. First, it’s much easier to spot foul play. There are far fewer domains than pages, so the graph is much smaller and easier to traverse. Junk sites are going to stick out like a sore thumb. A spammer can’t really use this channel by making artificial networks of trust because it would be so easy to kill them en masse. It’s also difficult to mask or overwhelm like linkfarms.

Does this hurt the little guy? Not any more than spammers, I think. I think even though the expression of the data is simple, the interpretation of it can be very complex. If you’ve got a little blog that just blathers on about computer stuff and doesn’t even have any ads on it, you may not need much trust to rise up on specific content searches. If you’ve got a site with 3 million pages that look an awful lot like Wikipedia pages, and your only votes are from other similar sites, and your whole cell has no links from the larger graph, well, maybe, despite your flawless, compact markup and impeccable word variances, you’re not really adding much value, are you?
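To make the “cell with no links from the larger graph” idea concrete, here is one possible interpretation, sketched under some loud assumptions: the votes have already been collapsed into a domain-level graph, and the search engine has some set of seed domains it already trusts. Neither of those is part of the proposal itself, and the names (reachable_from, isolated_domains) are hypothetical. The sketch walks the votes outward from the seeds with a breadth-first search and flags every domain that no chain of votes ever reaches.

```python
from collections import deque


def reachable_from(seeds, follows):
    """Breadth-first walk over "follow" votes, starting at the seed domains.

    `follows` maps a domain to the set of domains it votes for.
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        domain = queue.popleft()
        for target in follows.get(domain, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def isolated_domains(seeds, follows):
    """Domains that appear in the graph but receive no vote chain from the seeds."""
    all_domains = set(follows)
    for targets in follows.values():
        all_domains.update(targets)
    return all_domains - reachable_from(seeds, follows)
```

A cell of scraper sites that only vote for each other stays isolated even if they vote for legitimate sites, because trust here flows only along inbound votes. A real ranking would presumably weigh this signal rather than treat it as a hard cutoff.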

I’m not saying this particular idea solves all of our problems (heck, I’m sure there are some problems with it as described), but I think approaches like this will not only affect the quality of our search results, but they will ultimately affect the quality of the web overall.

So THAT’s What a Debate Is

This is a politics-free blog, but this needs to be said. I’ve watched most of the debates since the laughable Clinton/Dole ones, through the confusing Bush/Gore ones, and through the pitiful Bush/Kerry ones, and that was the first real presidential debate I’ve ever seen. Two smart guys going at it (after a bit of prodding by Jim Awesomepants Lehrer), minimal bald-faced lies, presenting real differences of beliefs and leadership. Maybe this whole democracy thing actually works…

Voting Obsecurity

December 26, 2006

The Honorable George W. Bush
President of the United States
The White House
1600 Pennsylvania Avenue NW
Washington, DC 20500

Dear Mr. President,

I hope the holidays find you well, and wish you a happy and productive new year.

I am not an expert on the topic of voting, nor am I a professional security expert, but I do try to follow the news with regard to the intersection of the two. I watched the HBO documentary “Hacking Democracy” this weekend while wrapping presents, and was flat-out disgusted. It was not especially well-made, made no attempt at being objective, nor was most of its information news to me, but I was moved when I watched a mock election be rigged using actual voting systems and saw a real, respected election official left speechless and dumbfounded that his job was completely undermined.

There is no simple solution to secure voting, nor is it remotely probable that any election will ever be 100.0% honest, but there are some monumentally obvious flaws in the way we currently count votes. The largest is the issue of the lack of openness of the software, systems, and processes that are involved.

I am a software developer and most non-programmers I’ve talked to have a difficult time understanding the idea that public access to the “secret codes” of software, AKA the source code, is more secure than private or closed source code. The general opinion is that if you can see the inner workings of something, it’s easier to break it, which is valid and true. The next step in this thought process, however, is more critical and is one that most people don’t take. That is that if you cannot see the inner workings of something that is broken, it’s more difficult to fix it.

The idea of “security by obscurity”, sometimes referred to as “obsecurity”, is valid and necessary when it comes to information such as private financial data, personal information like medical histories, and intelligence gathered by our military and other government agencies. However, regarding mechanisms and processes, such as software, obscurity lessens security. This is doubly true when those mechanisms are designed to collect and analyze public data, such as votes.

Here’s a dirty secret of the programming craft: 99.9999% of software is broken. By broken I mean there is some bug somewhere in it. In new or rarely used software these bugs can be serious and misleading. In most mature software, it’s nothing serious; something doesn’t display right, some obscure error condition is handled poorly, etc. This applies to video games, email programs, ATM software, Windows, Linux, etc. as well as voting software.

So how do you weed out these bugs? You test, over and over and over again. When you find a bug, you test again, even things that you didn’t fix (AKA regression testing). Eventually, you’ve fixed all or most of the bugs that were found, satisfied your unit tests and requirements, and you ship it. Then your customer does something you never planned on, maybe because they are being silly or stupid, or you aren’t the programming god you thought you were, or your QA staff is overworked, or it just wasn’t possible for you to test in the lab. This is why you get Windows updates every week, why there are dozens of bug fixes for every Linux kernel, and why there are patches for every video game; it’s simply unavoidable. The more something is used, the more bugs are found, and the better it becomes.

Voting software isn’t used very much. Most machines are used one day every year or two. Excel has been used by millions of people every day for nearly 20 years, and there are still bugs in it. If I’m making an inventory of my baseball cards and have a problem with Excel I can report it to Microsoft and hopefully a ticket will be opened and hopefully they will fix it. The difference between a spreadsheet package and a voting system is that my grandfathers didn’t risk their lives overseas to make sure “=SUM(D3:D13)” was accurate, they did it so that I would grow up in a better world than they did, and just as importantly, have the power to make it better for my grandkids.

It is absolutely imperative that we apply the highest possible standards of scrutiny, security, and integrity to the systems that facilitate our most sacred public right. Voting software, hardware, and systems should not only be open, they should be the zenith of openness. The public should be able to download complete specifications for every piece of hardware on every type of voting machine out there, from the device I vote on to the system that tabulates it to the printers that make the reports. We should have access to every line of code used in the entire process. I should be able to test it myself and find flaws or solicit advice from those I trust. The public should have as much access to the hardware as is feasible. Regular citizens, universities and vendors should be encouraged with bounties to find and report flaws. Defect reports on voting systems should be legal documents, also open to review by all. There should be digitally signed video, publicly viewable via live broadcast or within hours, of all access to every machine, with the sole exception of the person casting a ballot. There is room for only ONE secret in this entire process, and that is who an individual voted for.

I would be exceptionally pleased if you would propose or support legislation to help protect our votes by virtue of an open and honest process. It would not only validate the sacrifices millions of Americans have already made, but it would set an example that other democracies and future generations will aspire to.

Sincerely,

Eric F. Savage


Also sent to my Senators Kennedy and Kerry, Representative Frank, Governor-Elect Patrick, State Senators Brown and Creem and Secretary of the Commonwealth Galvin. If you feel similarly I encourage you to write to your officials and feel free to borrow or copy from my letter.

For more, better information on this topic I recommend checking out Ben Adida’s Blog and Black Box Voting. A web search for ‘secure voting’ or similar topics will also turn up piles of other opinions and (often scary) facts.