Digg: Not so Cool URIs

The URIs at Digg may look cool on the surface, but underneath they are a lot less friendly. Not that this is really a surprise; Digg doesn’t exactly shine when it comes to being web-friendly.

Exhibit A:

http://digg.com/tech_news/New_Interview_with_Kati_Kim_describes_harrowing_week_lost_in_woods

Earlier today, that new URI showed up in my aggregator. I opened the link in another tab of my browser, and was soon greeted by a rather disappointing message, including the word “Oops” and words to the effect of it being a symptom of a missing page.

Searching Digg for the story in general terms brought up the usual Digg dupes, but also what appeared to be the story I had been looking for, only with a different URI.

Exhibit B:

http://digg.com/world_news/New_Interview_with_Kati_Kim_describes_harrowing_week_lost_in_woods

Compare this with exhibit A, and you’ll find that the post title hasn’t changed. What has changed is apparently the categorization of the post: instead of being filed under “Tech News”, it’s now filed under “World News”. That little change makes a big difference. The change of category isn’t a problem in itself, and even seems reasonable, but the change of URI certainly is. It is obviously the same resource, so changing its identifier is a no-go.

But wait. It gets worse.

Exhibit C:
Oops!

You’d think they would at least then redirect from the “old” URI to the new, making this entire post moot, and offering a pleasant user experience, but no. Instead we were shown a generic error page, offering no assistance in finding what we were looking for. At the very least they could have included a link that would make it easier to find the story in question.
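As a rough illustration of what such a redirect could look like, here is a minimal sketch in Python. It assumes nothing about how Digg actually stores its stories; the `StoryIndex` class and its method names are made up for this post. The idea is simply that a category change records the old path as an alias, so the old URI answers with a 301 pointing at the new one instead of a generic error page.

```python
# Hypothetical sketch: StoryIndex and its methods are invented for
# illustration and do not describe Digg's real implementation.

OLD = "/tech_news/New_Interview_with_Kati_Kim_describes_harrowing_week_lost_in_woods"
NEW = "/world_news/New_Interview_with_Kati_Kim_describes_harrowing_week_lost_in_woods"


class StoryIndex:
    def __init__(self):
        self.current = set()  # paths that currently serve a story
        self.moved = {}       # old path -> current path

    def publish(self, path):
        self.current.add(path)

    def recategorize(self, old_path, new_path):
        """Move a story and remember every path it used to live at."""
        self.current.discard(old_path)
        self.current.add(new_path)
        # Re-point earlier aliases so a redirect never needs two hops.
        for alias, target in list(self.moved.items()):
            if target == old_path:
                self.moved[alias] = new_path
        self.moved[old_path] = new_path
        # If the story moved back to a former path, that path is live again.
        self.moved.pop(new_path, None)

    def respond(self, path):
        """Return (status, location) for a requested path."""
        if path in self.current:
            return 200, None
        if path in self.moved:
            return 301, self.moved[path]
        return 404, None


index = StoryIndex()
index.publish(OLD)
index.recategorize(OLD, NEW)
print(index.respond(OLD))
```

With something like this in place, anyone following the link in exhibit A would land on the story in exhibit B without ever seeing an error page.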

But wait. There’s more…

Exhibit D:

http://digg.com/error

That’s the URI we end up with, showing the message in exhibit C, when following the link in exhibit A — a URI that may look cool, but really isn’t.

Usually, when you follow a link that is invalid for some reason, you get to keep the URI of what you were looking for in the address bar. In a case like this one, where the error message is less than helpful, that URI would at least have offered a hint of what to do next.

But wait. It gets even worse.

I saw the original URI in my aggregator, because a lot of people had dugg the story. Some have likely posted the URI to their blogs as well, spreading it well beyond Digg’s RSS feed, which is where I found it. This means that robots will go looking for the resource as well, and they will of course be presented with yet another view.

Exhibit E:

$ HEAD -S http://digg.com/tech_news/New_Interview_with_Kati_Kim_describes_harrowing_week_lost_in_woods
HEAD http://digg.com/tech_news/New_Interview_with_Kati_Kim_describes_harrowing_week_lost_in_woods --> 302 Found
HEAD http://digg.com/error --> 200 OK

Not a 302 to the new URI, not a 404 indicating that the page was missing for some reason, not a 410 saying it was gone, but a 302 to an error page that answers 200 OK…
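A robot could spot this pattern mechanically. The following is a minimal sketch, assuming the hop lines have the same shape as the `HEAD -S` output in exhibit E; the function names and the list of "error-looking" paths are my own assumptions, not part of any existing tool.

```python
import re
from urllib.parse import urlparse

# Matches hop lines such as "HEAD http://digg.com/error --> 200 OK".
HOP = re.compile(r"^HEAD (\S+) --> (\d{3}) ")


def parse_hops(transcript):
    """Extract (url, status) pairs from a HEAD -S style transcript."""
    hops = []
    for line in transcript.splitlines():
        match = HOP.match(line.strip())
        if match:
            hops.append((match.group(1), int(match.group(2))))
    return hops


def looks_like_soft_404(hops, error_paths=("/error",)):
    """True when a redirect chain ends in a 200 on an error-looking path.

    error_paths is an assumed heuristic; adjust it per site.
    """
    if len(hops) < 2:
        return False
    final_url, final_status = hops[-1]
    was_redirected = any(300 <= status < 400 for _, status in hops[:-1])
    return (
        was_redirected
        and final_status == 200
        and urlparse(final_url).path in error_paths
    )


EXHIBIT_E = """\
$ HEAD -S http://digg.com/tech_news/New_Interview_with_Kati_Kim_describes_harrowing_week_lost_in_woods
HEAD http://digg.com/tech_news/New_Interview_with_Kati_Kim_describes_harrowing_week_lost_in_woods --> 302 Found
HEAD http://digg.com/error --> 200 OK
"""

print(looks_like_soft_404(parse_hops(EXHIBIT_E)))
```

Of course, a robot shouldn't have to guess. A 404 or 410 on the original URI would say the same thing without any heuristics.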

Sigh.

It’s a good thing Digg doesn’t offer tagging, or their URIs would be changing by the minute, exploding the number of published URIs and disrupting the space-time continuum…

QOTD

Nothing has ever been said just once. (By the way, there are no google hits on that phrase.)

David Weinberger

The above quote was dug up while trying to find the origin of another quote:

On the web, everyone is famous to 15 people.

Apparently, no one knows exactly who said that one first:

Appropriately enough, many people share authorship of that one.

Andy Was Right, TIME (from the future!)

Modified or not?

According to Sam Ruby, WordPress (among others) isn’t responding as expected to being sent If-None-Match and/or If-Modified-Since HTTP headers.

I’ve tried replicating the experiment with WordPress 2.0.2, 2.0.2 and 2.0.4 — they all yield the same results:

> HEAD http://planet.sfit.dk/feed/
...
ETag: "1a0c3e00da9d1d0a6e145168720f8574"
Last-Modified: Thu, 23 Nov 2006 01:20:07 GMT
...
> HEAD -H 'If-Modified-Since: Thu, 23 Nov 2006 01:20:07 GMT' http://planet.sfit.dk/feed/
...
Status: 304 Not Modified
> HEAD -H 'If-None-Match: "1a0c3e00da9d1d0a6e145168720f8574"' http://planet.sfit.dk/feed/
...
Status: 304 Not Modified
> HEAD -H 'If-Modified-Since: Thu, 23 Nov 2006 01:20:07 GMT' -H 'If-None-Match: "1a0c3e00da9d1d0a6e145168720f8574"' http://planet.sfit.dk/feed/
...
Status: 304 Not Modified
> HEAD -H 'If-Modified-Since: Thu, 23 Nov 2005 01:20:07 GMT' -H 'If-None-Match: "1a0c3e00da9d1d0a6e145168720f8574"' http://planet.sfit.dk/feed/
...
Status: 200 OK
> HEAD -H 'If-Modified-Since: Thu, 23 Nov 2006 01:20:07 GMT' -H 'If-None-Match: "1a0c3e00da9d1d0a6e145168720f8579"' http://planet.sfit.dk/feed/
...
Status: 200 OK

This seems to be working fine, or at least according to plan.
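To spell out what "according to plan" means here, the following is a minimal sketch of a decision rule that reproduces the five responses above. It is not WordPress's actual code, and `conditional_status` is a name I made up; the rule is inferred purely from the transcript: every validator the client sends must match before the server answers 304.

```python
from email.utils import parsedate_to_datetime


def conditional_status(etag, last_modified,
                       if_none_match=None, if_modified_since=None):
    """Return 304 only if every validator the client sent still matches.

    Inferred from the observed responses above, not taken from WordPress.
    """
    checks = []
    if if_none_match is not None:
        # Compare the quoted ETag strings verbatim, double quotes included.
        checks.append(if_none_match == etag)
    if if_modified_since is not None:
        client_time = parsedate_to_datetime(if_modified_since)
        server_time = parsedate_to_datetime(last_modified)
        checks.append(client_time >= server_time)
    # No conditional headers at all means a plain 200.
    return 304 if checks and all(checks) else 200


ETAG = '"1a0c3e00da9d1d0a6e145168720f8574"'
MODIFIED = "Thu, 23 Nov 2006 01:20:07 GMT"

# The fourth request above: a stale date with a matching ETag.
print(conditional_status(ETAG, MODIFIED,
                         if_none_match=ETAG,
                         if_modified_since="Thu, 23 Nov 2005 01:20:07 GMT"))
```

A site that answers differently when handed its own ETag and Last-Modified values is deviating from this rule somewhere, which is what makes the failing cases worth comparing against it.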

I originally suspected that Sam and others were seeing failures because of the double quotes in the ETag value. On the other hand, I can reproduce the problem with the feed from webstandards (apparently running version 2.0.2), so the problem seems to be specific to some sites rather than to WordPress itself.