One of the better arguments for RSS 1.0 over other syndication formats is the claim that the (meta)data plugs directly into the greater Semantic Web, making it possible to move back and forth between the two and, in effect, making them one. Unfortunately, most aggregators don’t really aggregate; at most they present a cached version of what’s currently offered. The result is a disconnect, as Bob DuCharme recently pointed out on rdf-interest (eventually leading to rdfdata.org).
However, archiving “items” from RSS feeds over time presents a few issues:
- Not all RSS items have their own globally unique identifier
- Some RSS feeds are “linkrolls” more than a list of recently created or updated resources. A linkroll references other resources directly, sometimes making incorrect statements about e.g. the creator or time of publication (example: del.icio.us/mortenf). Reliable identification is needed to be able to recognise items that are new, old or updated.
- The ambiguous definition of a channel
- In the RSS 1.0 spec it says the following about the rdf:about attribute on the channel element:
“Most commonly, this is either the URL of the homepage being described or a URL where the RSS file can be found.” The right choice seems to always be the channel URI, the source of the statements, as that is what is commonly referred to by rdfs:seeAlso in e.g. blogrolls and personal FOAF files, and most often used as the identifier for provenance in a triple store.
- Each item is associated with one or more channels through the rss:items property, referencing a sequence of the “current” items. The sequence of items is determined through the use of the RDF/XML syntactic construct rdf:li, which is expanded to rdf:_1, rdf:_2, and so on, in the RDF model. When a new item is added to a channel, it is added at the first position, rdf:_1, the existing items shift towards the end of the sequence, and the last item disappears from the sequence. In a naïve implementation, archiving a channel over time would lead to a “sequence” with each item being referenced more than once, and to a loss of actual temporal information: it’d be impossible to determine the actual order in which the items appeared. Note also that in an even more naïve implementation (one that doesn’t recognise that the two sequences should be seen as one), the result wouldn’t be an “invalid” sequence, but instead a channel with multiple rss:items properties, each with a perfectly fine sequence.
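To make the sequence problem concrete, here is a minimal Python sketch of one way an archiver could merge successive snapshots of a channel without piling up overlapping sequences. It is an illustration under assumptions, not a description of any existing aggregator: the names `expand_li` and `ChannelArchive` are hypothetical, each item is assumed to have a usable URI (which, as noted above, is not always the case), and feeds are assumed to list the newest item first.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def expand_li(item_uris):
    """Mimic how RDF/XML expands rdf:li into rdf:_1, rdf:_2, ... (1-based)."""
    return {f"rdf:_{n}": uri for n, uri in enumerate(item_uris, start=1)}


@dataclass
class ChannelArchive:
    """Hypothetical archive for one channel, keyed by item URI."""

    # The channel URI acts as provenance: the source of the statements.
    channel_uri: str
    # item URI -> (time of the fetch that first showed it, order within that fetch)
    first_seen: dict = field(default_factory=dict)

    def merge(self, snapshot, fetched_at):
        """Record items not seen before; return them oldest first.

        `snapshot` is the current item sequence, newest first, so the
        unseen items are reversed to recover the order of appearance.
        """
        unseen = [uri for uri in dict.fromkeys(snapshot) if uri not in self.first_seen]
        oldest_first = list(reversed(unseen))
        for order, uri in enumerate(oldest_first):
            self.first_seen[uri] = (fetched_at, order)
        return oldest_first

    def chronological(self):
        """All archived item URIs in the order they first appeared."""
        return sorted(self.first_seen, key=self.first_seen.get)
```

Merging the snapshot `["c", "b", "a"]` and later `["e", "d", "c"]` leaves one entry per item instead of two overlapping sequences, and the order of appearance is kept as data about each item rather than as a position in a sequence. The order of items that first show up in the same fetch is only inferred from their position, so it is a guess rather than actual temporal information.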
Continue reading Aggregating and Archiving RSS Items
A while ago I hacked WordPress into emitting FOAF, but even though it worked fine, the amount of WordPress tweaking required wasn’t for the faint of heart.
Since then, I have looked a little deeper into WordPress, and now it’s finally ready: The FOAF Output Plugin (view source).
Note: If you want to try running this with WordPress > 1.2, please fix the .htaccess rewrite rule for the author pages like this, otherwise the FOAF file will result in a 404:
[^/]+)/?$ /<SITEHOME>/index.php?author_name=$1 [QSA,L]
The current version is 1.17 (released 2005-08-31).
- Changes since 1.16:
- Fixed errant line breaks in links.
- Added generator comment to RSS output.
- Changes since 1.15:
- Added oneline bio on profile page, not HTML-escaped.
- Added link to RSS channel from author list.
- Tweaked prefix usage for namespaces.
- Changes since 1.14:
- Added check for array being returned from get_the_category().
- Added check for get_Lat and get_Long for > 1.2 compatibility.
- Changes since 1.13:
- Fixed category/interest when no categories were found.
- Changed skos:externalID to dc:identifier.
- Updated SKOS generation with SKOS extensions vocabulary.
- Fixed generation of author list URI.
- Changes since 1.12:
- Changed a wrong foaf:made to foaf:page, caught by Ian Davis.
- Fixed erroneous output of homepage URI on profile page.
- Fixed a problem with statements being added to Atom feeds, thanks Danny/Sam.
- Changes since 1.11:
- Added blog-wide FOAF output (blogroll) with seeAlso’s to authors’ individual files (example).
- Changes since 1.10:
- Added SKOS output (example) and enhanced RSS output.
- Added document level RDF/XML API hook, foaf_output_profile_rdf_document, to allow for additional properties by add-ons, e.g. generator information.
- Fixed possible missing namespace declarations for dcterms in RSS/Atom.
- Tweaked initialisation code to increase reusability.
- Changes since 1.9:
- Fixed limited interest generation for HTML profile page.
- Multiple URIs per interest is now handled correctly (if separated by whitespace).
- Only categories with posts by author are deemed “interesting”.
- Added bio:olb per B.K. DeLong’s suggestion.
- Added trust ratings for friends.
- Changes since 1.8:
get_foaf_output_profile_page to better reflect the functionality.
- Changes since 1.7:
- Improved identification of “active” author / user.
- Now really only shows profile on first archive page, even in paged mode.
Continue reading WordPress Plugin: FOAF Output
As hinted at last week, I now have my WordPress installation output comments as RSS 1.0, both as a blog-wide feed and as a feed per post.
At the core they are just like other feeds, but comments are a little more diverse than posts. There are regular comments, entered in the comment form for each post, and PingBacks and TrackBacks, both of which are “remote”: a notification of reference from somewhere else.
Each of these three types of comment is different from the others and needs to be handled differently.
Continue reading Semantic Comments Feeds from WordPress