Over the last week, Planet RDF has seen more than a few posts and comments on the RDF/XML serialisation syntax, most of them looking into its (almost not enumerable) possible variations.
Danny Ayers has a great overview with reference to the expectations, not an unimportant point — as my boss always tells me: It’s all about controlling expectations.
All in all, I agree with most of the comments on the subject, even parts of the ones that don’t seem to agree with each other.
In short: RDF/XML is just a syntax (it’s the RDF model that counts), and while I generally find it acceptable, the variation is one aspect I hope I would have done differently had I been involved. Less variation would make it more accessible to XML tools such as XSLT, leading to easier ways of generating clean XML for other uses.
The subject of RDF/XML variation is as close to being a permathread as it can be, and I’ve participated before myself, mostly with regard to the R3X syntax subset. I’ve been doing a lot of XSLT as well, and three recommendations come to mind when considering a syntactic profile or subset of RDF/XML, to reduce the variation:
- Drop the attribute form
- Except for aesthetic reasons, it’s not necessary.
- Don’t use typed nodes
- While it may seem easier and smarter to write a typed node such as <foaf:Person>...</foaf:Person> instead of
<rdf:Description><rdf:type rdf:resource="&foaf;Person"/>...</rdf:Description>, it makes it much harder to deal with nodes that have multiple types (and using named entities can help a lot too).
- Sort and group
- ‘nuf said — don’t break statements about the same subject into different elements, keep them together and don’t nest at all.
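As a rough illustration of the three recommendations taken together, here is a minimal Python sketch that writes triples as one rdf:Description per subject, with every property (including rdf:type) as a child element. The input shape (properties given as prefixed names, objects tagged as "uri" or "literal"), the function name and the example.org data are assumptions made for the example, not part of any existing profile. Note that rdf:about and rdf:resource are still attributes; "the attribute form" dropped here is the one that writes properties as attributes.

```python
from collections import defaultdict
from xml.sax.saxutils import escape, quoteattr

# Hypothetical input: (subject URI, property qname, (kind, value)) tuples.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
TRIPLES = [
    ("http://example.org/#me", "foaf:name", ("literal", "Example Person")),
    ("http://example.org/doc", "rdf:type", ("uri", "http://xmlns.com/foaf/0.1/Document")),
    ("http://example.org/#me", "rdf:type", ("uri", "http://xmlns.com/foaf/0.1/Person")),
    ("http://example.org/#me", "rdf:type", ("uri", "http://xmlns.com/foaf/0.1/Agent")),
]


def serialise(triples, prefixes):
    """Write triples as flat, sorted RDF/XML: no typed nodes, no nesting."""
    # Group: all statements about one subject end up in one element.
    grouped = defaultdict(list)
    for subject, prop, obj in triples:
        grouped[subject].append((prop, obj))

    decls = "".join(
        f" xmlns:{prefix}={quoteattr(uri)}" for prefix, uri in sorted(prefixes.items())
    )
    out = [f"<rdf:RDF{decls}>"]
    # Sort: subjects first, then properties and values within each subject.
    for subject in sorted(grouped):
        out.append(f"  <rdf:Description rdf:about={quoteattr(subject)}>")
        for prop, (kind, value) in sorted(grouped[subject]):
            if kind == "uri":
                # Types are ordinary rdf:type properties, so several are no problem.
                out.append(f"    <{prop} rdf:resource={quoteattr(value)}/>")
            else:
                out.append(f"    <{prop}>{escape(value)}</{prop}>")
        out.append("  </rdf:Description>")
    out.append("</rdf:RDF>")
    return "\n".join(out)


print(serialise(TRIPLES, PREFIXES))
```

With output shaped like this, an XSLT template can match rdf:Description[rdf:type/@rdf:resource='...'] without having to know in advance which typed-node element names might turn up.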
That said, I don’t consider the deficiencies of RDF/XML to be serious enough to warrant a new XML syntax. After all, there are plenty of RDF/XML parsers out there by now, and the real challenges lie elsewhere (see: Crisis, LargeTripleStores, and Tagtriples + identity precision).
To help develop and test my new Sparqlette service, I hacked a couple of XSLTs that might come in handy here and there…
- SPARQL to RSS (latest version: 0.3)
- As its name implies, this XSLT turns a SPARQL Query Results XML Format document (Variable Binding Results) into an RSS channel, making it possible to subscribe to the results of an (almost) standard SPARQL query without using CONSTRUCT. As can be expected, not all query results work, as the RSS specification mandates certain elements. Thus, the value of the channel’s rss:link property is taken from an XSLT parameter named _uri, the variable bindings to use for rss:link and rss:title in each item are determined via some crude heuristics, and only items that have a URI for the chosen rss:link binding are created.
Variable selection heuristics for rss:link / rss:title:
- If there’s a variable named for the property (rsstitle in the case of rss:title), the bindings for that variable are used for all items.
- Otherwise, the first variable that has a binding to a URI is used for rss:link, and the first variable that has a binding to a literal is used for rss:title
For rss:link I wanted to add another option between the two, that would locate a variable that only has bindings to URIs, but I couldn’t get it working with a single XPath expression, so I gave up.
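Outside XSLT the selection is easier to spell out. The following Python sketch mirrors the two heuristics and adds the "in between" option of a variable bound only to URIs. It assumes the namespace of the final SPARQL Query Results XML Format (the 2004/2005 drafts used a different one), the function names are made up for the example, and the preferred variable name is passed in as a parameter rather than hard-coded.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the final Recommendation, not of the drafts.
SR = "{http://www.w3.org/2005/sparql-results#}"

SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="label"/><variable name="mixed"/><variable name="page"/>
  </head>
  <results>
    <result>
      <binding name="label"><literal>First</literal></binding>
      <binding name="mixed"><uri>http://example.org/x</uri></binding>
      <binding name="page"><uri>http://example.org/1</uri></binding>
    </result>
    <result>
      <binding name="label"><literal>Second</literal></binding>
      <binding name="mixed"><literal>not a URI</literal></binding>
      <binding name="page"><uri>http://example.org/2</uri></binding>
    </result>
  </results>
</sparql>"""


def variable_names(root):
    """Variables in the order they are declared in the head."""
    return [v.get("name") for v in root.iter(SR + "variable")]


def pick_variable(root, preferred, kind):
    """Use `preferred` if it is declared; otherwise the first variable
    that has at least one binding of `kind` ('uri' or 'literal')."""
    names = variable_names(root)
    if preferred in names:
        return preferred
    for name in names:
        for binding in root.iter(SR + "binding"):
            if binding.get("name") == name and binding.find(SR + kind) is not None:
                return name
    return None


def first_uri_only_variable(root):
    """The option in between: the first variable whose bindings are all URIs."""
    for name in variable_names(root):
        bindings = [b for b in root.iter(SR + "binding") if b.get("name") == name]
        if bindings and all(b.find(SR + "uri") is not None for b in bindings):
            return name
    return None


root = ET.fromstring(SAMPLE)
print(pick_variable(root, "rsstitle", "literal"))  # title variable
print(pick_variable(root, None, "uri"))            # link variable, crude version
print(first_uri_only_variable(root))               # link variable, stricter version
```

In the sample, the crude heuristic settles on a variable that is only sometimes a URI, so some results would yield no item; the stricter check would avoid that.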
Example RSS (view Sparqlette input parameters).
- SPARQL to SPARQL (latest version: 0.1)
- This XSLT simply converts documents in the syntax of either of the two current SPARQL Query Results XML Format draft specifications, W3C Working Draft 21 December 2004 and $Revision: 1.29 $ of $Date: 2005/05/03 09:58:04 $, into the syntax of the latest version, currently the latter. I promise to do my best to stay up to date…
Note: This entry — as all entries in the Release category — will serve as a changelog (you can subscribe to its RSS feed if you want to make sure you don’t miss out on any updates).
Things have been a bit hectic lately, but I have actually managed to make something that might be worth going public with. (Actually, I’ve already pointed it out on #swig, but that’s another matter.)
Over at FilmTrust they let you not only rate and review movies, but also connect with your friends to see what kind of movies they like. Of course, this being a service from the first site on the Semantic Web, it offers a nice FOAF document, like mine.
As you can see, it contains information about the reviews I’ve written and a list of my friends, with rdfs:seeAlso links provided for the latter, which makes it possible to create an RSS feed with some XSLT and use of the document() function, like this: FilmTrust reviews by mortenf and friends. The output is generated via W3C’s XSLT Service. Note how at least three URIs are involved in this; that’s (minimal) REST for you. Oh, let’s add another one: Via the Syndication Subscription Service.
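For readers who don’t speak XSLT, the first step (finding the documents that document() would then load) looks roughly like this in Python. The FOAF sample is invented for the illustration and is not FilmTrust’s actual output; fetching and merging the linked documents is left out.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"

# Made-up FOAF document with two friends, each pointing at their own file.
SAMPLE_FOAF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:nick>example</foaf:nick>
    <foaf:knows>
      <foaf:Person>
        <foaf:nick>alice</foaf:nick>
        <rdfs:seeAlso rdf:resource="http://example.org/alice/foaf.rdf"/>
      </foaf:Person>
    </foaf:knows>
    <foaf:knows>
      <foaf:Person>
        <foaf:nick>bob</foaf:nick>
        <rdfs:seeAlso rdf:resource="http://example.org/bob/foaf.rdf"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def see_also_uris(foaf_xml):
    """Collect rdfs:seeAlso targets in document order, without duplicates."""
    root = ET.fromstring(foaf_xml)
    uris = []
    for link in root.iter(RDFS + "seeAlso"):
        uri = link.get(RDF + "resource")
        if uri and uri not in uris:
            uris.append(uri)
    return uris


for uri in see_also_uris(SAMPLE_FOAF):
    print(uri)  # each of these is what the XSLT would hand to document()
```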
A nice addition to the original source FOAF would be dates on the reviews; that’d make it possible to limit the size of the resulting RSS file. As it is, there’s no way to know which are “new”. Also, there are some escaping issues on FilmTrust: I had to remove golbeck and sbp from my friend list to get a running example…
Note that the XSLT takes an optional parameter, user-only. That’s provided in case you’re only interested in your own reviews — I use this to drop them into my personal planet feed.
Try subscribing to reviews from your own social network!
You know you might be hungry when you are greeted by the following error message:
xsltStylePreCompute: unknown xsl:parma
xsltApplyOneTemplate: parma was not compiled
The offending part:
<xsl:template mode="navigation" priority="0.2" match="*[@rdf:resource]">
<xsl:parma name="head" select="false()"/>
<xsl:variable name="this-p" select="concat(namespace-uri(),local-name())"/>
To keep this sort of on-topic: there was a workshop in Italy last week, SWAP 2004; Danny Ayers has more.
One of the better arguments for RSS 1.0 over other syndication formats is the claim that the (meta) data plugs directly into the greater Semantic Web, thus making it possible to go both back and forth between the two, making them one. Unfortunately, most aggregators don’t really aggregate, at most they just present a cached version of what’s currently offered, resulting in a disconnect, as Bob DuCharme recently pointed out on rdf-interest (eventually leading to rdfdata.org).
However, archiving “items” from RSS feeds over time presents a few issues.
- Not all RSS items have their own globally unique identifier
- Some RSS feeds are “linkrolls” more than a list of recently created or updated resources. A linkroll references other resources directly, sometimes making incorrect statements about e.g. the creator or time of publication (example: del.icio.us/mortenf). Reliable identification is needed to be able to recognise items that are new, old or updated.
- The ambiguous definition of a channel
- In the RSS 1.0 spec it says the following about the rdf:about attribute on the channel element:
“Most commonly, this is either the URL of the homepage being described or a URL where the RSS file can be found.”
The right choice always seems to be the channel URI, the source of the statements, as that is what is commonly referred to by rdfs:seeAlso in e.g. blogrolls and personal FOAF files, and is most often the identifier used for provenance in a triple store.
- Each item is associated with one or more channels through the rss:items property, referencing a sequence of the “current” items. The sequence of items is determined through the use of the RDF/XML syntactic construct rdf:li, which is expanded to rdf:_1, rdf:_2, and so on, in the RDF model. When a new item is added to a channel, it is added at the first position, rdf:_1, the existing items shift towards the end of the sequence, and the last item disappears from the sequence. In a naïve implementation, archiving a channel over time would lead to a “sequence” with each item being referenced more than once, and loss of actual temporal information: it’d be impossible to determine the actual order in which the items appeared. Note also that in an even more naïve implementation (one that doesn’t recognise that the two sequences should be seen as one), the result wouldn’t be an “invalid” sequence, but instead a channel with multiple rss:items properties, each with a perfectly fine sequence.
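One way around the shifting positions is to ignore the rdf:_n numbers when archiving and keep a single first-seen order instead. The Python sketch below does that under two assumptions: every item has a stable URI (which, as noted above, not all feeds provide), and each snapshot lists its items newest first. The function name and the sample data are made up for the illustration.

```python
def archive_snapshots(snapshots):
    """Merge successive channel snapshots into one list, oldest item first.

    Each snapshot is the channel's item sequence at one poll, newest first
    (rdf:_1, rdf:_2, ...). An item is recorded once, when it first appears.
    """
    archive = []
    seen = set()
    for snapshot in snapshots:
        # Walk from the oldest entry in the snapshot to the newest, so items
        # that arrived between two polls keep the feed's relative order.
        for uri in reversed(snapshot):
            if uri not in seen:
                seen.add(uri)
                archive.append(uri)
    return archive


polls = [
    ["item-c", "item-b", "item-a"],            # first poll
    ["item-e", "item-d", "item-c", "item-b"],  # later: two new, item-a dropped off
]
print(archive_snapshots(polls))
```

The resulting order is only as precise as the polling interval: items that show up between two polls are ordered by their position in the feed, not by a real timestamp.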