Transforming RDF/XML with XSLT

A couple of months ago, while working on a project that will hopefully see the light of day soon, I realised I needed terms for singular and plural labels for properties and classes. Even with the help of SchemaWeb I couldn’t find existing terms, so I decided to cook up my own, resulting in the label vocabulary with two properties:

plural
A relation between a term and its label in literal plural form.
singular
A relation between a term and its label in literal singular form.

This was not the only vocabulary I was working on at that moment, and I needed to be able to get an overview, a human-readable version. Last year I did the RDFS Explorer for basically the same purpose, but since I was entering OWL territory, it wasn’t really up to the task. Back to square one.

Transforming RDF serialised as XML into HTML sounds like a great job for XSLT, but the possible syntax variations of RDF/XML make it next to impossible to handle arbitrary RDF/XML found in the wild. For the FOAF Explorer and my photo collection I’ve resorted to “normalising” the input before the actual transformation, by parsing and reserialising it with Redland by Dave Beckett, and that has turned out to work just fine. Redland’s serialiser emits RDF/XML using only a subset of the productions in the RDF/XML Syntax Specification. That subset leaves out some otherwise handy shortcuts in exchange for predictability.
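To make the idea of a restricted syntax profile concrete, here is a minimal Python sketch. It is not Redland’s serialiser and does not reproduce its output exactly. It writes every statement as its own rdf:Description, using only a handful of productions: rdf:about or rdf:nodeID for the subject, and rdf:resource, rdf:nodeID or text content for the object. The triple representation and the names serialise_flat, split_predicate and set_node are hypothetical. Splitting predicates at the last '#' or '/' is also a simplification.

```python
# Hypothetical sketch: names and data shapes are illustrative, not Redland's API.
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("rdf", RDF)


def split_predicate(uri):
    """Split a predicate URI into (namespace, local name) at the last '#' or '/'.

    Simplification: a real serialiser must find a suffix that is a valid
    XML NCName, and fail if there is none.
    """
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    if cut == 0 or cut == len(uri):
        raise ValueError(f"cannot write {uri!r} as a QName")
    return uri[:cut], uri[cut:]


def set_node(elem, attr, node):
    """Refer to a node: blank nodes ('_:id') via rdf:nodeID, URIs via rdf:<attr>."""
    if node.startswith("_:"):
        elem.set(f"{{{RDF}}}nodeID", node[2:])
    else:
        elem.set(f"{{{RDF}}}{attr}", node)


def serialise_flat(triples):
    """Write each (subject, predicate, object) statement as its own rdf:Description.

    Objects are ("node", uri_or_bnode) or ("literal", text). Only a few
    RDF/XML productions are ever used, so the output shape is predictable.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    for subject, predicate, (kind, value) in triples:
        desc = ET.SubElement(root, f"{{{RDF}}}Description")
        set_node(desc, "about", subject)
        namespace, local = split_predicate(predicate)
        prop = ET.SubElement(desc, f"{{{namespace}}}{local}")
        if kind == "literal":
            prop.text = value
        else:
            set_node(prop, "resource", value)
    return root
```

Because statements about the same subject are not merged here, the output has the same “one subject, many descriptions” shape that the grouping step has to deal with.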

The only problem with the output from Redland is that statements aren’t grouped by subject; grouped statements would be much easier to handle with XSLT. For the FOAF Explorer this is handled in PHP along with the serialisation, which doesn’t use Redland’s internal serialiser. For the general case, I’ve developed an XSLT stylesheet that does the trick: rdf2r3x.xsl v1.0. It produces a syntax profile I hereby dub R3X, for Redland Restricted RDF/XML.
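Grouping by subject boils down to collecting all property elements under one description per subject. The following Python sketch shows the idea on flat, ungrouped RDF/XML. It is not the logic of rdf2r3x.xsl. It assumes that every top-level element is an rdf:Description carrying rdf:about or rdf:nodeID, and group_by_subject and subject_key are hypothetical names.

```python
# Hypothetical sketch of grouping by subject; not taken from rdf2r3x.xsl.
import xml.etree.ElementTree as ET
from copy import deepcopy

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF}}}about"
NODEID = f"{{{RDF}}}nodeID"


def subject_key(desc):
    """Identify a description's subject; blank nodes and URIs are kept apart."""
    if desc.get(NODEID) is not None:
        return ("bnode", desc.get(NODEID))
    if desc.get(ABOUT) is not None:
        return ("uri", desc.get(ABOUT))
    raise ValueError("expected rdf:about or rdf:nodeID on every top-level description")


def group_by_subject(root):
    """Return a new rdf:RDF element with one description per subject.

    Subjects keep the order in which they first appear, and property
    elements keep their relative order. The input tree is not modified.
    """
    grouped = {}  # subject key -> merged description (dicts preserve insertion order)
    for desc in root:
        key = subject_key(desc)
        if key not in grouped:
            grouped[key] = ET.Element(desc.tag, dict(desc.attrib))
        grouped[key].extend([deepcopy(prop) for prop in desc])
    out = ET.Element(root.tag, dict(root.attrib))
    out.extend(list(grouped.values()))
    return out
```

In XSLT 1.0 the same effect is commonly achieved with xsl:key and Muenchian grouping; the dictionary here plays the role of the key table.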

In addition to grouping by subject, the rdf2r3x XSLT also handles a few more tasks that make it possible to publish the output right away (even if it’s not pretty):

  • The output for an RSS channel conforms to the defined syntax profile, including the one for the content module.
  • rss:title and rss:description elements are added for RSS classes, if not already present, based on dc:title and dc:description.
  • dc:date elements are added when dcterms:created, dcterms:modified, and dcterms:issued are encountered and there isn’t already a dc:date element present with the same value.
  • The RDF properties rdf:_1, rdf:_2, etc. are converted to rdf:li.
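Two of the clean-up steps listed above can be sketched in Python with ElementTree: rewriting rdf:_n as rdf:li, and deriving dc:date from the dcterms dates. This is an illustration under assumptions, not the stylesheet’s logic, and the function names are hypothetical. The sketch only rewrites container members when the indices run 1, 2, 3, … without gaps or duplicates. The reason is that RDF/XML parsers number rdf:li elements from 1 in document order, so any other pattern would change the graph.

```python
# Hypothetical sketch of two rdf2r3x-style clean-ups; names are illustrative.
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
MEMBER_PREFIX = f"{{{RDF}}}_"
DC_DATE = f"{{{DC}}}date"
DATE_SOURCES = {f"{{{DCTERMS}}}{name}" for name in ("created", "modified", "issued")}


def member_index(tag):
    """Return n for an rdf:_n tag, otherwise None."""
    if tag.startswith(MEMBER_PREFIX):
        digits = tag[len(MEMBER_PREFIX):]
        if re.fullmatch(r"[1-9][0-9]*", digits):
            return int(digits)
    return None


def members_to_li(container):
    """Rewrite rdf:_1, rdf:_2, ... children as rdf:li, ordered by index.

    Non-member properties are kept first, in their original order.
    If the indices are not exactly 1..n, nothing changes and False is returned.
    """
    children = list(container)
    members = sorted((c for c in children if member_index(c.tag) is not None),
                     key=lambda c: member_index(c.tag))
    if [member_index(c.tag) for c in members] != list(range(1, len(members) + 1)):
        return False
    others = [c for c in children if member_index(c.tag) is None]
    for child in children:
        container.remove(child)
    for child in members:
        child.tag = f"{{{RDF}}}li"
    container.extend(others + members)
    return True


def add_dc_dates(desc):
    """Add a dc:date for every dcterms:created/modified/issued value not already present."""
    existing = {e.text for e in desc.findall(DC_DATE)}
    for child in list(desc):
        if child.tag in DATE_SOURCES and child.text not in existing:
            ET.SubElement(desc, DC_DATE).text = child.text
            existing.add(child.text)
```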

Examples:

  1. Raw input (a copy of the RSS channel from this blog)
  2. Output from Redland, input to rdf2r3x.xsl
  3. Output from rdf2r3x.xsl

Now, with a predictable RDF/XML document, it’s time to move on to the actual presentation of the vocabulary. Obtaining an HTML representation of the document is really not that hard anymore, but presenting the OWL constructs, e.g. subclass relationships, in a sensible manner is not easy.

Thus, the current version, owl2html.xsl v0.1, only handles a very limited subset of OWL, most notably classes, properties, subclass relationships (partially), domains and restrictions (partially). Also, it depends on meta.xsl v0.3, a set of generic rules for presenting RDF/XML as HTML.
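The way owl2html.xsl presents subclass relationships isn’t reproduced here. One possible presentation is a nested list that follows rdfs:subClassOf, which can be sketched in Python. All of the following are hypothetical:

- the input format, a list of class URIs plus a mapping from each class to its superclasses;
- the function names;
- the example URIs.

Classes with several superclasses appear under each of them. Classes whose superclasses all lie outside the vocabulary are treated as roots.

```python
# Hypothetical sketch of a subclass tree in HTML; not the logic of owl2html.xsl.
from collections import defaultdict
from html import escape


def local_name(uri):
    """Crude display label: the part after the last '#' or '/'."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def class_tree_html(classes, parents):
    """Render classes as nested <ul> lists following rdfs:subClassOf.

    classes: class URIs defined in the vocabulary.
    parents: mapping from a class URI to the URIs of its superclasses.
    Classes reachable only through a subclass cycle have no root and are omitted.
    """
    known = set(classes)
    children = defaultdict(list)
    for cls in classes:
        for parent in parents.get(cls, []):
            if parent in known:
                children[parent].append(cls)
    roots = [cls for cls in classes
             if not any(p in known for p in parents.get(cls, []))]

    def render(nodes, path):
        items = []
        for cls in nodes:
            if cls in path:  # cycle guard: never descend into an ancestor again
                continue
            sub = render(children[cls], path | {cls})
            items.append(f'<li><a href="{escape(cls)}">{escape(local_name(cls))}</a>{sub}</li>')
        return "<ul>" + "".join(items) + "</ul>" if items else ""

    return render(roots, frozenset())
```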

Examples:

  1. Raw input (the source label vocabulary)
  2. Intermediate output (from rdf2r3x.xsl)
  3. Output from owl2html.xsl

These transformations have been used for a while, but are also still under development, especially the vocabulary transformation. I’ll post updates when major changes warrant it.

As can be seen in all three of these XSLTs, they make heavy use of the XSLT key construct (xsl:key). This should speed up the transformation considerably, but different XSLT processors seem to handle keys differently. In my tests, xsltproc from libxslt was considerably faster than sabcmd from Sablotron. The latter is the default XSLT processor available in PHP 4.x, while the former is what’s going into PHP 5.x. Even though it’ll be a pain upgrading all my PHP code to 5.x, it seems it’ll be worth it.
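Keys can help because xsl:key asks the processor to index the matching nodes once. Each key() call then becomes a lookup rather than a fresh scan of the document. The Python sketch below models that difference with a dictionary. It says nothing about how xsltproc or Sablotron actually implement keys, and build_key, key_lookup and scan_lookup are hypothetical names.

```python
# Hypothetical model of what xsl:key buys; not how any XSLT processor is implemented.
from collections import defaultdict


def build_key(nodes, use):
    """Rough analogue of <xsl:key match="..." use="...">: index nodes once by a computed value."""
    index = defaultdict(list)
    for node in nodes:
        index[use(node)].append(node)
    return index


def key_lookup(index, value):
    """Rough analogue of key('name', value): one hash lookup, no scan."""
    return index.get(value, [])


def scan_lookup(nodes, use, value):
    """Without an index, every lookup re-reads all nodes, as a predicate such as
    //rdf:Description[@rdf:about = $subject] would."""
    return [node for node in nodes if use(node) == value]
```

Consider n statements and m distinct subjects. Answering every subject with scan_lookup costs on the order of n·m comparisons. By contrast, build_key pays roughly n once, and each key_lookup is then a single hash lookup.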

8 thoughts on “Transforming RDF/XML with XSLT”

  1. A lot of work!

    Unfortunately I can’t tell you how good the final output looks, the link “Output from owl2html.xsl” is 404ing.

  2. Reto,

    In general, bnodes are serialized at the outermost level, just like the “regular” resources, with rdf:nodeID being used for cross-references.

    However, to conform to the syntax profile of RSS, a few special cases are handled differently, namely the rdf:Seq in rss:items and the rdf:Bag and content:item for the content module.

    In the XSLT this is handled by skipping the normal output for these three cases (lines 64-74 in rdf2r3x.xsl), and then constructing the special nesting when content:items and rss:items are serialized (lines 141-167).

  3. Nice work Morten!

    We have been looking at simple “normalized” RDF/XML for a long time now, to build a simple, pure XML/XSLT-based RDF/XML editor, and it works pretty well. By “normalized” I mean Redland’s output, or that of others such as RDF/XML-ABBREV in Jena, i.e. statements grouped by subject. However, the last time I talked with Jeremy Carroll about using “normalized” RDF/XML instead of TriX in Jena, he told me that RDF/XML-ABBREV does not work when predicate QNames have no ‘splitting’ point (a non-NCName character) between the namespace and local name parts. In Jena at least, it seems impossible to serialize those models. But if you make your RDF storage aware of the problem, you can store the (namespace, localname) predicate pairs and keep them distinct, and you are done :-) They have not picked up on it yet, it seems :)

    Do you have any experience on the “splitting” problem on predicates?

  4. Thanks Alberto!

    The QName splitting problem is a known issue with RDF/XML, but I have yet to run into it. It seems to me that this issue will most likely first rear its ugly head when OWL and the like are applied (and even in those cases it could possibly be avoided). Otherwise it’s basically “plain” vocabulary usage, which is (almost always) designed to handle it.

    On the storage awareness part, from what I’ve gathered at least, it seems that keeping the namespace URI and local name separate has a performance penalty. Of course, if it’s otherwise a no-go situation it can be necessary, but until I run into the problem…

  5. I take it!

    I’ll try to write an R3X serializer for Jena. I’ll probably omit the RSS stuff, since the conversion from R3X to RSS can be done with XSLT. I’ll let you know when I’m there.
