A couple of months ago, while working on a project that will hopefully see the light of day soon, I realised I needed terms for singular and plural labels for properties and classes. Even with the help of SchemaWeb I couldn’t find existing terms, so I decided to cook my own, resulting in the label vocabulary with two properties:
- plural: A relation between a term and its label in literal plural form.
- singular: A relation between a term and its label in literal singular form.
This was not the only vocabulary I was working on at that moment, and I needed to be able to get an overview, a human-readable version. Last year I did the RDFS Explorer for basically the same purpose, but since I was entering OWL territory, it wasn’t really up to the task. Back to square one.
Transforming RDF serialised as XML into HTML sounds like a great job for XSLT, but the possible syntax variations of RDF/XML make it next to impossible to handle arbitrary RDF/XML found in the wild. For the FOAF Explorer and my photo collection I’ve resorted to “normalizing” by parsing and reserialising with Redland by Dave Beckett prior to the actual transformation, and that has turned out to work just fine. Redland’s serialiser emits RDF/XML using only a subset of the productions in the RDF/XML Syntax Specification, a subset that leaves out some otherwise handy shortcuts in exchange for predictability.
The only problem with the output from Redland is the fact that statements aren’t grouped by subject, which would make it much easier to handle with XSLT. For the FOAF Explorer this is handled in PHP along with the serialisation, which doesn’t use Redland’s internal serialiser, but for the general case I’ve developed an XSLT that does the trick: rdf2r3x.xsl v1.0, which produces a syntax profile I hereby dub R3X for Redland Restricted RDF/XML.
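To illustrate what grouping by subject means, here is a minimal sketch in Python rather than XSLT. It is not how rdf2r3x.xsl is written. It assumes the flat input carries its subject in rdf:about or rdf:nodeID on top-level node elements, and the function name group_by_subject and the sample data are made up for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SUBJECT_ATTRS = (f"{{{RDF_NS}}}about", f"{{{RDF_NS}}}nodeID")

# Hypothetical flat input: the subject "a" appears in two separate elements.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/a"><dc:title>A</dc:title></rdf:Description>
  <rdf:Description rdf:about="http://example.org/b"><dc:title>B</dc:title></rdf:Description>
  <rdf:Description rdf:about="http://example.org/a"><dc:date>2004-05-01</dc:date></rdf:Description>
</rdf:RDF>"""


def group_by_subject(root):
    """Merge top-level node elements that share a subject, in place."""
    first_seen = {}
    for node in list(root):  # iterate over a copy, since nodes are removed
        subject = next(
            ((attr, node.get(attr)) for attr in SUBJECT_ATTRS
             if node.get(attr) is not None),
            None,
        )
        if subject is None:
            continue  # no identifier to group on, so leave the node alone
        # The element name is part of the key so that a typed node is never
        # folded into a plain rdf:Description (an assumption of this sketch).
        key = (node.tag,) + subject
        if key in first_seen:
            first_seen[key].extend(list(node))  # move the property elements
            root.remove(node)
        else:
            first_seen[key] = node
    return root


if __name__ == "__main__":
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")
    print(ET.tostring(group_by_subject(ET.fromstring(SAMPLE)), encoding="unicode"))
```

The dictionary keeps the first element seen for each subject and appends later properties to it, so the original document order of subjects is preserved.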
In addition to grouping by subject, the rdf2r3x XSLT also handles a few more tasks that make it possible to publish the output right away (even if it’s not pretty):
- The output for an RSS channel conforms to the defined syntax profile, including the one for the content module.
- rss:title and rss:description elements are added for RSS classes, if not already present, based on dc:title and dc:description.
- dc:date elements are added when dcterms:created, dcterms:modified, and dcterms:issued are encountered and there isn’t already a dc:date element present with the same value.
- The RDF properties rdf:_1, rdf:_2, etc. are converted to rdf:li.
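The conversion of numbered membership properties to rdf:li can be sketched as follows. This is a Python illustration under stated assumptions, not the code in rdf2r3x.xsl: the function name members_to_li and the sample container are hypothetical, and the sketch assumes the numbering has no gaps, since rdf:li is purely positional and a gap would be silently closed.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LI = f"{{{RDF_NS}}}li"
RESOURCE = f"{{{RDF_NS}}}resource"

# Matches the expanded element name of rdf:_1, rdf:_2, ...
_MEMBER = re.compile(r"\{%s\}_([1-9][0-9]*)\Z" % re.escape(RDF_NS))

# Hypothetical container whose members are deliberately out of order.
SAMPLE_SEQ = (
    '<rdf:Seq xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:_2 rdf:resource="http://example.org/b"/>'
    '<rdf:_1 rdf:resource="http://example.org/a"/>'
    "</rdf:Seq>"
)


def members_to_li(container):
    """Rename rdf:_n children to rdf:li, ordered by n, in place."""
    numbered = []
    for child in container:
        match = _MEMBER.match(child.tag)
        if match:
            numbered.append((int(match.group(1)), child))
    # Re-append in numeric order so that position carries the same
    # information the number used to carry.
    for _, child in sorted(numbered, key=lambda pair: pair[0]):
        container.remove(child)
        child.tag = LI
        container.append(child)
    return container


if __name__ == "__main__":
    ET.register_namespace("rdf", RDF_NS)
    print(ET.tostring(members_to_li(ET.fromstring(SAMPLE_SEQ)), encoding="unicode"))
```

Sorting before renaming matters because the serialiser is free to emit the numbered properties in any order, whereas consumers of rdf:li count elements from the top.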
Examples:
- Raw input (a copy of the RSS channel from this blog)
- Output from Redland, input to rdf2r3x.xsl
- Output from rdf2r3x.xsl
Now, with a predictable RDF/XML document, it’s time to move on to the actual presentation of the vocabulary. Obtaining an HTML representation of the document is really not that hard anymore, but presenting the OWL constructs, e.g. subclass relationships, in a sensible manner is not easy.
Thus, the current version, owl2html.xsl v0.1, only handles a very limited subset of OWL, most notably classes, properties, subclass relationships (partially), domains and restrictions (partially). Also, it depends on meta.xsl v0.3, a set of generic rules for presenting RDF/XML as HTML.
Examples:
- Raw input (the source label vocabulary)
- Intermediate output (from rdf2r3x.xsl)
- Output from owl2html.xsl
These transformations have been used for a while, but are also still under development, especially the vocabulary transformation. I’ll post updates when major changes warrant it.
As can be seen in all three of these XSLTs, they make heavy use of the XSLT key construct. This should speed up the transformation considerably, but it seems that the different XSLT processor implementations handle keys differently: in my tests xsltproc from libxslt was considerably faster than sabcmd from Sablotron. The latter is the default XSLT processor available in PHP 4.x, while the former is what’s going into PHP 5.x, so even though it’ll be a pain upgrading all my PHP to 5.x, it seems it’ll be worth it.
A lot of work!
Unfortunately I can’t tell you how good the final output looks; the link “Output from owl2html.xsl” is 404ing.
Thanks Danny, the link is now fixed.
Now I can say how nice it looks, thanks!!
I still don’t get it!
When are bnodes serialized inline (like the Seq in the RSS example) and when are nodeIDs used?
Reto,
In general, bnodes are serialized at the outermost level, just like the “regular” resources, with rdf:nodeID being used for cross references.
However, to conform to the syntax profile of RSS, a few special cases are handled differently, namely the rdf:Seq in rss:items and the rdf:Bag and content:item for the content module.
In the XSLT this is handled by skipping the normal output for these three cases (lines 64-74 in rdf2r3x.xsl), and then constructing the special nesting when content:items and rss:items are serialized (lines 141-167).
Nice work Morten!
We have been looking at simple “normalized” RDF/XML (that of Redland and others, or RDF/XML-ABBREV in Jena, i.e. with statements grouped by subject) for a long time now to build a simple, pure XML/XSLT-based RDF/XML editor, and it works pretty well. However, the last time I talked with Jeremy Carroll about using “normalized” RDF/XML instead of TriX in Jena, he told me that “RDF/XML-ABBREV” does not work when predicate QNames do not have a ‘splitting’ point (a character that is not an XML name character) between the local name and namespace parts, i.e. in Jena at least, it seems impossible to serialize those models. But if you make your RDF storage aware of the problem, it is possible to store and keep distinct the (namespace, localname) predicate pairs and you are done :-) but they did not get it yet it seems :)
Do you have any experience with the “splitting” problem on predicates?
Thanks Alberto!
The QName splitting problem is a known issue with RDF/XML, but I have yet to run into the problem. It seems to me that this issue will likely first rear its ugly head when OWL and the like are applied (and even in those cases it could possibly be avoided); otherwise it’s basically “plain” vocabulary usage that’s (almost) designed to handle it.
On the storage awareness part, from what I’ve gathered at least, it seems that keeping the namespace URI and local name separate has a performance penalty. Of course, if it’s otherwise a no-go situation it can be necessary, but until I run into the problem…
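For readers unfamiliar with the splitting problem, the following is a minimal Python sketch of it. It is not taken from Jena or Redland: the function name split_predicate is hypothetical, the name grammar is an ASCII-only approximation of an XML NCName, and taking the longest legal suffix as the local name is just one possible rule.

```python
import re

# ASCII-only approximation of an XML NCName at the end of a string; the
# real grammar also allows many non-ASCII letters, which this ignores.
_NCNAME_AT_END = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def split_predicate(uri):
    """Return (namespace, local_name), or None if no legal split exists."""
    match = _NCNAME_AT_END.search(uri)
    # No match means the URI does not end in a usable element name; a match
    # at position 0 would leave an empty namespace, which is also rejected.
    if match is None or match.start() == 0:
        return None
    return uri[:match.start()], uri[match.start():]


if __name__ == "__main__":
    for uri in (
        "http://xmlns.com/foaf/0.1/name",  # splits cleanly
        "http://example.org/terms/123",    # local part would start with a digit
        "http://example.org/terms/",       # nothing left for the local part
    ):
        print(uri, "->", split_predicate(uri))
```

A predicate for which this returns None cannot be written as an element name in RDF/XML, which is the situation described above.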
I take it!
I’ll try to write an R3X serializer for Jena. I’ll probably omit the RSS stuff since the conversion can be done from R3X to RSS with XSLT. I’ll let you know when I’m there.