Category Archives: XML

http://www.w3.org/XML/

Exchange of Named RDF Graphs

UPDATE: This implementation has been updated; please see Named Graph Exchange.

Every now and then I’ve run into the need for transporting an RDF graph between triple stores. I use Redland/MySQL with contexts to store information about the origin of each triple, so up until now the only way has been to transfer the triples directly from one database to another. This is because triples are just that: triples, not quads. RDF itself only provides reification as a way out, which is not a very attractive option for space and performance reasons.

There have been other approaches to dealing with graph naming in RDF: TriG is one, N3/cwm has another. Here’s yet another way: wrapping up the graphs not in a single document, but in a zip archive with an index mapping documents to names.

It may seem unwise to try to circumvent real provenance issues by “just” naming graphs, but this is only intended for exchange between trusted parties; it’s not a format that’s expected to be found and consumed like other RDF documents on the Web.
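As a rough illustration of the idea, here is a minimal Python sketch of packing several serialized graphs into one zip archive together with an index that maps each document to its graph name. The index file name (`index.json`), the JSON index format, the `graph-N.rdf` file names and both function names are assumptions made for this sketch; they are not taken from the actual implementation, which may well describe the index differently.

```python
import io
import json
import zipfile

# Hypothetical index file name and format; not taken from the original implementation.
INDEX_NAME = "index.json"


def pack_named_graphs(graphs):
    """Pack {graph_name: serialized_rdf_text} into a zip archive, returned as bytes.

    Each graph is stored as its own document, and an index document
    maps the document file names to the graph names.
    """
    buffer = io.BytesIO()
    index = {}
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for number, (name, document) in enumerate(sorted(graphs.items())):
            filename = f"graph-{number}.rdf"  # assumed naming scheme
            archive.writestr(filename, document)
            index[filename] = name
        archive.writestr(INDEX_NAME, json.dumps(index, indent=2))
    return buffer.getvalue()


def unpack_named_graphs(data):
    """Read an archive written by pack_named_graphs back into {graph_name: text}."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        index = json.loads(archive.read(INDEX_NAME))
        return {
            name: archive.read(filename).decode("utf-8")
            for filename, name in index.items()
        }
```

The receiving store can then load each document into the context named by the index, so the graph names survive the transfer without resorting to reification.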


Dynamic Tabs for Metadata

I generate and store quite a lot of metadata with my photos, as can be gathered from my faceted photo index. Until now, I have simply displayed most of it beneath each photo on its page, but I wanted to make the interesting parts stand out more, while still providing access to the rest.

CSS and JavaScript to the rescue.

Simon Willison created a small script for toggling sections of a page, easytoggle, which was subsequently improved to also handle Safari (see “debugging in Safari”). That seemed like a great way to approach the problem: making it possible to structure the information, while still leaving it accessible to all.

However, there were (of course) a few quirks with that implementation, so I added a few lines of code to make it possible to not display the menu tabs when JavaScript isn’t enabled, and to make the inactive tabs dim until selected: easytoggle3.js

To designate a section as the tab section (so that it does not show up without JavaScript), identify it with #toggle, and add a CSS rule to hide it: display: none. The rest of the script works just like the original, where links with class="toggle" identify the parts that should be togglable.

Redesign, Retrospective and Resolutions

At some point this fall, I promised myself I’d refactor my web pages, to give them all a similar look, while making it easy to update that look in the future, and drive most of the content with RDF — after all, web pages are resources.

I’m not quite done with all the corners, most notably my homepage, but at least now the weblog and the photo albums share a common stylesheet, with everything in place for tweaking the rest, including a Planet Morten feed!

For the coming year, I intend to continue my switch of focus from producing RDF to consuming it. I have started out by generating a faceted interface for my photos (which could use an additional interface like libby’s calendar view), and with Leigh Dodds releasing Slug: A Simple Semantic Web Crawler, I’m reminded to get back to work with my scutter, Scutter Strategies and the Scutter Vocabulary. Also, Bob DuCharme has created rdfdata.org, which means that it’s now easier than ever to find data to play around with. Integral to most of this is me getting around to writing/porting the RDQL/SPARQL rewriting code to the Redland/MySQL storage backend.

To see what it’s like, I also intend to start a “real” (Danish) weblog, one that is updated on an (almost) daily basis. I think it’ll be good for me to get into the habit of writing more often than I do now, where most of the stuff I do sits quietly behind the scenes, waiting for that elusive moment when there’s time to refactor and document it properly. In short: moving to a state of mind where (seemingly!) perfect is an option, not a requirement. It is a state I’m finding it hard to get to, but also one from which I have learned a lot from others in the Open Source community.

So much to do, so little time, but I think it’s important to showcase how RDF can actually be used, not just produced, all the while making interesting stuff simpler.

Italian XML

You know you might be hungry when you are greeted by the following error message:

xsltStylePreCompute: unknown xsl:parma
xsltApplyOneTemplate: parma was not compiled

The offending part:

<xsl:template mode="navigation" priority="0.2" match="*[@rdf:resource]"> 
  <xsl:parma name="head" select="false()"/> 
  <div class="navigation"> 
    <xsl:variable name="this-p" select="concat(namespace-uri(),local-name())"/>

To keep this sort of on-topic: there was a workshop in Italy last week, SWAP 2004; Danny Ayers has more.

Aggregating and Archiving RSS Items

One of the better arguments for RSS 1.0 over other syndication formats is the claim that the (meta) data plugs directly into the greater Semantic Web, thus making it possible to go both back and forth between the two, making them one. Unfortunately, most aggregators don’t really aggregate; at most they just present a cached version of what’s currently offered, resulting in a disconnect, as Bob DuCharme recently pointed out on rdf-interest (eventually leading to rdfdata.org).

However, archiving “items” from RSS feeds over time presents a few issues.

Not all RSS items have their own globally unique identifier
Some RSS feeds are “linkrolls” more than a list of recently created or updated resources. A linkroll references other resources directly, sometimes making incorrect statements about e.g. the creator or time of publication (example: del.icio.us/mortenf). Reliable identification is needed to be able to recognise items that are new, old or updated.
The ambiguous definition of a channel
The RSS 1.0 spec says the following about the rdf:about attribute on the channel element: “Most commonly, this is either the URL of the homepage being described or a URL where the RSS file can be found.” The right choice seems to always be the channel URI, the source of the statements, as that is what is commonly referred to by rdfs:seeAlso in e.g. blogrolls and personal FOAF files, and it is most often the identifier used for provenance in a triple store.
The rss:items/rdf:Seq construct
Each item is associated with one or more channels through the rss:items property, referencing a sequence of the “current” items. The sequence of items is determined through the use of the RDF/XML syntactic construct rdf:li, which is expanded to rdf:_1, rdf:_2, and so on, in the RDF model. When a new item is added to a channel, it is added at the first position, rdf:_1, the existing items shift towards the end of the sequence, and the last item disappears from the sequence. In a naïve implementation, archiving a channel over time would lead to a “sequence” with each item being referenced more than once, and to loss of actual temporal information: it’d be impossible to determine the actual order in which the items appeared. Note also that in an even more naïve implementation (one that doesn’t recognise that the two sequences should be seen as one), the result wouldn’t be an “invalid” sequence, but instead a channel with multiple rss:items properties, each with a perfectly fine sequence.
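
The rdf:Seq problem can be made concrete with a small Python sketch. It models triples as plain tuples rather than using a real triple store. The identifiers (`seq`, `item:a`, and so on) and the function names are made up for the illustration. The second function is only one possible way to keep the order of appearance, not necessarily the approach taken in the full post.

```python
def snapshot_triples(seq_id, items):
    """Expand one feed snapshot into (subject, predicate, object) triples,
    the way rdf:li is expanded to rdf:_1, rdf:_2, ... in the RDF model."""
    return {
        (seq_id, f"rdf:_{position}", item)
        for position, item in enumerate(items, start=1)
    }


def merge_first_seen(snapshots):
    """Assumed alternative: keep each item once, in order of first appearance.

    Snapshots are given oldest first; within a snapshot the newest item
    comes first, so each snapshot is walked backwards.
    """
    seen = []
    for items in snapshots:
        for item in reversed(items):
            if item not in seen:
                seen.append(item)
    return seen


# Two fetches of the same channel (hypothetical item identifiers, newest first).
first = ["item:c", "item:b", "item:a"]
second = ["item:d", "item:c", "item:b"]  # item:d was added, item:a dropped off

# Naive archiving: simply merge the triples of both snapshots.
archive = snapshot_triples("seq", first) | snapshot_triples("seq", second)

# Collect the positions each item now claims in the merged "sequence".
positions = {}
for _, predicate, item in archive:
    positions.setdefault(item, set()).add(predicate)
```

After the naive merge, `positions["item:c"]` holds both `rdf:_1` and `rdf:_2`, and two different items claim `rdf:_1`, so the merged triples no longer say which item came first. `merge_first_seen([first, second])` instead yields the items from `item:a` through `item:d` in the order they appeared.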
