Category Archives: RDF

http://www.w3.org/RDF/
skos:related http://crschmidt.net/blog/archives/author/crschmidt/skos.rdf#c8 RDF

Sparqling Days

Just like Leigh and Danny, among several other smart people, I’ve been lucky enough to get invited to Sparqling Days: two days of likely intense hacking on querying RDF with SPARQL, orchestrated by @semantics.

Unfortunately, several people can’t make it, among them Leigh, whom I have yet to meet over a beer. But there’s at least one bonus: I finally get to meet Danny Ayers, the man of steel and father of Sparql-the-cat.

I’m sure this will turn out to be quite a nice trip. From what I hear, Italy is a very nice country, and the team hacking session will likely help advance the state of RDF query and produce implementations of nice demo applications.

42

According to The Hitchhiker’s Guide to the Galaxy (which is still on my to-read list), “42” is The Answer to Life, the Universe, and Everything. However, I think there may be an exception when it comes to the question about httpRange-14, a long-debated issue in web circles (see also http-range-14 no comment).

Earlier today, the Semantic Web Best Practices and Deployment Working Group sent word to the TAG that it has resolved that an http URI without a hash MAY be used to identify an RDF property. Thus, we may have a partial answer to httpRange-14: one that isn’t “42”, but also one that doesn’t answer other questions (not including “Life, the Universe, and Everything”).

I think the conclusion by the SWBPD WG is sound — while in an ideal world it could be useful to know just by looking at a URI whether it identified a document or something else, it isn’t strictly necessary. Sure, it might help a human out now and then, but in the long run, the machines will be doing the analyzing part, and according to the TAG’s finding on what URIs identify, URIs are opaque. It all comes down to semantics assigned by humans.

At the same time, it is often practical and useful to use fragment identifiers for RDF vocabulary terms, so wording that uses “MAY” is perfect in my book — even if it doesn’t answer or solve everything…

Redland Smushing

Some time ago (more than a year, actually) I wrote a smusher for Redland that works by rewriting nodes based on identity inference.

To begin with, it handled only IFPs (owl:InverseFunctionalProperty), but the other day I needed it to handle FPs (owl:FunctionalProperty) as well.

A classic example of an IFP is foaf:homepage: only one resource can have a specific URI as its homepage, which is handy for identity reasoning across the Web. Just as useful is the somewhat recently added property foaf:primaryTopic, which is an FP: if a page is described in more than one place, each with a seemingly different primary topic, it can be inferred that the two “topics” are actually one and the same. This is handy when identifying movies, since almost all movies have a page describing them at the Internet Movie Database.

The smusher is written in C, isn’t heavily commented, and has been used elsewhere without problems. It works by finding IFPs and FPs in the model it is smushing, or by being passed a specific property to smush on. A nasty way of testing it is to pass it rdf:type.
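To make the idea of rewriting nodes based on identity inference a bit more concrete, here is a minimal sketch in Python. It is not the C smusher described above and doesn't use Redland at all: triples are assumed to be plain tuples of strings, the function name `smush` and its parameters are made up for illustration, and the lists of IFPs and FPs are passed in by hand rather than discovered in the model.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"


def smush(triples, ifps=(), fps=()):
    """Return a copy of `triples` with nodes inferred to be identical merged.

    Hypothetical helper: `triples` is an iterable of (s, p, o) string tuples,
    `ifps` and `fps` are collections of property URIs to smush on.
    Literals are not treated specially, which a real smusher would need.
    """
    triples = list(triples)
    parent = {}

    def find(node):
        # Follow rewrite pointers to the canonical node for `node`.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        # Keep the lexicographically smaller node so the result is stable.
        keep, drop = sorted((find(a), find(b)))
        parent[drop] = keep

    changed = True
    while changed:  # repeat until no further identities can be inferred
        changed = False
        groups = defaultdict(set)
        for s, p, o in triples:
            s, o = find(s), find(o)
            if p in ifps:
                groups[("ifp", p, o)].add(s)  # same value, so same subject
            if p in fps:
                groups[("fp", p, s)].add(o)   # same subject, so same value
        for nodes in groups.values():
            first, *rest = sorted(nodes)
            for other in rest:
                if find(first) != find(other):
                    union(first, other)
                    changed = True

    return {(find(s), p, find(o)) for s, p, o in triples}


people = [
    ("_:a", FOAF + "homepage", "http://example.org/"),
    ("_:b", FOAF + "homepage", "http://example.org/"),
    ("_:a", FOAF + "name", "Alice"),
    ("_:b", FOAF + "nick", "al"),
]
smushed = smush(people, ifps={FOAF + "homepage"})
```

In this sketch, passing rdf:type as if it were an IFP would collapse every resource sharing a class into a single node, which is why it makes for such a nasty test.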

Building it should be fairly straightforward, but the accompanying Makefile might help out here and there.

Note: This entry, like all entries in the Release category, will serve as a changelog (you can subscribe to its RSS feed if you want to make sure you don’t miss out on any updates).

The current version is 0.21 (released 2005-03-27).

Changes since 0.20:
  • Reworked rewriting process to avoid database deadlocks.

Exchange of Named RDF Graphs

UPDATE: This implementation has been updated; please see Named Graph Exchange.

Every now and then I’ve run into the need for transporting an RDF graph between triple stores. I use Redland/MySQL with contexts to store information about the origin of each triple, so up until now the only way has been to transfer the triples directly from one database to another. This is because triples are just that: triples, not quads. RDF itself only provides reification as a way out, which is not a very attractive option for space and performance reasons.

There have been other approaches to dealing with graph naming in RDF: TriG is one, and N3/cwm has another. Here’s yet another way: wrapping up the graphs not in a single document, but in a zip archive with an index mapping documents to names.

It may seem unwise to try to circumvent real provenance issues by “just” naming graphs, but this is only intended for exchange between trusted parties; it’s not a format that’s expected to be found and consumed like other RDF documents on the Web.
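As a rough illustration of the zip-with-index idea, here is a small Python sketch using only the standard library. The details are assumptions made for the example, not the actual exchange format: the index is a hypothetical tab-separated file called `index.txt`, the member names `graph0.rdf`, `graph1.rdf` and so on are invented, and each graph is assumed to arrive as an already serialized document string.

```python
import io
import zipfile

INDEX_NAME = "index.txt"  # hypothetical index file name, not from the original format


def pack_graphs(graphs):
    """Pack {graph name: serialized RDF document} into zip archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        index_lines = []
        for number, (name, document) in enumerate(sorted(graphs.items())):
            member = f"graph{number}.rdf"
            archive.writestr(member, document)
            # One index line per document: member file, tab, graph name.
            index_lines.append(f"{member}\t{name}")
        archive.writestr(INDEX_NAME, "\n".join(index_lines) + "\n")
    return buffer.getvalue()


def unpack_graphs(data):
    """Read archive bytes back into {graph name: serialized RDF document}."""
    graphs = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        index = archive.read(INDEX_NAME).decode("utf-8")
        for line in index.splitlines():
            member, name = line.split("\t", 1)
            graphs[name] = archive.read(member).decode("utf-8")
    return graphs


original = {
    "http://example.org/graphs/foaf": "<rdf:RDF><!-- first graph --></rdf:RDF>",
    "http://example.org/graphs/rss": "<rdf:RDF><!-- second graph --></rdf:RDF>",
}
restored = unpack_graphs(pack_graphs(original))
```

The receiving side can then load each document into its own context, using the name from the index as the context identifier.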


Aggregating and Archiving RSS Items

One of the better arguments for RSS 1.0 over other syndication formats is the claim that the (meta) data plugs directly into the greater Semantic Web, thus making it possible to go both back and forth between the two, making them one. Unfortunately, most aggregators don’t really aggregate; at most, they just present a cached version of what’s currently offered. The result is a disconnect, as Bob DuCharme recently pointed out on rdf-interest (eventually leading to rdfdata.org).

However, archiving “items” from RSS feeds over time presents a few issues.

Not all RSS items have their own globally unique identifier
Some RSS feeds are “linkrolls” more than a list of recently created or updated resources. A linkroll references other resources directly, sometimes making incorrect statements about e.g. the creator or time of publication (example: del.icio.us/mortenf). Reliable identification is needed to be able to recognise items that are new, old or updated.
The ambiguous definition of a channel
The RSS 1.0 spec says the following about the rdf:about attribute on the channel element: “Most commonly, this is either the URL of the homepage being described or a URL where the RSS file can be found.” The right choice seems to always be the channel URI, the source of the statements, as that is what is commonly referred to by rdfs:seeAlso in e.g. blogrolls and personal FOAF files, and is most often the identifier used for provenance in a triple store.
The rss:items/rdf:Seq construct
Each item is associated with one or more channels through the rss:items property, referencing a sequence of the “current” items. The sequence of items is determined through the use of the RDF/XML syntactic construct rdf:li, which is expanded to rdf:_1, rdf:_2, and so on, in the RDF model. When a new item is added to a channel, it is added at the first position, rdf:_1, the existing items shift towards the end of the sequence, and the last item disappears from the sequence. In a naïve implementation, archiving a channel over time would lead to a “sequence” with each item being referenced more than once, and to a loss of actual temporal information: it’d be impossible to determine the actual order in which the items appeared. Note also that in an even more naïve implementation (one that doesn’t recognise that the two sequences should be seen as one), the result wouldn’t be an “invalid” sequence, but instead a channel with multiple rss:items properties, each with a perfectly fine sequence.
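The rdf:Seq problem is easy to reproduce. The sketch below is only a toy model in Python: triples are plain tuples, the node name `seq` and the item names are invented, and the same sequence node is assumed to be reused across both fetches (the “naïve” case, where the two sequences are recognised as one). It merges two snapshots of a three-item feed and collects the positions each item ends up with.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def seq_triples(seq_node, item_uris):
    """Expand an ordered item list into rdf:_1, rdf:_2, ... membership triples."""
    return {
        (seq_node, f"{RDF}_{position}", uri)
        for position, uri in enumerate(item_uris, start=1)
    }


# Two fetches of the same feed, newest item first. Between the fetches,
# item4 was published and item1 dropped off the end.
first_fetch = seq_triples("seq", ["item3", "item2", "item1"])
second_fetch = seq_triples("seq", ["item4", "item3", "item2"])

# Naive archiving: simply add the new triples to the old ones.
archive = first_fetch | second_fetch

# Collect every position each item now claims in the merged "sequence".
positions = {}
for _, prop, item in archive:
    index = int(prop.rsplit("_", 1)[1])
    positions.setdefault(item, set()).add(index)
```

After the merge, `item3` sits at both rdf:_1 and rdf:_2, and rdf:_1 points at both `item4` and `item3`, so the positions no longer say anything reliable about the order in which the items appeared.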
