UPDATE: This implementation has been updated; please see Named Graph Exchange.
Every now and then I’ve needed to transport an RDF graph between triple stores. I use Redland/MySQL with contexts to store information about the origin of each triple, so until now the only way to move a graph has been to copy the triples directly from one database to another. The reason is that triples are just that: triples, not quads. RDF itself offers only reification as a way out, which is not very attractive for space and performance reasons.
There have been other approaches to naming graphs in RDF: TriG is one, and N3/cwm has another. Here is yet another: wrapping the graphs up not in a single document, but in a zip archive with an index mapping documents to names.
It may seem unwise to try to sidestep real provenance issues by “just” naming graphs, but this format is intended only for exchange between trusted parties. It is not expected to be found and consumed like other RDF documents on the Web.
The zipped-document-collection approach is not exactly new on the XML scene. This one is inspired by the way Open Office uses XML Packages to store documents and related files, which in turn seems to be inspired by Java’s JAR File Format. Leigh Dodds has written more on the subject in his XML-Deviant column Wrap Your App.
The advantage of using an archive with multiple files is that there is no need for a new syntax, or for changes to existing ones. In fact, this approach works equally well for all RDF serialisation syntaxes, although RDF/XML is expected to be used the most because of its advantage in encoding handling.
In short, the zip archive format defined here, with example Perl/Redland implementations, consists of a collection of documents, each in one particular RDF serialisation syntax, plus a manifest in the same serialisation syntax that records the graph names using the predicates rdfs:seeAlso and rdfs:label: the manifest points to each graph document with rdfs:seeAlso, and each document is given its graph name with rdfs:label. The object of the rdfs:label predicate can be of any type (URIref, literal or bnode), with URIref expected to be the most common (and assumed in the example serialisation implementation below).
Please note that there is no restriction on the serialisation syntax of either the manifest or the individual files, only that they must all use the same syntax, indicated by the extension of the rdf-manifest file located in the META-INF directory. The example implementation currently defines the following:
- .ttl (Turtle)
- .nt (NTriples)
- .rdf (RDF/XML)
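A reader of such an archive has to find the manifest first and derive the syntax from its extension. The following is a rough Python sketch of that step, working on a list of member names (such as `zipfile.ZipFile.namelist()` returns). The extension table mirrors the list above; the syntax labels it returns are hypothetical and not identifiers from any particular RDF library.

```python
# Sketch: locate META-INF/rdf-manifest.<ext> among an archive's members and
# map its extension to a serialisation syntax. The syntax labels below are
# hypothetical names chosen for illustration.
import posixpath

SYNTAXES = {
    ".ttl": "turtle",
    ".nt": "ntriples",
    ".rdf": "rdfxml",
}


def manifest_syntax(member_names):
    """Return (manifest member name, syntax label) or raise ValueError."""
    for name in member_names:
        directory, filename = posixpath.split(name)
        stem, ext = posixpath.splitext(filename)
        # The manifest name is case sensitive: exactly "rdf-manifest".
        if directory == "META-INF" and stem == "rdf-manifest":
            if ext not in SYNTAXES:
                raise ValueError(f"unsupported manifest syntax: {ext!r}")
            return name, SYNTAXES[ext]
    raise ValueError("no META-INF/rdf-manifest.* in archive")


if __name__ == "__main__":
    print(manifest_syntax(["graph1.nt", "META-INF/rdf-manifest.nt"]))
```

Since all documents share the manifest's syntax, this one lookup is enough to choose a parser for every graph in the archive.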
Also note that this format does not (yet?) have a name, as it can actually coexist with both Open Office’s format and JAR (in their current incarnations; a future naming clash is possible). (The file named rdf-manifest [case sensitive] could have been named just manifest, but then there’d be a naming clash for sure when all the other archive formats switch to RDF for metadata!)
The suggested extension for this type of archive is (currently) zip.
example.zip

    Archive:  example.zip   22661 bytes   4 files
    -rw-------  2.0 unx    86331 tl defN  4-Jan-05 15:30 graph1.nt
    -rw-------  2.0 unx    37784 tl defN  4-Jan-05 15:30 graph2.nt
    -rw-------  2.0 unx    53421 tl defN  4-Jan-05 15:30 graph3.nt
    -rw-------  2.0 unx      618 tl defN  4-Jan-05 15:30 META-INF/rdf-manifest.nt
    4 files, 178154 bytes uncompressed, 22169 bytes compressed:  87.6%
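An archive laid out like example.zip can be produced with Python’s standard `zipfile` module. The sketch below is an illustration, not the Perl/Archive::Zip implementation: it assumes the graph documents and the manifest have already been serialised to text, checks that every document uses the manifest’s syntax extension, and writes the manifest last, as in the listing. `write_archive` is a hypothetical helper.

```python
# Sketch: pack already-serialised graph documents plus their manifest into a
# zip archive. All members must share one syntax, given by the extension.
import io
import zipfile


def write_archive(fileobj, documents, manifest_text, ext=".nt"):
    """documents maps member name (e.g. 'graph1.nt') -> serialised text."""
    for member in documents:
        if not member.endswith(ext):
            raise ValueError(f"{member} does not use syntax {ext}")
    with zipfile.ZipFile(fileobj, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member, text in sorted(documents.items()):
            zf.writestr(member, text)  # str members are stored as UTF-8
        # The manifest name is case sensitive and lives in META-INF/.
        zf.writestr("META-INF/rdf-manifest" + ext, manifest_text)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_archive(
        buf,
        {"graph1.nt": '<http://example.org/s> <http://example.org/p> "o" .\n'},
        "<graph1.nt> <http://www.w3.org/2000/01/rdf-schema#label> "
        "<http://example.org/g1> .\n",
    )
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        print(zf.namelist())
```

Writing to a file object rather than a path also makes it easy to stream the archive to STDOUT, as the serialisation script below does.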
example.zip(META-INF/rdf-manifest.nt)

    <graph2.nt> <http://www.w3.org/2000/01/rdf-schema#label> <http://www.wasab.dk/morten/blog/archives/author/mortenf/foaf.rdf> .
    <META-INF/rdf-manifest.nt> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <graph3.nt> .
    <META-INF/rdf-manifest.nt> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <graph2.nt> .
    <META-INF/rdf-manifest.nt> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <graph1.nt> .
    <graph3.nt> <http://www.w3.org/2000/01/rdf-schema#label> <http://www.wasab.dk/morten/blog/archives/author/mortenf/skos.rdf> .
    <graph1.nt> <http://www.w3.org/2000/01/rdf-schema#label> <http://www.wasab.dk/morten/blog/feed/rdf> .
(Note the use of relative URIrefs, which may not be smart in the long run, but does make sense from a document perspective.)
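Reading the name mapping back out of a manifest like this one could look roughly as follows. This is a deliberately minimal Python sketch, not a real N-Triples parser: it assumes the N-Triples syntax and URIref objects for rdfs:label (literal and bnode names are skipped), and it keeps only documents the manifest actually points to with rdfs:seeAlso. `read_manifest` is a hypothetical helper.

```python
# Sketch: extract {member document: graph name} from an N-Triples manifest.
# Only triples made of three <URIref>s are recognised; anything else is skipped.
import re

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def read_manifest(text, manifest="META-INF/rdf-manifest.nt"):
    listed, labels = set(), {}
    for line in text.splitlines():
        m = TRIPLE.match(line)
        if not m:
            continue  # blank lines, comments, literal or bnode objects
        s, p, o = m.groups()
        if s == manifest and p == RDFS + "seeAlso":
            listed.add(o)  # a graph document contained in the archive
        elif p == RDFS + "label":
            labels[s] = o  # the graph name given to a document
    # Keep only documents that are both listed and named.
    return {member: labels[member] for member in sorted(listed) if member in labels}


if __name__ == "__main__":
    text = (
        f"<META-INF/rdf-manifest.nt> <{RDFS}seeAlso> <graph1.nt> .\n"
        f"<graph1.nt> <{RDFS}label> <http://www.wasab.dk/morten/blog/feed/rdf> .\n"
    )
    print(read_manifest(text))
```

The relative URIrefs are kept as plain strings here, which is exactly what makes them convenient for looking up member names in the archive.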
Implementations
The following Perl/Redland scripts require RDF::Redland and Archive::Zip (as well as the commonly available Getopt::Long, Pod::Usage and URI), and have been tested with Redland 0.9.18 and the MySQL and BDB storage backends.
- redzip-serialise.pl (version 0.3):
- A script for serialising one or all contexts in a persistent Redland model to an archive on STDOUT.
- redzip-parse.pl (version 0.1):
- A script for parsing (loading) a named archive into a persistent Redland model with contexts.
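The core of the serialisation step, splitting one or all contexts of a store into one document per graph, can be sketched in Python on plain in-memory quads. This is not the Redland API: it assumes each quad is a `(subject, predicate, object, context)` tuple whose terms are already N-Triples encoded, and it numbers the documents graph1.nt, graph2.nt and so on as in example.zip. `split_contexts` is a hypothetical helper.

```python
# Sketch: group (s, p, o, context) quads into one N-Triples document per
# context. Terms are assumed to be N-Triples encoded already, e.g.
# '<http://...>' or '"literal"'; context is the graph URI.
from collections import defaultdict


def split_contexts(quads, only=None):
    """Return [(member name, graph URI, document text), ...].

    If only is given, keep just that one context (cf. "one or all contexts").
    """
    by_context = defaultdict(list)
    for s, p, o, context in quads:
        if only is None or context == only:
            by_context[context].append(f"{s} {p} {o} .")
    result = []
    for i, context in enumerate(sorted(by_context), start=1):
        text = "\n".join(by_context[context]) + "\n"
        result.append((f"graph{i}.nt", context, text))
    return result


if __name__ == "__main__":
    quads = [
        ("<http://example.org/s>", "<http://example.org/p>", '"a"', "http://example.org/g1"),
        ("<http://example.org/s>", "<http://example.org/p>", '"b"', "http://example.org/g2"),
    ]
    for member, graph_uri, text in split_contexts(quads):
        print(member, graph_uri)
```

The (member, graph URI) pairs are exactly what the manifest needs, and loading is the reverse: read the manifest, parse each listed document, and add its triples under the named context.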
Comments much appreciated.
See also TriX, a very simple RDF-as-XML serialization with graph naming capability.
The zip approach looks very good to me. Seems like something we could use around NG4J. One nitpick: rdfs:label is intended for human-readable labels, not quite appropriate here. I believe I’ve seen a graph naming property somewhere around the Carroll et al. named graphs stuff, but couldn’t find it now.
Richard,
Right, I should have mentioned TriX as well.
You are of course right about rdfs:label too, but it was right there, and I didn’t want to invent a new predicate just for this…