Only a few weeks after SPARQLing Days I’m now finally ready to report some progress…
2005-05-07: Updated to version 0.3, see changelog below.
As is now common knowledge, one outcome of the great gathering in Tuscany is The Gargonza Experiment, which attempts
to create a set of showcases for the Resource Description Framework (RDF) and the SPARQL Query Language for RDF. Already the “community theatre” has gained a bit of steam and an increasing number of followers, and I’m sure there’ll be more to come as people get the stuff they started working on completed. To keep up-to-date, I suggest y’all subscribe to Planet RDF or at least to the Gargonza Experiment category that Danny Ayers — my gracious host for a few days following the main event — is maintaining.
To assist in that goal, I’ve made a subset of my photo description database available for SPARQL queries via Sparqlette — A SPARQL demo query service. The subset currently contains only descriptions of the photos taken during SPARQLing Days, and not all photos have been through my annotation process yet; there are still some foaf:depicts triples to be made…
Continue reading The Gargonza Experiment
Some time ago (more than a year, actually) I wrote a smusher for Redland that works by rewriting nodes based on identity inference.
To begin with, it handled only IFPs (owl:InverseFunctionalProperty), but the other day I needed it to handle FPs (owl:FunctionalProperty) as well.
A classic example of an IFP is foaf:homepage: only one resource can have a specific URI as its homepage, which is handy for identity reasoning across the Web. Just as useful is the somewhat recently added property foaf:primaryTopic, which is an FP: if a page is described in more than one place, each time with a seemingly different primary topic, it can be inferred that the two “topics” are actually one and the same. This is handy when identifying movies, since almost every movie has a page describing it at the Internet Movie Database.
The smusher is written in C, isn’t heavily commented, and has been used elsewhere without problems. It works by finding IFPs and FPs in the model it is smushing, or by being passed a specific property to smush on (a nasty way of testing is to pass it rdf:type…).
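The rewriting idea can be sketched in a few lines of Python. This is a toy illustration of the identity-inference rules only, not the actual C implementation, which works against the Redland API; triples are plain (subject, predicate, object) tuples, a single pass is made, and a full smusher would iterate to a fixed point:

```python
def smush(triples, ifps=frozenset(), fps=frozenset()):
    """Rewrite nodes that IFP/FP identity inference says are equal."""
    canon = {}  # node -> canonical node (union-find style)

    def find(n):
        while n in canon:
            n = canon[n]
        return n

    def union(a, b):
        a, b = find(a), find(b)
        if a != b:
            canon[b] = a  # keep the first-seen node as canonical

    seen_po = {}  # IFP: same (predicate, object) => same subject
    seen_sp = {}  # FP: same (subject, predicate) => same object
    for s, p, o in triples:
        if p in ifps:
            key = (p, o)
            if key in seen_po:
                union(seen_po[key], s)
            else:
                seen_po[key] = s
        if p in fps:
            key = (s, p)
            if key in seen_sp:
                union(seen_sp[key], o)
            else:
                seen_sp[key] = o

    # Rewrite every triple to use the canonical nodes.
    return {(find(s), p, find(o)) for s, p, o in triples}
```

For example, two blank nodes sharing a foaf:homepage value collapse into one when foaf:homepage is passed as an IFP.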
Building it should be somewhat straightforward, but the accompanying Makefile might help out here and there.
Note: This entry — as all entries in the Release category — will serve as a changelog (you can subscribe to its RSS feed if you want to make sure you don’t miss out on any updates).
The current version is 0.21 (released 2005-03-27).
- Changes since 0.20:
- Reworked rewriting process to avoid database deadlocks.
UPDATE: This implementation has been updated, please see Named Graph Exchange.
Every now and then I’ve run into the need for transporting an RDF graph between triple stores. I use Redland/MySQL with contexts to store information about the origin of each triple, so up until now the only way has been to transfer the triples directly from one database to another. This is because triples are just that, triples, not quads; RDF itself offers only reification as a way out, which is not an attractive option for space and performance reasons.
There have been other approaches to dealing with graph naming in RDF, TriG is one, N3/cwm has another — here’s yet another way: Wrapping up the graphs not in a single document, but in a zip archive with an index mapping documents to names.
It may seem unwise to try to circumvent real provenance issues by “just” naming graphs, but this format is only intended for exchange between trusted parties; it’s not expected to be found and consumed like other RDF documents on the Web.
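The packing scheme can be sketched like this. Note that the concrete index format below (tab-separated “filename, graph name” pairs in an index.txt entry) is an assumption for illustration, not necessarily the format the archive actually uses:

```python
import io
import zipfile

def write_graph_archive(graphs):
    """Pack named graphs into a zip archive.

    `graphs` maps graph name (URI) -> serialised RDF document (str).
    The index.txt format here is hypothetical: one "filename<TAB>name"
    line per graph.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        index_lines = []
        for i, (name, doc) in enumerate(sorted(graphs.items())):
            filename = "graph%d.rdf" % i
            zf.writestr(filename, doc)
            index_lines.append("%s\t%s" % (filename, name))
        zf.writestr("index.txt", "\n".join(index_lines))
    return buf.getvalue()

def read_graph_archive(data):
    """Unpack an archive written by write_graph_archive."""
    graphs = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        index = zf.read("index.txt").decode("utf-8")
        for line in index.splitlines():
            filename, name = line.split("\t", 1)
            graphs[name] = zf.read(filename).decode("utf-8")
    return graphs
```

The point of the indirection is that each graph stays an ordinary RDF document; only the index knows the names, so no quad syntax or reification is needed.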
Continue reading Exchange of Named RDF Graphs
Phil Dawes is working on an ifpstore to back his Veudas RDF editor. Yesterday he published some benchmarks for importing triples into ifpstore.
While there are quite a number of variables — the data structures differ a bit, the hardware used isn’t identical, etc… — I thought I’d try to redo the benchmarks with the Redland/MySQL storage backend.
First I tried importing the four separate files from Wordnet, then the same triples serialised as a single NTriples file, and finally a single RDF/XML file generated naïvely from the NTriples one (naïve in the sense that the file is larger than the four raw files combined, even though it contains the same triples).
Then I repeated the tests, this time with the bulk loading features — table locking and temporary disabling of indices — of the MySQL storage backend turned on. The benchmark below includes the index rebuilding phase that takes place after the actual load.
[Table: Redland/MySQL Wordnet Benchmark]
As the numbers suggest, the bulk loading features work, and raptor is faster at parsing NTriples than RDF/XML, hardly a surprise.
What doesn’t show in the numbers is that the entire load process is CPU bound; it’s not disk access that takes the time. Also, the number of triples, 473589, isn’t enough to fill up the in-memory cache MySQL maintains (here); not even importing a large dataset like Jim’s 6.7 million scuttered statements (converted into NTriples with jim2ntriples) seems to be. With bulk loading turned on, that entire process takes about 34 minutes, equivalent to about 3300 triples per second, compared to the roughly 3000 triples per second for the best case above.
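The quoted throughput figure is easy to verify from the numbers above:

```python
# 6.7 million statements loaded in about 34 minutes with bulk loading on.
triples = 6_700_000
seconds = 34 * 60
rate = triples / seconds  # just under 3300 triples per second
```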
Since last year I have been storing metadata about my photos, and other stuff like the twilight data, in a Redland triple store, backed by the MySQL storage implementation I wrote. I have been writing various PHP scripts for maintaining and querying the various graphs, and I found that there were a few basic tasks I kept implementing, writing the same code over and over again. At one point I finally started factoring them out into a common “library”, librdfutil (syntax highlighted version).
Most of the functions are only helpful in a PHP and/or MySQL environment, but a few of them might make it into the core Redland API at some point if other people find them useful and we can persuade Dave Beckett to include them.
The current version of this library is 0.0.1 (this entry will serve as the changelog).
librdfutil_mysql_cbd_original_lite: Add the original CBD (without reification) for a node object to a model.
librdfutil_mysql_cbd_lite: Add revised CBD (without reification) for a node object to a model.
librdfutil_strings_to_node: Create a new librdf_node from a set of strings.
librdfutil_stringset_to_statement: Create a new librdf_statement from a set of strings.
librdfutil_tuple_to_statement: Create a new librdf_statement from a database tuple.
librdfutil_model_to_string: Get a serialised representation of a model, in R3X or Turtle syntax.
librdfutil_stream_to_string: Get a serialised representation of a stream of statements, in R3X or Turtle syntax.
librdfutil_node_to_turtle_string: Generate Turtle syntax fragment for node.
librdfutil_node_to_hash: Get a string hash (the MySQL ID) of a node object.
librdfutil_strings_hash: Get a string hash (the MySQL ID) of a node given as a set of strings.
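To give an idea of what the CBD functions compute, here is a toy Concise Bounded Description in Python. It is a stand-in for the librdfutil functions, which operate on a Redland model through MySQL; the “lite” part (no reification) matches the function names above, and the blank-node convention (strings starting with “_:”) is an assumption for illustration:

```python
def cbd_lite(triples, node):
    """Concise Bounded Description without reification: every statement
    whose subject is `node`, recursing into blank-node objects so the
    description is self-contained."""
    result = set()
    todo = [node]
    seen = set()
    while todo:
        n = todo.pop()
        if n in seen:
            continue
        seen.add(n)
        for s, p, o in triples:
            if s == n:
                result.add((s, p, o))
                if o.startswith("_:"):
                    todo.append(o)  # follow blank nodes, not URIs
    return result
```

The recursion into blank nodes is what makes a CBD more useful than a plain subject lookup: anonymous nodes have no name to query for later, so their statements must travel with the description.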
A few additional notes:
- The MySQL-specific functions, the ones that start with librdfutil_mysql, require ADODB.
- In case anyone is wondering about the naming of the CBD functions, I expect to implement the full specification(s) at a later date — or perhaps someone else will contribute them…
- At some point, a pretty API overview with usage examples will be created, for easy reference.
Comments, bug reports, and suggestions are as always much appreciated. Thanks to Russell Cloran for initial thoughts and comments on the pre-release version.