Category Archives: Redland

http://librdf.org/ Redland RDF Application Framework
skos:related http://crschmidt.net/blog/archives/author/crschmidt/skos.rdf#c4 Redland RDF Application Framework

Redland/MySQL utilities

Since it seems a number of people are using the MySQL backend storage for Redland, most notably crschmidt with his julie in #julie on freenode, now seems like a good time to “release” a few shell scripts I have put together.

Common to all of them is the use of the Unix principle of quiet operation iff successful, and non-zero return codes when not.

redland-mysql-optimize (latest version: 0.2)
As with all databases, it’s a good idea to make sure the indices are up-to-date. The redland-mysql-optimize script will make sure that all relevant tables in a triple store have indices that reflect the contents of each table in the best way possible. The script depends on the mysql command line client.
Usage example:
redland-mysql-optimize db

Changes since 0.1:

  • Changed a $1 to $*, thanks Russell!
redland-mysql-clean (latest version: 0.2)
When a statement is deleted from a MySQL triple store through the Redland API, the associated node tables (Bnodes, Resources and Literals) are not updated — more on why that is in a separate post. Also, a statement can currently be asserted more than once in a single context.
Together, these issues can lead to the various tables containing more data than needed. The redland-mysql-clean script will, when run against a specific database, remove all orphaned nodes and duplicate statements in all models. It’s a good idea to run the redland-mysql-optimize script after a cleanup. The script depends on perl, the mysql command line client, and the join (1) utility.
Usage example:
redland-mysql-clean db

Changes since 0.1:

  • Fixed interpolation of $model, thanks to Russell – again!
redland-mysql-drop-model (latest version: 0.1)
The Redland API doesn’t include a method for removing a model once it has been created in a storage. This script makes up for that, helping out when a model is no longer needed. Be sure to run the redland-mysql-clean and redland-mysql-optimize scripts afterwards. The script depends on perl and the mysql command line client.
Usage example:
redland-mysql-drop-model model db

NOTE: This is a destructive script.

Concise Bounded Resource Descriptions in Redland/MySQL

While I’m not sure about the merits of the entire URIQA proposal by Patrick Stickler, it does introduce the very nice concept of CBD‘s.

The concept is similar to — actually a superset of — FOAF’s notion of minimally identifying set of properties, the set of properties for a person that is needed to identify, display and get more information about the person, usually including a name (or nickname), at least one inverse functional property and a link, rdfs:seeAlso.

For this reason, and a few others, I decided to implement this in Redland and the Redland/MySQL storage engine as a method for the Model “class”, librdf_model_cbd_as_stream. Since I wanted to leave it up to each storage implementation how to implement it, it turned out to require quite a few source file changes, but I will be handing them over to Dave Beckett for inclusion in the next version of Redland if he sees it fit.

The definition of CBD is recursive, as for each bnode object the statements where it appears as a subject must be included in the result and so on, but implementing infinite recursive queries in SQL is impossible. To overcome this issue, I decided to go with the following algorithm (node is the input resource for which a CBD is wanted):

list of nodes = (node)
count of nodes = 1
REPEAT
  last count of nodes = count of nodes
  list of nodes = SQL(bnodes objects of statements with subject in list of nodes) + node
  count of nodes = COUNT(list of nodes)
UNTIL count of nodes = last count of nodes
RETURN statements with subject in list of nodes

The SQL generated for the query for bnode objects looks like this (operating on the most recent Redland/MySQL storage engine database schema):

select distinct ID
from Statements join Bnodes on Object=ID
where Subject=7972813756443468730 or Subject=10313337636846108089

While the algorithm works, and doesn’t put too much strain on the connection between the client and server, it does require at least one extraneous query, since the loop ends when two subsequent queries yield the same result. Hints on improving this will be much appreciated.

Please note that I have left out step 3 of the CBD definition, the reification part. This is mostly due to the reason that I don’t work with reification in my models, but also because I don’t see reification in the RDF sense to be of much use in practical implementations.

Also, in contrast to the CBD definition, this algorithm and implementation allows for CBDs for bnodes, not just URIs.

Easy RDF-parsing with PHP

A common complaint about the RDF/XML syntax in the XML-literate communities is the lack of a simple PHP parser. While Redland with Raptor does the job perfectly, it almost demands root access to install, and doesn’t run on the Windows platform without cygwin.

The best alternative for PHP is RAP, but that is often claimed to be too slow or there are problems understanding and using the API.

In trying to help out, I won’t be writing an RDF/XML parser from scratch (perhaps someone else will port Sean B. Palmer’s rdfxml.py to PHP), but I have created a little wrapper class for RAP, SimpleRdfParser, that only gives access to the RDF/XML parser, and thus doesn’t need the entire library. Also, the exposed API is simply an array of triples (indexed by subject), and together these simplifications help out on the parsing speed. There’s still room for improvement though, RAP was started a while ago and is based on previous syntax specifications, so it contains support for a number of constructs that aren’t legal anymore.

In addition to the parse method, string2triples, the class also contains a serialiser, triples2string, which turns the graph into a simple subset of RDF/XML, suitable for handling with a regular XML parser or XSLT, should anyone have those desires…

Examples:

The careful reader will notice that there is something missing in the output: The literal “Morten Frederiksen” should have a language of “en”, but it doesn’t. This is a bug in RAP, which has been reported and will likely be fixed in the next version.

Update: A small benchmark for parsing and reserializing appr. 800 statements (source) 100 times with Redland/Raptor, SimpleRdfParser, and RAP:

It turns out Redland/Raptor is about 3 times as fast as SimpleRdfParser, which is about twice as fast as RAP.

Update 2: A more realistic benchmark, doing only the parsing, no serialising:

Transforming RDF/XML with XSLT

A couple of months ago, while working on a project that will hopefully see the light of day soon, I realised I needed terms for singular and plural labels for properties and classes. Even with the help of SchemaWeb I couldn’t find existing terms, so I decided to cook my own, resulting in the label vocabulary with two properties:

plural
A relation between a term and its label in literal plural form.
singular
A relation between a term and its label in literal singular form.

This was not the only vocabulary I was working on at that moment, and I needed to be able to get an overview, a human-readable version. Last year I did the RDFS Explorer for basically the same purpose, but since I was entering OWL territory, it wasn’t really up to the task. Back to square one.

Continue reading Transforming RDF/XML with XSLT