Concise Bounded Resource Descriptions in Redland/MySQL

While I’m not sure about the merits of the entire URIQA proposal by Patrick Stickler, it does introduce the very nice concept of Concise Bounded Resource Descriptions (CBDs).

The concept is similar to, and actually a superset of, FOAF’s notion of a minimally identifying set of properties: the properties needed to identify a person, display them and find more information about them. This set usually includes a name (or nickname), at least one inverse functional property and an rdfs:seeAlso link.

For this reason, and a few others, I decided to implement CBDs in Redland and the Redland/MySQL storage engine as a method of the Model “class”, librdf_model_cbd_as_stream. Since I wanted to leave the details up to each storage implementation, this required changes to quite a few source files, but I will be handing them over to Dave Beckett for inclusion in the next version of Redland if he sees fit.

The definition of CBD is recursive: for each bnode object, the statements in which it appears as a subject must be included in the result, and so on for any bnode objects of those statements. A recursion of unbounded depth cannot be expressed as a single plain SQL query without recursive extensions. To work around this, I decided to go with the following algorithm (node is the input resource for which a CBD is wanted):

list of nodes = (node)
count of nodes = 1
REPEAT
  last count of nodes = count of nodes
  list of nodes = SQL(bnode objects of statements with subject in list of nodes) + node
  count of nodes = COUNT(list of nodes)
UNTIL count of nodes = last count of nodes
RETURN statements with subject in list of nodes
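
The loop above can be tried out without a database. The following is only a minimal sketch in Python, not the Redland implementation: statements are plain (subject, predicate, object) tuples, bnodes are assumed to be written with a "_:" prefix, and the helper bnode_objects is a hypothetical stand-in for the SQL query.

```python
from typing import Iterable, List, Set, Tuple

# A statement is a (subject, predicate, object) triple of plain strings.
Statement = Tuple[str, str, str]


def is_bnode(node: str) -> bool:
    # Assumption for this sketch only: bnodes carry a "_:" prefix.
    return node.startswith("_:")


def bnode_objects(statements: Iterable[Statement], subjects: Set[str]) -> Set[str]:
    # Hypothetical stand-in for the SQL query: all bnodes that occur as
    # the object of a statement whose subject is in `subjects`.
    return {o for s, _, o in statements if s in subjects and is_bnode(o)}


def cbd(statements: List[Statement], node: str) -> List[Statement]:
    nodes = {node}
    count = 1
    while True:
        last_count = count
        # Re-query with the whole list, then add the start node back in.
        nodes = bnode_objects(statements, nodes) | {node}
        count = len(nodes)
        if count == last_count:
            break  # no new bnodes were found in this round
    # The CBD is every statement whose subject is in the final list.
    return [st for st in statements if st[0] in nodes]
```

Because the start node is always kept in the list, every bnode found in one round is found again in the next, so the list can only grow and comparing counts is enough to detect the end of the recursion. Cycles between bnodes are harmless for the same reason.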

The SQL generated for the query for bnode objects looks like this (operating on the most recent Redland/MySQL storage engine database schema):

select distinct ID
from Statements join Bnodes on Object=ID
where Subject=7972813756443468730 or Subject=10313337636846108089
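
As an illustration of how such a query could be assembled from the current list of nodes, here is a small Python sketch. The table and column names are the ones from the query above; the function name bnode_objects_sql is made up for this example, and the actual storage engine builds its query in C.

```python
from typing import Iterable


def bnode_objects_sql(subject_ids: Iterable[int]) -> str:
    # Coerce to int so that only numeric node IDs end up in the SQL text.
    ids = [int(i) for i in subject_ids]
    if not ids:
        raise ValueError("at least one subject ID is required")
    # One "Subject=<id>" term per node in the list, joined with "or".
    where = " or ".join(f"Subject={i}" for i in ids)
    return (
        "select distinct ID\n"
        "from Statements join Bnodes on Object=ID\n"
        f"where {where}"
    )


print(bnode_objects_sql([7972813756443468730, 10313337636846108089]))
```

The where clause grows by one term per node in the list, so the query text gets longer with each round of the loop.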

While the algorithm works, and doesn’t put too much strain on the connection between the client and server, it always requires at least one redundant query, since the loop only ends when two consecutive queries yield the same result. Hints on improving this will be much appreciated.
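
One possible variation, sketched here in Python and not something I have implemented in Redland, is to query only for the nodes found in the previous round (a "frontier") instead of the whole list. As far as I can tell this does not remove the final query that finds nothing new, but it keeps each where clause short. The tuple representation, the "_:" bnode prefix and the name cbd_frontier are all assumptions of this sketch.

```python
from typing import List, Set, Tuple

# A statement is a (subject, predicate, object) triple of plain strings.
Statement = Tuple[str, str, str]


def cbd_frontier(statements: List[Statement], node: str) -> Tuple[List[Statement], int]:
    seen: Set[str] = {node}      # every node whose statements belong in the CBD
    frontier: Set[str] = {node}  # nodes discovered in the previous round
    queries = 0
    while frontier:
        queries += 1
        # Stand-in for the SQL query, restricted to the frontier only.
        found = {
            o for s, _, o in statements
            if s in frontier and o.startswith("_:")
        }
        frontier = found - seen  # keep only bnodes not visited before
        seen |= frontier
    result = [st for st in statements if st[0] in seen]
    return result, queries
```

For a chain of two nested bnodes this still takes three rounds, the same as the original loop, so the saving is in the size of each query rather than in the number of round trips.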

Please note that I have left out step 3 of the CBD definition, the reification part. This is mostly because I don’t work with reification in my models, but also because I don’t find reification in the RDF sense to be of much use in practical implementations.

Also, in contrast to the CBD definition, this algorithm and implementation allows for CBDs for bnodes, not just URIs.