Category Archives: SemWeb

http://www.w3.org/2001/sw/
skos:broader http://danbri.org/words/?author=2/skos.rdf#c9 Web Technology

Concise Bounded Resource Descriptions in Redland/MySQL

While I’m not sure about the merits of the entire URIQA proposal by Patrick Stickler, it does introduce the very nice concept of CBDs.

The concept is similar to, and actually a superset of, FOAF’s notion of a minimally identifying set of properties: the set of properties needed to identify a person, display them and get more information about them, usually a name (or nickname), at least one inverse functional property and an rdfs:seeAlso link.

For this reason, and a few others, I decided to implement this in Redland and the Redland/MySQL storage engine as a method for the Model “class”, librdf_model_cbd_as_stream. Since I wanted to leave it up to each storage implementation how to implement it, it turned out to require quite a few source file changes, but I will be handing them over to Dave Beckett for inclusion in the next version of Redland if he sees fit.

The definition of CBD is recursive: for each bnode object, the statements in which it appears as a subject must also be included in the result, and so on. A recursion of unknown depth cannot be written as a single SQL query with ordinary joins, so I decided to go with the following algorithm (node is the input resource for which a CBD is wanted):

list of nodes = (node)
count of nodes = 1
REPEAT
  last count of nodes = count of nodes
  list of nodes = SQL(bnodes objects of statements with subject in list of nodes) + node
  count of nodes = COUNT(list of nodes)
UNTIL count of nodes = last count of nodes
RETURN statements with subject in list of nodes

The SQL generated for the query for bnode objects looks like this (operating on the most recent Redland/MySQL storage engine database schema):

select distinct ID
from Statements join Bnodes on Object=ID
where Subject=7972813756443468730 or Subject=10313337636846108089
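
As a minimal sketch of the loop above, the following Python runs the same fixed-point iteration against an in-memory SQLite database. It is not the Redland C implementation: the two-table schema is a simplified stand-in for the Redland/MySQL one, the small integer IDs replace the real 64-bit hashes, and the function names (bnode_closure, cbd_statements, demo_connection) are hypothetical.

```python
import sqlite3


def bnode_closure(conn, node):
    """Repeat the bnode-object query until the node list stops growing."""
    nodes = {node}
    while True:
        last_count = len(nodes)
        marks = ",".join("?" * len(nodes))
        rows = conn.execute(
            "SELECT DISTINCT ID FROM Statements JOIN Bnodes ON Object = ID "
            f"WHERE Subject IN ({marks})",
            tuple(nodes),
        ).fetchall()
        # New list = bnode objects of the current list, plus the input node.
        nodes = {row[0] for row in rows} | {node}
        if len(nodes) == last_count:
            return nodes


def cbd_statements(conn, node):
    """Return all statements whose subject is in the closure of `node`."""
    nodes = bnode_closure(conn, node)
    marks = ",".join("?" * len(nodes))
    return conn.execute(
        "SELECT Subject, Predicate, Object FROM Statements "
        f"WHERE Subject IN ({marks}) ORDER BY Subject, Predicate, Object",
        tuple(nodes),
    ).fetchall()


def demo_connection():
    """Build a tiny example graph (assumed, simplified schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Statements (Subject INTEGER, Predicate INTEGER, Object INTEGER);
        CREATE TABLE Bnodes (ID INTEGER PRIMARY KEY, Name TEXT);
        INSERT INTO Bnodes VALUES (2, 'b1'), (3, 'b2'), (6, 'b3');
        INSERT INTO Statements VALUES
            (1, 10, 2), (2, 11, 3), (3, 12, 4), (5, 10, 6), (6, 11, 7);
    """)
    return conn
```

In the example data, node 1 points to bnode 2, which points to bnode 3, so cbd_statements(demo_connection(), 1) collects the statements of all three subjects and ignores the unrelated subjects 5 and 6. The sketch uses IN (...) with bound parameters instead of a chain of or conditions, which selects the same rows.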

While the algorithm works, and doesn’t put too much strain on the connection between the client and server, it does require at least one extra query, since the loop only ends when two consecutive queries yield the same result. Hints on improving this will be much appreciated.

Please note that I have left out step 3 of the CBD definition, the reification part. This is mostly because I don’t work with reification in my models, but also because I don’t see reification in the RDF sense as being of much use in practical implementations.

Also, in contrast to the CBD definition, this algorithm and implementation allow CBDs for bnodes, not just URIs.

Semantic Comments Feeds from WordPress

As hinted at last week, I now have my WordPress installation output comments as RSS 1.0, both as a blog-wide feed and as a feed per post.

At the core they are just like other feeds, but comments are a little more diverse than posts. There are regular comments, entered in the comment form for each post, and PingBacks and TrackBacks, both of which are “remote”: a notification of a reference from somewhere else.

Each of these three types of comments is different from the others, and needs to be handled differently.

Continue reading Semantic Comments Feeds from WordPress

Label Vocabulary in Spanish

The Label vocabulary now also contains labels in Spanish.

Leandro Mariano López, the master behind inkelog and the Speaks, Reads and Writes Schema, stepped up to the plate last night and sent me a translation of the terms and comments – thanks!

That of course meant that I had to make it possible to navigate between the different language versions, and tweak the content negotiation a bit.

The owl2html XSLT is now up to version 0.2, and a new small tool has seen the light of day: rdf-path.

It’s a simple Perl script with an XPath interface to an RDF/XML document, with a bunch of prefixes and namespaces predeclared. It’s simple to the point of triviality, but does its job well, in this case extracting a list of available languages for an ontology, to automate the generation of HTML pages:

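#!/bin/bash
# $1 is the ontology base name; $1.rdf is expected in the current directory.
base="http://purl.org/net/vocab$(/bin/pwd | sed -e 's/^.*web//')/"
# Generate one HTML page per language found on the ontology's rdfs:comment.
for lang in $(rdf-path "/*/*[rdf:type[@rdf:resource='http://www.w3.org/2002/07/owl#Ontology']]/rdfs:comment/@xml:lang" "$1.rdf"); do
  owl2html "$1.rdf" uri "$base$1#" lang "$lang" css "http://www.wasab.dk/morten/2004/06/owl2html.css" > "$1.$lang.html"
done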
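
To illustrate what the XPath expression in the script selects, here is a rough Python equivalent using only the standard library. It is not how rdf-path itself works (that is a Perl script). The function name ontology_languages is hypothetical, and like the XPath it only matches ontologies typed with an explicit rdf:type child element.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the lookup below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
OWL_ONTOLOGY = "http://www.w3.org/2002/07/owl#Ontology"
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def ontology_languages(rdfxml_text):
    """Return xml:lang values of rdfs:comment on nodes typed owl:Ontology."""
    root = ET.fromstring(rdfxml_text)
    langs = []
    for node in root:  # children of the document element, i.e. /*/*
        types = [t.get(RDF_RESOURCE) for t in node.findall("rdf:type", NS)]
        if OWL_ONTOLOGY not in types:
            continue
        for comment in node.findall("rdfs:comment", NS):
            lang = comment.get(XML_LANG)
            if lang is not None:
                langs.append(lang)
    return langs
```

Given a document whose ontology node carries comments tagged en and da, the function returns those two language codes in document order, which is the list the shell loop iterates over.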

Additional translations are of course more than welcome.

Multi-lingual Literals in RDF

As a non-native English speaker, it’s good to see that both XML and RDF support language “tagging” of literals, to avoid the blind assumption that everything will be in English. Apparently the concept doesn’t get much use though: I have yet to see any tools that support multiple languages at the application level (with the possible exception of foaf-a-matic, which I translated into Danish, but it doesn’t do it at the vocabulary level).

Since I sometimes do development in Danish, but also want to integrate with the rest of the world, I have begun creating vocabularies with labels in at least Danish and English. I have also set up partial HTML presentation in multiple languages: the syntax parts are completed, and the content negotiation setup should be working. When dereferencing e.g. the namespace URI for the label vocabulary, http://purl.org/net/vocab/2004/03/label, you should get the English HTML version, unless you have your browser set up to accept Danish [da] as I have, in which case you should get the Danish version. If your user agent sends an Accept: header containing application/rdf+xml, you should get the RDF/XML version. This method of operation isn’t completely defined yet, but the W3C TAG is working hard on the issue, httpRange-14.
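
The negotiation rules described above can be sketched in a few lines of Python. This only illustrates the decision logic, not the actual server configuration behind the vocabulary URI; the function names, the default language list and the simplified q-value parsing are all assumptions.

```python
def parse_q_list(header):
    """Parse 'a;q=0.8, b' into [(value, q)], highest q first."""
    items = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        value, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # a missing q parameter means full preference
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        items.append((value.lower(), q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(items, key=lambda item: -item[1])


def choose_variant(accept="", accept_language="",
                   available_langs=("en", "da"), default_lang="en"):
    """Return (media type, language) for the representation to serve."""
    media = [value for value, q in parse_q_list(accept) if q > 0]
    if "application/rdf+xml" in media:
        return ("application/rdf+xml", None)
    for lang, q in parse_q_list(accept_language):
        if q <= 0:
            continue
        primary = lang.split("-")[0]  # 'en-US' -> 'en'
        if primary in available_langs:
            return ("text/html", primary)
    return ("text/html", default_lang)
```

A browser sending Accept-Language: da, en;q=0.7 would get the Danish HTML page, one with no matching language falls back to English, and any client listing application/rdf+xml in Accept gets the RDF/XML regardless of language.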

Now, how to decide which literal to use when multiple are present?

Continue reading Multi-lingual Literals in RDF

Easy RDF-parsing with PHP

A common complaint about the RDF/XML syntax in the XML-literate communities is the lack of a simple PHP parser. While Redland with Raptor does the job perfectly, it almost demands root access to install, and doesn’t run on the Windows platform without Cygwin.

The best alternative for PHP is RAP, but it is often said to be too slow, or its API too hard to understand and use.

In trying to help out, I won’t be writing an RDF/XML parser from scratch (perhaps someone else will port Sean B. Palmer’s rdfxml.py to PHP), but I have created a little wrapper class for RAP, SimpleRdfParser, that only gives access to the RDF/XML parser, and thus doesn’t need the entire library. Also, the exposed API is simply an array of triples (indexed by subject), and together these simplifications help the parsing speed. There’s still room for improvement, though: RAP was started a while ago and is based on earlier syntax specifications, so it contains support for a number of constructs that aren’t legal anymore.

In addition to the parse method, string2triples, the class also contains a serialiser, triples2string, which turns the graph into a simple subset of RDF/XML, suitable for handling with a regular XML parser or XSLT, should anyone have those desires…

Examples:

The careful reader will notice that there is something missing in the output: The literal “Morten Frederiksen” should have a language of “en”, but it doesn’t. This is a bug in RAP, which has been reported and will likely be fixed in the next version.

Update: A small benchmark for parsing and reserializing approximately 800 statements (source) 100 times with Redland/Raptor, SimpleRdfParser, and RAP:

It turns out Redland/Raptor is about 3 times as fast as SimpleRdfParser, which is about twice as fast as RAP.

Update 2: A more realistic benchmark, doing only the parsing, no serialising: