Redland Hacking

During the last few days, I’ve been hacking a bit on — and with — Redland.

First off, I verified that a bug and associated patch from Simon Cross regarding portability of the hash calculations in the MySQL storage engine was indeed working. When originally writing the code for this I hadn’t thought of the use case of accessing a storage on a different architecture, but that is of course an important one. The issue is now closed, Dave Beckett has applied the patch to CVS.

I also created an issue regarding the design decision to not look for hash collisions, 28: Hash collisions possible in MySQL storage engine. I don’t have a solution ready for this, but I thought it would be a good idea to get it out in the open, so people are aware of the problem.

Another minor issue with the MySQL storage was its excessive use of connections, especially visible when using Rasqal. I wrote a patch to make it use persistent connections, and Christopher Schmidt was kind enough to help me test it. It seems to be working fine — it does here as well, so I sent a message to redland-dev asking for comments, hopefully this will get into CSV soon as well.

Then came a bit of work on the long-running issue with the PHP interface to Redland. PHP has its own unique NULL-value, so when the Redland Bindings blindly returned a C NULL wrapped in a regular PHP object (in the case of an error), Redland would crash Apache/PHP upon trying to use that object. In the past, Dave has been kind enough to hack a bit here and there when I ran into problems, but I decided to try to close the issue more pemanently. Thus, as explained in 15: PHP binding functions should return a PHP null, I patched the pointer return function to always return a PHP NULL instead a C one. My first version of the patch seems to have been faulty, as Dave couldn’t apply it to CVS, but I created a new one that I hope will do a better job. Also, as a side effect of this change, it is now no longer possible to pass a C NULL into some of the Redland functions where needed, so it seems we have to create a few PHP helper functions to return a C NULL wrapped in a PHP object…

I’ve got more ideas for improvement to Redland, but they really can’t be considered as anything other than feature requests to be coded on a day (and night) with nothing else to do, so I haven’t created issues for these:

  • An option for the MySQL storage to prefix table names with a constant string, to make it possible to have more than one storage in the same database, inspired by the way WordPress does it, and to help out with Dan Brickley’s SparqlPress project.
  • Some builtin “reasoning” functions, to — among other things — make my Redland Smusher obsolete. I’ve discussed this a bit with Dave, but we still haven’t figured out the “right” or best way to implement it.
  • It seems the new version of 3store will store simple datatyped literals like integers in separate columns, to make it easier for the database enginge to work with the values and to better support SPARQL. I think I’d like to do the same for the Redland MySQL storage, but still have to figure out the implications.
  • A new MySQL storage enginge that reads — later on maybe writes as well — the Jena schema layout. This could perhaps be an option to the MySQL storage enginge, in which case it would be almost trivial to also add an option for storing in a simpler, denormalized layout, where all the information is in a single table instead of spread out over four.

Last, and in some sense also least, I hacked a little conversion service, CSV-SPARQLer, that simply takes a URI to a CSV file and turns it into SPARQL Variable Bindings Results format (example, show query, extra example, show extra query).
As the extra example shows, I wanted to be able to subscribe to the action that goes on in the Redland Issue Tracker, but all it made available was a CSV file, so there: A CSV file converted into SPARQL result format, then converted into RSS through SPARQL Conversions XSLT. The resulting RSS is not perfect, notably the titles are a bit generic, but it’s good enough.