Reboot

It’s now been more than a week since reboot 7.0, and I’m once again able to think somewhat straight.

‘Twas a great couple of days, with a lot of inspirational talks, chatter, and laughs, and even a beer thrown in here and there. Marc Canter wasn’t there this time, so nobody sang — at least as far as I know.

image-12

The Semantic Web didn’t get that much mention, but at least David Weinberger (depicted above) pointed out that knowledge isn’t tree-structured — hopefully that’ll take root (hah) with some XML folks!

It’s a good thing I’m too busy these days with projects at work and house-hunting the rest of the time, otherwise I wouldn’t know where to begin sorting out what to do next…

Reboot and paradigms

The first day of reboot 7.0 had a surprise up its sleeve for those of us who stayed throughout the day. Not only did we get to see the planned showing of Doug Engelbart’s 1968 demonstration of the NLS system, featuring among other novelties a mouse, hypertext, and a keyword-ranking search engine; he also joined us via a/v link and stayed for a conversation after the show:

image-3

A very enlightening experience, even if those of us born after the demonstration took place might not fully appreciate its significance. Engelbart later emphasized paradigms, and how the process towards fulfilment of the goals of his research will be evolutionary rather than revolutionary, a view I’m sure Tim Berners-Lee shares with regard to the SemWeb aspect (even if some might say that linking to individual objects sounds more like XLink).

Earlier in the day we heard about blogging, Wikipedia, 37signals’ development process, and Ruby on Rails (must… make… time… for… check…) — Jesper Balslev has notes and pointers.

All in all an above-average day that started out slow but picked up at the end. I’m looking forward to day 2, featuring the always enjoyable Ben Hammersley.

Describing Source Content for Redland/MySQL

I mentioned SADDLE (which used to be a part of the SPARQL Protocol draft, but is no longer) in passing the other day, when describing OWL-S Maker and talking about service description in general.

Service description in this context — and in the context of Dion Hinchcliffe’s OWL-S-less overview of SDLs — is mostly about the interface, the inputs and outputs, not what’s in between.

In contrast, SADDLE originally entered that territory with properties like saddle:vocabulary, and the other day on dev@gargonza Damian Steer announced a nice little JavaScript hack for using source content descriptions. This is not about I/O, but about what a “service” contains information about.

Central to Damian’s hack is a source content description, containing OWL statements about which classes and properties are present in the SPARQL source. For example, his description shows that all objects of foaf:name statements (in this particular store) are literals.

While the above example was handmade, I realized this was getting close to what I’ve been meaning to do for generating simpler and cleaner UIs for triplestores (asking for a foaf:Person? It’s likely you’d also want a foaf:name then…), so I figured I should try to generate such an SCD (Source Content Description) automagically, as Damian himself hints: “Ideally this information would [be] mined from the store.”
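To make the idea of mining a description from the store a bit more concrete, here is a minimal Python sketch. It is not the query from this post and not Damian’s format: the in-memory triple list, the prefixed names, and the `describe_source` helper are all made up for illustration. It only shows the kind of summary an SCD carries, namely which classes occur and whether the objects of each property are literals or resources.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object, object_is_literal).
# A real SCD would be derived from the triplestore, not from a Python list.
TRIPLES = [
    ("ex:alice", "rdf:type", "foaf:Person", False),
    ("ex:alice", "foaf:name", "Alice", True),
    ("ex:bob", "rdf:type", "foaf:Person", False),
    ("ex:bob", "foaf:name", "Bob", True),
    ("ex:alice", "foaf:knows", "ex:bob", False),
]


def describe_source(triples):
    """Summarise the classes present and the kinds of object seen per property."""
    classes = set()
    object_kinds = defaultdict(set)
    for _subject, predicate, obj, is_literal in triples:
        if predicate == "rdf:type":
            # The object of rdf:type names a class used in the store.
            classes.add(obj)
        else:
            object_kinds[predicate].add("literal" if is_literal else "resource")
    return {
        "classes": sorted(classes),
        "properties": {p: sorted(kinds) for p, kinds in sorted(object_kinds.items())},
    }


description = describe_source(TRIPLES)
```

With the sample data, `description["properties"]["foaf:name"]` contains only `"literal"`, which is the sort of statement the handmade description makes about foaf:name.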

I’ve managed to come up with a single query that returns all the information necessary to construct an SCD, but since it’s quite complex, I’ll explain the steps I took on the way there.


Redland Hacking

During the last few days, I’ve been hacking a bit on — and with — Redland.

First off, I verified that a patch from Simon Cross, fixing a bug in the portability of the hash calculations in the MySQL storage engine, does indeed work. When originally writing the code for this I hadn’t thought of the use case of accessing a storage on a different architecture, but that is of course an important one. The issue is now closed; Dave Beckett has applied the patch to CVS.
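For readers wondering how a hash can fail to be portable, here is a small Python sketch of the general pitfall. It does not reproduce the patch, whose details aren’t given here; it assumes, purely for illustration, that a 64-bit ID is taken from the first eight bytes of an MD5 digest. The function names are hypothetical.

```python
import hashlib
import struct


def node_id_native(value: bytes) -> int:
    """Interpret the digest prefix in the machine's own byte order (not portable)."""
    digest = hashlib.md5(value).digest()
    # "=Q" means native byte order: the result differs between
    # little-endian and big-endian machines for the same input.
    return struct.unpack("=Q", digest[:8])[0]


def node_id_portable(value: bytes) -> int:
    """Interpret the digest prefix with a fixed byte order (portable)."""
    digest = hashlib.md5(value).digest()
    # "<Q" pins the byte order to little-endian on every architecture.
    return struct.unpack("<Q", digest[:8])[0]


uri = b"http://xmlns.com/foaf/0.1/name"
portable_id = node_id_portable(uri)
```

The digest bytes are identical everywhere; only the step that turns bytes into an integer depends on the architecture, so fixing the byte order there is enough to let two machines agree on the ID.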

I also created an issue regarding the design decision to not look for hash collisions, 28: Hash collisions possible in MySQL storage engine. I don’t have a solution ready for this, but I thought it would be a good idea to get it out in the open, so people are aware of the problem.
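To get a feeling for how likely such a collision is, the usual birthday approximation can be sketched in a few lines of Python. The 64-bit hash width is an assumption made for this example, and the formula treats the hash values as uniformly random.

```python
import math


def collision_probability(n_values: int, hash_bits: int = 64) -> float:
    """Approximate chance that at least two of n_values hashes collide."""
    space = 2.0 ** hash_bits
    # Birthday approximation: p ~ 1 - exp(-n(n-1) / (2 * space)).
    # -expm1(x) equals 1 - exp(x) but stays accurate for tiny probabilities.
    return -math.expm1(-n_values * (n_values - 1) / (2.0 * space))


# Example: a store with one hundred million distinct nodes.
p_hundred_million = collision_probability(100_000_000)
```

Under these assumptions the probability stays very small for stores of realistic size and only becomes substantial as the number of distinct values approaches 2^32, which is why the problem is easy to overlook and still worth having out in the open.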

Another minor issue with the MySQL storage was its excessive use of connections, especially visible when using Rasqal. I wrote a patch to make it use persistent connections, and Christopher Schmidt was kind enough to help me test it. It seems to be working fine for him, as it does here, so I sent a message to redland-dev asking for comments. Hopefully this will get into CVS soon as well.
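The idea behind persistent connections can be sketched like this in Python. This is not the patch itself (the storage engine is written in C); the `ConnectionCache` class is hypothetical, and sqlite3 merely stands in for a MySQL client so the example runs without a server.

```python
import sqlite3


class ConnectionCache:
    """Hand out one long-lived connection per key instead of reconnecting each time."""

    def __init__(self, connect):
        self._connect = connect      # factory that opens a new connection
        self._connections = {}       # key -> already opened connection
        self.opened = 0              # how many real connections were made

    def get(self, key):
        conn = self._connections.get(key)
        if conn is None:
            # Only the first request for a key pays the connection cost.
            conn = self._connect(key)
            self._connections[key] = conn
            self.opened += 1
        return conn

    def close_all(self):
        for conn in self._connections.values():
            conn.close()
        self._connections.clear()


# sqlite3 is a stand-in here; the key would identify a storage in practice.
cache = ConnectionCache(lambda _key: sqlite3.connect(":memory:"))
first = cache.get("storage-a")
second = cache.get("storage-a")
```

Both `first` and `second` refer to the same connection, so a query engine that issues many small lookups no longer opens a connection for each of them.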

Then came a bit of work on the long-running issue with the PHP interface to Redland. PHP has its own unique NULL value, so when the Redland Bindings blindly returned a C NULL wrapped in a regular PHP object (in the case of an error), Redland would crash Apache/PHP upon trying to use that object. In the past, Dave has been kind enough to hack a bit here and there when I ran into problems, but I decided to try to close the issue more permanently. Thus, as explained in 15: PHP binding functions should return a PHP null, I patched the pointer return function to always return a PHP NULL instead of a C one. My first version of the patch seems to have been faulty, as Dave couldn’t apply it to CVS, but I created a new one that I hope will do a better job. Also, as a side effect of this change, it is no longer possible to pass a C NULL into the Redland functions that need one, so it seems we have to create a few PHP helper functions to return a C NULL wrapped in a PHP object…

I’ve got more ideas for improvement to Redland, but they really can’t be considered as anything other than feature requests to be coded on a day (and night) with nothing else to do, so I haven’t created issues for these:

  • An option for the MySQL storage to prefix table names with a constant string, to make it possible to have more than one storage in the same database, inspired by the way WordPress does it, and to help out with Dan Brickley’s SparqlPress project.
  • Some builtin “reasoning” functions, to — among other things — make my Redland Smusher obsolete. I’ve discussed this a bit with Dave, but we still haven’t figured out the “right” or best way to implement it.
  • It seems the new version of 3store will store simple datatyped literals like integers in separate columns, to make it easier for the database engine to work with the values and to better support SPARQL. I think I’d like to do the same for the Redland MySQL storage, but still have to figure out the implications.
  • A new MySQL storage engine that reads (and later on maybe writes) the Jena schema layout. This could perhaps be an option to the existing MySQL storage engine, in which case it would be almost trivial to also add an option for storing in a simpler, denormalized layout, where all the information is in a single table instead of spread out over four.
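The table prefix idea from the first item above is simple enough to sketch in Python. No such option exists yet, so everything here is an assumption: the `prefixed_table` helper, the table names, and the `wp_` prefix are illustrative only.

```python
import re

# Illustrative table names; not necessarily the exact schema of the storage.
BASE_TABLES = ("Statements", "Resources", "Literals", "Bnodes")


def prefixed_table(prefix: str, table: str) -> str:
    """Return the table name with a constant prefix, rejecting unsafe prefixes."""
    # Table names cannot be bound as query parameters, so the prefix is
    # restricted to characters that are safe inside an SQL identifier.
    if not re.fullmatch(r"[A-Za-z0-9_]*", prefix):
        raise ValueError(f"unsafe table prefix: {prefix!r}")
    return f"{prefix}{table}"


# Two storages sharing one database, each with its own set of tables.
blog_tables = [prefixed_table("wp_", name) for name in BASE_TABLES]
test_tables = [prefixed_table("test_", name) for name in BASE_TABLES]
```

An empty prefix leaves the names untouched, so existing storages would keep working without any change.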

Last, and in some sense also least, I hacked a little conversion service, CSV-SPARQLer, that simply takes a URI to a CSV file and turns it into SPARQL Variable Bindings Results format (example, show query, extra example, show extra query).
As the extra example shows, I wanted to be able to subscribe to the action that goes on in the Redland Issue Tracker, but all it made available was a CSV file, so there: A CSV file converted into SPARQL result format, then converted into RSS through SPARQL Conversions XSLT. The resulting RSS is not perfect, notably the titles are a bit generic, but it’s good enough.
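For the curious, the core of such a conversion can be sketched in Python. This is not the code behind CSV-SPARQLer; it is a minimal illustration that assumes the first CSV row holds the variable names, treats every value as a plain literal, and uses the namespace of the SPARQL Query Results XML Format as later published by the W3C (the service may well have targeted an earlier draft).

```python
import csv
import io
import xml.etree.ElementTree as ET

SPARQL_RESULTS_NS = "http://www.w3.org/2005/sparql-results#"


def csv_to_sparql_results(csv_text: str) -> str:
    """Turn CSV text (header row = variable names) into SPARQL results XML."""
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)

    root = ET.Element("sparql", {"xmlns": SPARQL_RESULTS_NS})
    head = ET.SubElement(root, "head")
    for name in header:
        ET.SubElement(head, "variable", {"name": name})

    results = ET.SubElement(root, "results")
    for row in rows:
        result = ET.SubElement(results, "result")
        for name, value in zip(header, row):
            if value == "":
                continue  # an empty cell becomes an unbound variable
            binding = ET.SubElement(result, "binding", {"name": name})
            # Assumption: every cell is emitted as a plain literal.
            ET.SubElement(binding, "literal").text = value
    return ET.tostring(root, encoding="unicode")


sample = "id,title\n28,Hash collisions possible in MySQL storage engine\n"
xml_text = csv_to_sparql_results(sample)
```

A real service would also have to fetch the CSV from the given URI and make sure the column headers are valid variable names, both of which are left out here.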