Category Archives: Databases

Computing Cloud

Amazon is launching another Limited Beta, which seems to be the only way to launch anything these days.

This time they build on the S3 success, and add the next obvious component to the equation: Computing on demand, Amazon Elastic Compute Cloud (Amazon EC2).

As with S3, you get what you pay for and vice versa, no signup fees. S3 is used (and paid for separately) to store the computing images, and the EC2 additional pricing seems at first glance to be quite reasonable:

  • $0.10 per instance-hour consumed (or part of an hour consumed).
  • $0.20 per GB of data transferred outside of Amazon (i.e., Internet traffic).

The bandwidth cost might be an issue, but depending on what you want to compute, it might not a problem.

I could envision running some serious reasoning on a thing like this…

Semantic Mainstream

I spent most of my wednesday at the Copenhagen Bella Center at the anual Linux Conference : Linux on Enterprise 2006. It was organized in part by OSL — thanks to Magenta for the invitation.

It was quite interesting to hear about practical experiences with open standards and open source in the government, and listening to Peter Quinn of Massachusetts fame was inspiring (as was the free beer — the presentation included hilarious jabs at Richard Stallman).

Most interesting was the fact that the IDC analyst as well as the special guest Dirk van Rooy from the EU Information Society/IST activity, both mentioned intelligence in computing as “the future”, with clear references to the Semantic Web (and the omnipresent intelligent refrigerator).

Another point: The May/June issue of Oracle Magazine contains an article on the Semantic Web, with sidebars tooting the horn of Oracle 10g Release 2: Semantic Breakthrough

I think it seems like this stuff is going to stick…

Describing Source Content for Redland/MySQL

I mentioned SADDLE (which used to be a part of the SPARQL Protocol draft, but is no longer) in passing the other day, when describing OWL-S Maker and talking about service description in general.

Service description in this context — and in the context of Dion Hinchcliffe’s OWL-S-less overview of SDLs — is mostly about the interface, the inputs and outputs, not what’s in between.

In contrast, SADDLE originally entered that territory with its properties like saddle:vocabulary, and the other day on dev@gargonza Damian Steer announced a nice little javascript hack for using source content descriptions — this is not about I/O, but about what a “service” contains information about.

Central to Damian’s hack is a source content description, containing OWL statements about which classes and properties are present in the SPARQL source. For example, his description shows that all objects of foaf:name statements (in this particular store) are literals.

While the above example was handmade, I realized this was getting close to what I’ve been meaning to do for generating simpler and cleaner UIs for triplestores (asking for a foaf:Person? It’s likely you’d also want a foaf:name then…), so I figured I should try to generate such an SCD — Source Content Description — automagically, as Damian hints to himself: Ideally this information would mined from the store.

I’ve managed to come up with a single query that returns all the information necessary to construct an SCD, but since it’s quite complex, I’ll explain the steps I took on the way there.

Continue reading Describing Source Content for Redland/MySQL

Redland Hacking

During the last few days, I’ve been hacking a bit on — and with — Redland.

First off, I verified that a bug and associated patch from Simon Cross regarding portability of the hash calculations in the MySQL storage engine was indeed working. When originally writing the code for this I hadn’t thought of the use case of accessing a storage on a different architecture, but that is of course an important one. The issue is now closed, Dave Beckett has applied the patch to CVS.

I also created an issue regarding the design decision to not look for hash collisions, 28: Hash collisions possible in MySQL storage engine. I don’t have a solution ready for this, but I thought it would be a good idea to get it out in the open, so people are aware of the problem.

Another minor issue with the MySQL storage was its excessive use of connections, especially visible when using Rasqal. I wrote a patch to make it use persistent connections, and Christopher Schmidt was kind enough to help me test it. It seems to be working fine — it does here as well, so I sent a message to redland-dev asking for comments, hopefully this will get into CSV soon as well.

Then came a bit of work on the long-running issue with the PHP interface to Redland. PHP has its own unique NULL-value, so when the Redland Bindings blindly returned a C NULL wrapped in a regular PHP object (in the case of an error), Redland would crash Apache/PHP upon trying to use that object. In the past, Dave has been kind enough to hack a bit here and there when I ran into problems, but I decided to try to close the issue more pemanently. Thus, as explained in 15: PHP binding functions should return a PHP null, I patched the pointer return function to always return a PHP NULL instead a C one. My first version of the patch seems to have been faulty, as Dave couldn’t apply it to CVS, but I created a new one that I hope will do a better job. Also, as a side effect of this change, it is now no longer possible to pass a C NULL into some of the Redland functions where needed, so it seems we have to create a few PHP helper functions to return a C NULL wrapped in a PHP object…

I’ve got more ideas for improvement to Redland, but they really can’t be considered as anything other than feature requests to be coded on a day (and night) with nothing else to do, so I haven’t created issues for these:

  • An option for the MySQL storage to prefix table names with a constant string, to make it possible to have more than one storage in the same database, inspired by the way WordPress does it, and to help out with Dan Brickley’s SparqlPress project.
  • Some builtin “reasoning” functions, to — among other things — make my Redland Smusher obsolete. I’ve discussed this a bit with Dave, but we still haven’t figured out the “right” or best way to implement it.
  • It seems the new version of 3store will store simple datatyped literals like integers in separate columns, to make it easier for the database enginge to work with the values and to better support SPARQL. I think I’d like to do the same for the Redland MySQL storage, but still have to figure out the implications.
  • A new MySQL storage enginge that reads — later on maybe writes as well — the Jena schema layout. This could perhaps be an option to the MySQL storage enginge, in which case it would be almost trivial to also add an option for storing in a simpler, denormalized layout, where all the information is in a single table instead of spread out over four.

Last, and in some sense also least, I hacked a little conversion service, CSV-SPARQLer, that simply takes a URI to a CSV file and turns it into SPARQL Variable Bindings Results format (example, show query, extra example, show extra query).
As the extra example shows, I wanted to be able to subscribe to the action that goes on in the Redland Issue Tracker, but all it made available was a CSV file, so there: A CSV file converted into SPARQL result format, then converted into RSS through SPARQL Conversions XSLT. The resulting RSS is not perfect, notably the titles are a bit generic, but it’s good enough.