Phil Dawes is working on an ifpstore to back his Veudas RDF editor. Yesterday he published some benchmarks for importing triples into ifpstore.
While there are quite a number of variables — the data structures differ a bit, the hardware used isn’t identical, etc… — I thought I’d try to redo the benchmarks with the Redland/MySQL storage backend.
At first I tried importing the four separate files from Wordnet, then serialised in NTriples syntax in one file, and last a single RDF/XML file, generated naïvely (meaning that the size of the file is larger than the four raw files combined, even if it contains the same triples) from the NTriples one.
Then I repeated the tests, this time with the bulk loading features — table locking and temporary disabling of indices — of the MySQL storage backend turned on. The benchmark below includes the index rebuilding phase that takes place after the actual load.
Redland/MySQL Wordnet Benchmark | Standard | Bulk |
---|---|---|
Separate | 210 | 181 |
RDF/XML | 225 | 169 |
Ntriples | 205 | 158 |
As the numbers suggest, the bulk loading features work, and raptor is faster at parsing NTriples than RDF/XML, hardly a surprise.
What doesn’t show in the numbers is the fact that the entire load process is CPU bound, it’s not disk accesses that’s taking time. Also, the amount of triples, 473589, isn’t enough to fill up the in-memory cache MySQL maintains (here), not even importing a large dataset like Jim’s 6.7 million scuttered statements (converted into NTriples with jim2ntriples) seems to be. With bulk loading turned on, that entire process takes about 34 minutes, equivalent to about 3300 triples per second, as compared to the about 3000 triples per second for the best case above.