More import optimisation

Started by phildawes · 9 months ago

Claire’s out tonight, so another evening spent on bulk RDF importing. Have managed to get the original 120,705-statement dataset import down to 77.6 seconds - that’s ~1500 triples a second!
The extra speed was mainly due to removing the need for database URI to id look ...

2 comments

  • If you want to test your system with a really large data set (150M triples), have a look at http://www.isb-sib.ch/~ejain/rdf/data/ :-)

    I believe the only way to load such amounts of data within a reasonable time on reasonable hardware is to make use of the underlying database's bulk loading facilities - I gather you chose a similar approach. We can load 6'000 triples per second, with most of that time going into building all the indexes...
  • Hi Eric,

    When you say 6000 triples a second, is this from RDF/XML, or from data already parsed into some sort of optimized format?
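
For a rough sense of scale, the figures quoted in this thread can be put side by side. The sketch below only does the arithmetic: it recomputes the throughput from the post (120,705 statements in 77.6 seconds) and estimates how long the 150M-triple data set would take at that rate and at the 6'000 triples per second mentioned above. The function names are made up for illustration, and the projection assumes throughput stays constant as the store grows, which nobody in the thread claims and which index building may well break.

```python
def triples_per_second(triples: int, seconds: float) -> float:
    """Average load throughput for a finished import."""
    return triples / seconds


def load_time_hours(triples: int, rate: float) -> float:
    """Estimated load time in hours, ASSUMING a constant rate (not from the thread)."""
    return triples / rate / 3600


# Figures quoted in the post and comments.
post_rate = triples_per_second(120_705, 77.6)  # import described in the post
bulk_rate = 6_000.0                            # rate quoted in the first comment
large_dataset = 150_000_000                    # size of the suggested test data set

print(f"post throughput: {post_rate:.0f} triples/s")
print(f"150M triples at post rate: {load_time_hours(large_dataset, post_rate):.1f} h")
print(f"150M triples at 6000/s:    {load_time_hours(large_dataset, bulk_rate):.1f} h")
```

Under that constant-rate assumption the large data set comes out at somewhat over a day at the post's rate, against roughly seven hours at 6'000 triples per second.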
