Community Page
- phildawes.net/blog/
Spent the evening working on bulk assert speed for my IFP smushing store. Managed to get a 120,000-statement import down from 410 seconds to 109 seconds - that’s over 1000 triples per second. Pretty good considering the rdflib RDF/XML parsing takes 59 seconds on its own - I wonder how f
...
4 years ago
Re imports, I'd wondered about doing this sort of thing chunked into, say, 10,000-triple blocks for large imports. But yes, having a benchmarking framework would take out some of the guesswork...
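A minimal sketch of the chunking idea above, assuming the parsed statements arrive as an iterable of (subject, predicate, object) tuples. The `chunked` helper and the sample data are hypothetical, not taken from the store discussed in the post:

```python
from itertools import islice


def chunked(triples, size=10000):
    """Yield successive lists of at most `size` triples from any iterable."""
    it = iter(triples)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


# Hypothetical stand-in for parsed statements.
triples = ((f"s{i}", "p", f"o{i}") for i in range(25000))
block_sizes = [len(block) for block in chunked(triples)]
```

Because the helper consumes the input lazily, only one block needs to be held in memory at a time, and each block could be handed to a separate preparation step.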
4 years ago
Have been thinking about chunked imports too - makes sense with the bulk-import approach, since most of the time is spent in the parsing/preparing rather than the actual dumping to the db. Chunking it would enable parallelization of the time-consuming preparation stage. Would work best with something like 3store, which stores md5 hashes for URIs in its main triples table - no centralization required to agree IDs for URIs.
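To illustrate why hash-derived IDs need no coordination, here is a rough sketch. The function names, the truncation to 64 bits, and the assumption that every term is a URI are illustrative choices, not a description of how 3store actually lays out its tables:

```python
import hashlib


def uri_id(uri):
    """Derive an integer ID from the MD5 digest of a URI (first 8 bytes, an assumption)."""
    digest = hashlib.md5(uri.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def prepare(chunk):
    """Encode every term of every triple in a chunk, with no shared state."""
    return [tuple(uri_id(term) for term in triple) for triple in chunk]


# Two chunks prepared independently still agree on the ID of the shared URI.
chunk_a = prepare([("http://example.org/a", "http://example.org/knows", "http://example.org/b")])
chunk_b = prepare([("http://example.org/b", "http://example.org/knows", "http://example.org/c")])
```

Since the ID depends only on the URI string, any number of workers can prepare chunks in parallel and their output can be loaded directly.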
Unfortunately (from this perspective), my store uses generated IDs for URIs to enable a 1:many logical-resource -> URI mapping for smushing. Would probably need to import in chunks and then reconcile IDs in a subsequent sweep.
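The "import in chunks, then reconcile" idea might look roughly like this. It is only a sketch under simplifying assumptions: IDs are plain counters, all names are hypothetical, and the 1:many resource-to-URI smushing is left out entirely:

```python
def prepare_chunk(chunk):
    """Encode one chunk with chunk-local IDs; return (local URI table, encoded triples)."""
    local_ids = {}
    encoded = []
    for triple in chunk:
        encoded.append(tuple(local_ids.setdefault(term, len(local_ids)) for term in triple))
    return local_ids, encoded


def reconcile(prepared_chunks):
    """Sweep over prepared chunks, rewriting chunk-local IDs to store-wide IDs."""
    global_ids = {}
    merged = []
    for local_ids, encoded in prepared_chunks:
        # Map each local ID to the global ID for the same URI, allocating if new.
        remap = {local: global_ids.setdefault(uri, len(global_ids))
                 for uri, local in local_ids.items()}
        merged.extend(tuple(remap[i] for i in triple) for triple in encoded)
    return global_ids, merged


chunks = [[("a", "knows", "b")], [("b", "knows", "c")]]
prepared = [prepare_chunk(c) for c in chunks]  # this stage could run in parallel
global_ids, merged = reconcile(prepared)       # this sweep is sequential
```

The preparation stage stays independent per chunk; only the final sweep needs a single view of the ID table.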
4 years ago
If you're looking at big data, it may be worth keeping one eye on developments in RFC 3229 and feeds as a possible means of syncing big stores; see:
http://bobwyman.pubsub.com/