Re: Indexes, Hashes & Compression

Phil Dawes — Fri, 27 Jul 2007 05:27:34 -0000

@pigalle: thanks for the comments - I'll take a look at reiser4.
The ~10ms latency is for a disk seek, not a read. What sort of timings are you getting?

@Seth: Cool - I'm planning on doing the same thing (have you read the C-Store research papers?).

Re: Indexes, Hashes & Compression

Seth Ladd — Thu, 26 Jul 2007 21:02:51 -0000

I've had good experience storing my data in columns. If I sort the data in each column, I'll get very good compression.
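A rough way to see the effect described above is to compress the same column twice, once in arrival order and once sorted. The sketch below is only an illustration: the predicate names, the column size, and the choice of zlib are all assumptions, not anything from Seth's setup, and it ignores how rows get reassembled once a column has been sorted.

```python
import random
import zlib


def compressed_size(values):
    """Serialize one value per line and return the zlib-compressed byte count."""
    data = "\n".join(values).encode("utf-8")
    return len(zlib.compress(data, 9))


random.seed(0)

# Hypothetical low-cardinality column, e.g. the predicate of each triple.
predicates = ["dc:title", "dc:creator", "dc:date", "rdf:type", "rss:link"]
column = [random.choice(predicates) for _ in range(100_000)]

unsorted_size = compressed_size(column)
sorted_size = compressed_size(sorted(column))

print("unsorted:", unsorted_size, "bytes")
print("sorted:  ", sorted_size, "bytes")
```

Sorting turns the column into a handful of long runs of identical values, which a general-purpose compressor can encode far more compactly than the same values in random order.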

Re: Indexes, Hashes & Compression

pigalle — Thu, 26 Jul 2007 17:39:18 -0000

re: optimal storage / read efficiency - have you tried reiser4? it does a wonderful job of not wasting disk space. a 'du -k' inside a dir reported roughly the same total space as an n3 serialization of the same data. that same ~30 mb of data took up 230 mb on ext3. and about one in every 5 triples is a blog post / news story where there's a 5K chunk of text - the difference would be even more absurd if not for that. also, read back is much faster than your numbers would suggest - it's nowhere near 10 ms per call. what kind of drive are you using, a 423 mb thing you found in a discarded PC on the street?

as for 'in memory' - the kernel disk cache is great for 'in memory' - especially in the concurrency department - 10 mongrels can all benefit from it w/o a separate memcached..

as for indexing - i haven't thought about it much yet - my query engine takes about 0.1 seconds for a basic 'fetch the content, title, author, date, abstract of ___ resources sorted by ascending date'.. hopefully that can be shaved down once i learn some stuff, and your previous post is my jumping off point - thanks!

oh ya. where's your source? mine's http://whats-your.name/yard

Phil Dawes' Stuff - Latest Comments in Indexes, Hashes & Compression
