<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"><channel><title>Phil Dawes' Stuff - Latest Comments in Indexes, Hashes &amp; Compression</title><link>http://phildawesstuff.disqus.com/</link><description></description><language>en</language><lastBuildDate>Fri, 27 Jul 2007 05:27:34 -0000</lastBuildDate><item><title>Re: Indexes, Hashes &amp; Compression</title><link>http://www.phildawes.net/blog/2007/07/26/indexes-hashes-compression/#comment-2753604</link><description>@pigalle: thanks for the comments - I'll take a look at reiser4.&lt;br&gt;The ~10ms latency is for a disk seek, not a read. What sort of timings are you getting?&lt;br&gt;&lt;br&gt;@Seth: Cool - I'm planning on doing the same thing (have you read the research papers for cstore?).</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Phil Dawes</dc:creator><pubDate>Fri, 27 Jul 2007 05:27:34 -0000</pubDate></item><item><title>Re: Indexes, Hashes &amp; Compression</title><link>http://www.phildawes.net/blog/2007/07/26/indexes-hashes-compression/#comment-2753603</link><description>I've had good experience storing my data in columns. If I sort the data in each column, I get very good compression.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Seth Ladd</dc:creator><pubDate>Thu, 26 Jul 2007 21:02:51 -0000</pubDate></item><item><title>Re: Indexes, Hashes &amp; Compression</title><link>http://www.phildawes.net/blog/2007/07/26/indexes-hashes-compression/#comment-2753602</link><description>re: optimal storage / read efficiency - have you tried reiser4? It does a wonderful job of not wasting disk space. A 'du -k' inside a directory reported roughly the same total usage as an N3 serialization of the same data, while that same ~30 MB of data took up 230 MB on ext3. And about one in every 5 triples is blog post / news story text with a 5K chunk of text; the difference would be even more absurd if not for that. Also, read-back is much faster than your numbers would suggest - it's nowhere near 10 ms per call.
What kind of drive are you using, a 423 MB thing you found in a discarded PC on the street?&lt;br&gt;&lt;br&gt;As for 'in memory': the kernel disk cache is great for 'in memory', especially in the concurrency department. 10 mongrels can all benefit from it without a separate memcached.&lt;br&gt;&lt;br&gt;As for indexing, I haven't thought about it much yet. My query engine takes about 0.1 seconds for a basic 'fetch the content, title, author, date, abstract of ___ resources sorted by ascending date' query. Hopefully that can be shaved down once I learn some stuff, and your previous post is my jumping-off point - thanks!&lt;br&gt;&lt;br&gt;Oh yeah, where's your source? Mine's at &lt;a href="http://whats-your.name/yard" rel="nofollow"&gt;http://whats-your.name/yard&lt;/a&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pigalle</dc:creator><pubDate>Thu, 26 Jul 2007 17:39:18 -0000</pubDate></item></channel></rss>