
Some ideas for static triple indexing

Started by phildawes · 9 months ago

I wrote a bit about representing structured data in the last post. Here are some ideas for how I plan to index the data.
Indexing graphs as subject ranges
In indexing triples I need to provide indexed lookups for all 6 of the possible triple query patterns:
s->po
sp->o
p-% ... Continue reading »

4 comments

  • Your subject-id-range context trick doesn't sound like it has the same capabilities as other context systems. Does yours have a way to store 's1 p1 o1' in one context and 's1 p2 o2' in another? Or do you have to make some other statement to link the ids of the first s1 and the second s1?
  • Good point about the even distribution of the hashes. I'll have to remember that one. :)

    Didn't you say that at least the subject will be a sequential identifier, though, and so not susceptible to that optimisation?

    How many actual indexes will you need to efficiently support that set of queries, given your hierarchical index structure? Only 3?
  • Hi Drew,

    The latter. Subject identifiers aren't exposed to the client, so there's no way to make statements using them specifically. Instead, to join data from two subjects in different graphs you must use identity by description (i.e. the subject that has these property values), and the person/agent doing the query must know about them.
    Internally the subject IDs can be in the 'object' position to support things like containment. E.g. the XML:

    <pre>
    <person>
      <name>Phil Dawes</name>
      <email>phil@example.com</email>
      <knows>
        <person>
          <name>Steve</name>
          <email>s@example.com</email>
        </person>
      </knows>
    </person>
    </pre>

    Internally indexed as:
    <pre>
    #1 name "Phil Dawes"
    #1 email "phil@example.com"
    #1 tag Person
    #1 knows #2
    #2 name "Steve"
    #2 email "s@example.com"
    #2 tag Person
    </pre>

    but externally you can't refer to them. Does that make sense?
  • Hi Nick,

    Opaque subject identifiers are even easier to index because they can be picked to be sequential in the index. I.e. subject 3 is at position 3.

    Re. number of indexes: I think I'll need at least the following.

    s->p->o
    p->o->s
    o->s->p

    So 3 index hierarchies for searches. The subject-id-in-the-object-position mentioned above is a special case, and will probably require its own (relatively small) index o->sp.
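
To make the discussion above concrete, here is a rough Python sketch, not Phil's actual implementation. It flattens the example XML into triples with sequential, internal-only subject ids, then builds the three nested indexes (s->p->o, p->o->s, o->s->p) as plain in-memory dictionaries. The names `flatten` and `build_index`, the use of integers for subject ids, and the rule that an element wrapping other elements (like `knows`) becomes a link to a new subject are all assumptions made for illustration.

```python
import itertools
import xml.etree.ElementTree as ET
from collections import defaultdict

XML = """
<person>
  <name>Phil Dawes</name>
  <email>phil@example.com</email>
  <knows>
    <person>
      <name>Steve</name>
      <email>s@example.com</email>
    </person>
  </knows>
</person>
"""

def flatten(element, triples, ids):
    """Give `element` the next sequential subject id and append its triples.

    Subject ids are plain ints (hypothetical choice); literal values are strings.
    """
    subject = next(ids)
    triples.append((subject, "tag", element.tag.capitalize()))
    for child in element:
        if len(child):
            # Child wraps nested elements: each one becomes its own subject,
            # and its id goes in the object position (containment).
            for nested in child:
                triples.append((subject, child.tag, flatten(nested, triples, ids)))
        else:
            triples.append((subject, child.tag, child.text))
    return subject

def build_index(triples, order):
    """Build a two-level index; `order` picks which triple fields form each level."""
    index = defaultdict(lambda: defaultdict(set))
    for triple in triples:
        first, second, third = (triple[i] for i in order)
        index[first][second].add(third)
    return index

triples = []
flatten(ET.fromstring(XML), triples, itertools.count(1))

spo = build_index(triples, (0, 1, 2))  # answers s->po and sp->o
pos = build_index(triples, (1, 2, 0))  # answers p->os and po->s
osp = build_index(triples, (2, 0, 1))  # answers o->sp and os->p

# sp->o: what is subject 1's name?
print(spo[1]["name"])
# Identity by description: the subject with this name AND this email.
print(pos["name"]["Steve"] & pos["email"]["s@example.com"])
# o->sp: who refers to internal subject 2, and via which property?
print(dict(osp[2]))
```

Each of the six query patterns is a prefix lookup on one of the three orderings, which is why three hierarchies are enough here. The last query goes through the general o->s->p index; the separate small o->sp index mentioned above would be an optimisation for the case where the object is a subject id.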
