[PATCH] D59407: [clangd] Add RelationSlab

Thu Mar 28 11:11:04 PDT 2019

sammccall added a comment.

(Sorry to arrive late at this discussion, I'm just back from leave.
I have some suggestions and would appreciate your thoughts, but if simply this feels too much like going around in circles I'm happy to work out how we can address this after this patch lands instead.
Have discussed offline with @gribozavr and I think we're roughly on the same page. @kadircet is now out on leave)

I think the data model might be overly complicated here.
I can see how the discussion got here: it mirrors the other `*Slab` types quite closely. But:

- those types were designed to solve specific problems that we don't have here (string deduplication and symbol merging)
- I think the class-parent relation is pretty sparse (maybe 1 edge per 10 symbols?) so lots of options will work fine
- I don't think we yet know what the more resource-critical (denser) relations and queries are, so it's unclear what to optimize for

I think the simplest model that can work here is something like:

- `Relation` is `struct { SymbolID Subject; SymbolRole Relation; SymbolID Object }`
- this is a value type, so could be passed around simply as `std::vector<Relation>`. If `RelationSlab` is desired for symmetry, it could be an alias or simple wrapper around `std::vector<Relation>`.

This has a few advantages:

- the `Relation` value_type is self-contained, has clearer semantics, and works as the result of any type of query: based on subject and/or predicate and/or object. (Similar to how it's convenient that Symbol contains SymbolID).
- if lookup is desired, lookup by subject, subject+predicate, or full-triple is possible by sorting in SPO order with binary search. Return type is simple `ArrayRef<Relation>`. (Though fancy lookup maybe better belongs in MemIndex/Dex rather than in the slab). For more query types, storing a copies in OSP and/or POS order is pretty cheap too.
- memory efficiency: it costs `20*distinct(subject, predicate, object)` bytes, vs `28*distinct(subject, predicate) + 8 * distinct(subject, predicate, object)`. Unless the average number of objects for a `(subject, predicate)` pair (that has at least one object) is >2.3, the former is smaller. For the current use case, this average is certainly <1.1.
- simplicity: no arenas, easy mental model.

I see discussion of (something like) this option stalled around the index constructors, but I don't see a fundamental block there. The concept that the index constructor should be templated over would be `Iterable<Relation>`. LMK if I'm missing something.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59407/new/

https://reviews.llvm.org/D59407