[PATCH] D93829: [clangd] Support outgoing calls in call hierarchy

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Apr 11 07:20:47 PDT 2023


sammccall added a comment.
Herald added a project: All.

Time to wake up an old review!

Going to put high-level thoughts on https://github.com/clangd/clangd/discussions/1206, but since one of them is memory usage I wanted to have a look at the concrete data structure here.

TL;DR: I think we can make this cheap enough that we don't care about the extra memory. The simple lookup optimization below might be enough (probably not); dropping non-callee refs from the data structure is likely enough; the deep lookup optimization is likely enough; doing both probably **reduces** memory usage vs today!

---

Lookup optimization (simple):

`DenseMap<SymbolID, vector<RevRef>>` is something like `56*containers + 16*refs` bytes:

- a `pair<SymbolID, vector<RevRef>>`, which is 4 pointers per symbol, plus extra buckets, so >7 pointers if I'm reading DenseMap right
- a `RevRef` per entry which is two pointers
- maybe some allocator overhead for all those little vectors? (IDK how allocators work in practice)

Dropping the hashtable and just storing a flat `vector<RevRef>` sorted by container eliminates the per-symbol costs, leaving `16*refs` bytes. If there were only 3-4 refs per function this would save 50%, but there are probably more. The tradeoff is binary search (with double indirection for the compare) instead of hash lookup, but that still seems fast enough for what we want it for.

The elephant in the room is that most of these refs we're storing are not callees. If we were to drop those from the lookup structure as discussed inline above, then the ratio of refs:symbols goes down and this optimization becomes relatively more effective.

---

Lookup optimization (deeper): h/t @kadircet

I think it's easiest to explain as a pair of modifications to the current refs data structures (as checked-in, not the ones in this patch).

**First**: imagine we change RefSlab to produce:

- a big flat `vector<Ref>` grouped by target.
- a lookup structure consisting of a sorted `vector<pair</*Target*/SymbolID, /*Offset*/int>>`

This is smaller than our current representation: the lookup structure is `20 * targets` bytes instead of the current `~40 * targets` bytes, and making the ref storage contiguous is ~free.

Clearly this supports lookup of refs for a target by binary searching over the lookup structure by target.

But observe that, given a `Ref*`, we can also find its target: binary search over the lookup structure for the right **offset** rather than the target.

**Second**: have the index compute a permutation of the refs grouped by Container. Naively this is a `vector<Ref*>`, but since we made the array flat it can be a `vector<int>`.

Now we can look up the refs for a given container by binary searching within this permutation, as the container is stored in the Refs themselves.

For each result we have to do a second binary search in the lookup structure to get the target symbol, but we expect the #symbols per container to be small so this nesting isn't a great concern.

So the net cost vs HEAD is `4 * refs - 20 * targets` bytes, which is less than a quarter of the version above. This takes us to ~2% increase which we can eat even if we store this info for all references.

The costs here are expensive reverse lookups (I don't think we care), slightly more expensive forward lookups (also OK I think), code complexity (*mostly* localized). Obviously this is all somewhat invasive...

(We discussed an even further squeeze: we could use the same trick to avoid the current 8 bytes inside each ref for the container. I think this leads to too many levels of nested binary searches.)



================
Comment at: clang-tools-extra/clangd/XRefs.cpp:1843
+    auto Kind = Callee.SymInfo.Kind;
+    if (Kind != SK::Function && Kind != SK::InstanceMethod &&
+        Kind != SK::ClassMethod && Kind != SK::StaticMethod &&
----------------
qchateau wrote:
> nridge wrote:
> > If memory usage of `Dex::RevRefs` becomes a concern, we could consider only storing the references that would pass this filter in the first place. That would trade off time spent building the reverse lookup (we'd have to do symbol lookups or keep around extra state to track the symbol kinds) for space savings.
> I've tried this patch with the whole llvm project indexed. If I remember correctly it's something like 100MB, around 5-10% of clangd's memory usage. As such it's not very high, but it's definitely not negligible, especially for a feature that's not used very often.
> That would trade off time spent building the reverse lookup (we'd have to do symbol lookups or keep around extra state to track the symbol kinds) for space savings.

RefKind is a bitfield, so I think we could just tack on an extra `Call` bit at indexing time. Technically it's extra state, but it's cheap to compute and free to plumb around.

Then the tradeoff becomes: the index needs to know which refkinds to include in its reverse-lookup structure. In practical terms we could hard-code this, it's an ugly layering violation though. The prettiest version I can think of is to put a constant on `RefersToRequest` for the supported refkinds.

(A ridiculous answer that almost works is to build it dynamically on first use, based on the queried refkinds: this means only people who use outgoing call hierarchy would pay the memory cost! Unfortunately I think this gets too complicated when we consider threadsafety)


================
Comment at: clang-tools-extra/clangd/index/Index.h:133
+  virtual bool
+  refersTo(const RefsRequest &Req,
+           llvm::function_ref<void(const RefersToResult &)> Callback) const = 0;
----------------
qchateau wrote:
> nridge wrote:
> > Perhaps we should have a distinct request structure, for future extensibility?
> Sure, that makes sense
+1, also for clarity (we're changing the meaning of `IDs` too).

I'd also consider naming this request something more explicitly paired with `refs`, like `containedRefs`, `reverseRefs`, `outgoingRefs`. They're not great names in isolation, but I think the idea that this is a second query on the same data is worth emphasizing.


================
Comment at: clang-tools-extra/clangd/index/MemIndex.cpp:97
+      Req.Limit.getValueOr(std::numeric_limits<uint32_t>::max());
+  for (const auto &Pair : Refs) {
+    for (const auto &R : Pair.second) {
----------------
qchateau wrote:
> nridge wrote:
> > Looping over all refs seems expensive even for `MemIndex`. Perhaps we should have a reverse lookup table similar to `RevRefs` here as well?
> That's pretty fast even on my laptop IIRC, and only used for outgoing calls atm. 
MemIndex basically doesn't scale: it does fuzzyfind by iterating over all symbols. So this sounds likely never to matter in practice.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93829/new/

https://reviews.llvm.org/D93829
