[clang] Reduce memory usage in AST parent map generation by lazily checking if nodes have been seen (PR #129934)
via cfe-commits
cfe-commits at lists.llvm.org
Wed Mar 5 17:54:07 PST 2025
higher-performance wrote:
Ah sorry I forgot to reply to your review.
> We might take a look at `SetVector` and see if there are any learnings we can take from that. It has a set and a vector together, but might have some interesting things.
I did take a look at this actually, but it looked equivalent to what I was doing before, particularly with respect to storing the expensive keys (`DynTypedNode`) twice. It just uses `SmallVector<T, ...>` and `DenseSet<T>` underneath, whereas I just happened to use `SmallDenseSet<T, ...>` instead of `DenseSet<T>`.
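To make the "storing the expensive keys twice" point concrete, here is a minimal sketch of a `SetVector`-style container that keeps each key only once: elements live in a `std::deque` (whose `push_back` never invalidates references to existing elements), and deduplication goes through a set of pointers into that storage. This is purely illustrative standard-library code, not LLVM's ADT implementation or the code in this PR.

```cpp
#include <deque>
#include <functional>
#include <unordered_set>

// Illustrative sketch: dedup by value while storing each (expensive) key
// exactly once. The set holds pointers into the deque and hashes/compares
// through them.
template <typename T> class PointerSetVector {
  struct DerefHash {
    size_t operator()(const T *P) const { return std::hash<T>()(*P); }
  };
  struct DerefEq {
    bool operator()(const T *A, const T *B) const { return *A == *B; }
  };
  std::deque<T> Items; // insertion order; stable addresses on push_back
  std::unordered_set<const T *, DerefHash, DerefEq> Seen; // dedup by value

public:
  bool insert(const T &V) {
    Items.push_back(V); // tentatively append
    bool Inserted = Seen.insert(&Items.back()).second;
    if (!Inserted)
      Items.pop_back(); // duplicate: roll back the append
    return Inserted;
  }
  const std::deque<T> &view() const { return Items; } // cheap, O(1)
};
```

The `std::deque` is what makes the pointer-keyed set safe; with a `std::vector`, reallocation on `push_back` would invalidate every pointer held by the set.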
> Finally... I wonder if we were to just store this as a sorted-vector of `DynTypedNode` + `insert order`, then do a copy & sort on access of `view`.
I don't think that works. You don't want to do a ton of work on every `view()` -- it's assumed to be cheap, and called often. Also, maintaining sort order gets more expensive the more nodes you add; you don't want to do it on every insertion.
Furthermore, elements without memoization data aren't comparable, but they still need to appear in the `view` *in the same order as they were inserted*. This means either they need to live in the same vector as the comparable elements (which makes it impossible to sort the comparable portion), or we need to create a facade over two completely disjoint containers.
The implementation in this PR addresses all of those constraints without any drawbacks, I think. There's less code compared to writing a facade, and performance-wise it feels just about optimal.
> Can we do some benchmarking to figure out which things are done the 'most often' with what values of `N`? it might better guide this.
Re: benchmarking -- note that I did a general benchmark both with the test case that was provided in the associated issue and with an expensive TU that we had internally. Performance was great in both cases.

Regarding N -- I assume N refers to the total container sizes (per the other comment)? The thing is, I don't get the impression there's much room for improvement here. The memory usage was just high by a constant (10x) factor. Some ~5x of that almost certainly came from the data type itself (storing `DynTypedNode` in the cache, which is 40 bytes, instead of `DynTypedNode*`, which is 8 bytes). Adding laziness on top of that has shrunk the difference so much that the performance plots are already very close to coinciding.
https://github.com/llvm/llvm-project/pull/129934