[clang] Reduce memory usage in AST parent map generation by lazily checking if nodes have been seen (PR #129934)

Erich Keane via cfe-commits cfe-commits at lists.llvm.org
Wed Mar 5 14:28:20 PST 2025


================
@@ -70,16 +93,37 @@ class ParentMapContext::ParentMap {
         push_back(Value);
     }
     bool contains(const DynTypedNode &Value) {
-      return Seen.contains(Value);
+      assert(Value.getMemoizationData());
+      bool found = FragileLazySeenCache.contains(&Value);
+      while (!found && ItemsProcessed < Items.size()) {
+        found |= FragileLazySeenCache.insert(&Items[ItemsProcessed]).second;
+        ++ItemsProcessed;
+      }
+      return found;
     }
     void push_back(const DynTypedNode &Value) {
-      if (!Value.getMemoizationData() || Seen.insert(Value).second)
+      if (!Value.getMemoizationData() || !contains(Value)) {
----------------
erichkeane wrote:

It seems unfortunate to potentially recalculate in `contains` here, only to potentially recalculate it again immediately afterwards. I wonder if there is value in special-casing `ItemsProcessed == 0 && Items.capacity() == Items.size()` to just do a linear search?

Also, I wonder if it makes sense to only use the cache when `size()` is large enough? It would be interesting to see if there is a value of N below which the set doesn't really pay off.
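
To make the threshold idea concrete, here is a rough, standalone sketch. It is not the PR's code: `SmallDedupVector`, `Threshold`, and `usesSet` are hypothetical names, and `std::vector` / `std::unordered_set` of plain pointers stand in for the real containers and `DynTypedNode`. Below `Threshold` elements it does a linear search; at or above it, it lazily catches the set up, in the same spirit as the `ItemsProcessed` loop in the hunk above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the parent vector; keys are compared by address.
class SmallDedupVector {
public:
  explicit SmallDedupVector(std::size_t Threshold) : Threshold(Threshold) {}

  bool contains(const void *Key) {
    // Small case: a linear scan avoids building the set at all.
    if (Items.size() < Threshold)
      return std::find(Items.begin(), Items.end(), Key) != Items.end();
    // Large case: lazily index every item not yet in the set, then look up.
    while (ItemsProcessed < Items.size())
      Seen.insert(Items[ItemsProcessed++]);
    return Seen.count(Key) != 0;
  }

  // Appends Key unless it is already present; returns true if appended.
  bool push_back(const void *Key) {
    assert(Key && "null keys are not supported in this sketch");
    if (contains(Key))
      return false;
    Items.push_back(Key);
    return true;
  }

  std::size_t size() const { return Items.size(); }
  // True once the set has been populated, i.e. the threshold was reached.
  bool usesSet() const { return !Seen.empty(); }

private:
  std::size_t Threshold;
  std::vector<const void *> Items;
  std::unordered_set<const void *> Seen;
  std::size_t ItemsProcessed = 0; // Items[0..ItemsProcessed) are in Seen.
};
```

What a good N would be is exactly the open question; it would have to come from measurement rather than from this sketch.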


https://github.com/llvm/llvm-project/pull/129934

