[PATCH] D126723: [pseudo] GC GSS nodes, reuse them with a freelist

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jun 3 12:09:36 PDT 2022


sammccall marked an inline comment as done.
sammccall added inline comments.


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:391
+                              llvm::ArrayRef<const Node *> Parents) {
+  ++NodeCount;
+  Node *Result = new (allocate(Parents.size()))
----------------
hokein wrote:
> sammccall wrote:
> > hokein wrote:
> > > The NodeCount becomes vague now, we update in both addNode and allocate methods, I think we should have two separate counts.
> > - Oops, that duplicated ++NodeCount is just a mistake.
> > - names are vague now, both "NodeCount" and "Allocated" are dubious. Maybe I'll rename to `NodesCreated` and `Alive`, WDYT?
> > - are there actually more metrics we want to record? we have nodes-created, currently-live-nodes (Allocated.size()), nodes-destroyed (`NodeCount - Allocated.size()`). We could track max-live-node but I think Arena.bytes() is fine for our purposes.
> > - do we want to expose any of those numbers through APIs? I couldn't work out what I'd actually do with them.
> > names are vague now, both "NodeCount" and "Allocated" are dubious. Maybe I'll rename to NodesCreated and Alive, WDYT?
> 
> Sounds good to me.
> 
> > are there actually more metrics we want to record? we have nodes-created, currently-live-nodes (Allocated.size()), nodes-destroyed (NodeCount - Allocated.size()). We could track max-live-node but I think Arena.bytes() is fine for our purposes.
> 
> I think these are sufficient, should provide enough details about GSS-memory-related information. The other things I will like to record is the max-active-heads, but that's a different bit.
> 
> > do we want to expose any of those numbers through APIs? I couldn't work out what I'd actually do with them.
> 
> I think we should avoid to expose an API per field, we probably just need a single API which returns a Stats struct containing all these things. We can do it later.
> 
> 
OK, I haven't added any new stats APIs for now.


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:403
+  ++NodeCount;
+  if (FreeList.size() <= Parents)
+    FreeList.resize(Parents + 1);
----------------
hokein wrote:
> sammccall wrote:
> > hokein wrote:
> > > What's the practical size of the `FreeList`? I think we can just find one, and use it as initial default size, to save some cost of resize. 
> > It's whatever the max #parents is - maybe 10 or something?
> > I don't think it's worth adding a hard-coded guess to save ~1 resize of a single vector per parse!
> I would image there will be multiple resize calls, if the path like `allocate(1), allocate(3), ..., allocate(max)`, which I think it is practical? unless the first call is `allocate(max)` which is unlikely I think.
So for parsing clangd/AST.cpp, we have 4 small allocations for the whole file, which we could reduce by 3.
```
size increased to 1
capacity grew from 0 to 1
size increased to 2
capacity grew from 1 to 2
size increased to 3
capacity grew from 2 to 4
size increased to 4
size increased to 6
capacity grew from 4 to 8
size increased to 7
```

By comparison, **each call** to glrReduce creates 5 vectors and 2 priority queues, each of which can have multiple allocations as it grows. There are 4000 tokens, for a total of probably **50000** mallocs.

I think you're microoptimizing the wrong places :-)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126723/new/

https://reviews.llvm.org/D126723



More information about the cfe-commits mailing list