[PATCH] D128297: [pseudo] Track heads as GSS nodes, rather than as "pending actions".

Wed Jun 22 04:28:31 PDT 2022

hokein added a comment.

Thanks, this looks a reasonable change to me.

================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/GLR.h:140
+              const ForestNode &NextTok, const ParseParams &Params,
+              std::vector<const GSS::Node *> &NewHeads);
+// Applies available reductions on Heads, appending resulting heads to the list.
----------------
nit: change NewHeads to a pointer? it seems clearer that NewHeads is the output of the function from the caller side `glrShift(OldHeads, ..., &NewHeads)`.

I think it would be clearer if glrShift just returns NewHeads, but I understand we want to avoid temporary object for performance reasons. 

================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:132
   StateID getGoToState(StateID State, SymbolID Nonterminal) const;
+  llvm::Optional<StateID> getShiftState(StateID State, SymbolID Terminal) const;

----------------
the function signature (return type) of `getGotoState` and `getShiftState` is not symmetry, it is fine and by design (we never all getGoToState on an invalid Nonterminal, it is guaranteed by the LR parser, but we might call getShiftState on a dead head). It is probably worth a comment explicitly specifying the difference.

================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:67
+                   G.symbolName(Terminals[I].symbol()), Terminals[I].symbol()));
+    glrShift(Heads, Terminals[I], Params, NextHeads);
+    std::swap(Heads, NextHeads);
----------------
I can see the benefit of keep them tight in the loop (and doing a shift before reduce), but I found the code is a bit confusing, it took me a while to understand.

- the concept `Heads` and `NextHeads` are different for glrShift, and glrReduce -- The NextHeads returned by the glrShift should be the Heads used in glrReduce, so I was confused when reading `glrReduce(Heads)` the code at first glance -- until I realized that there is a `swap(Heads, NextHeads)` on L68...
- it seems a little weird that at the end of the loop, we do a cleanup on NextHeads directly (I would image there is a `swap(Head, NextHead)` and then cleanup)

Instead of naming the two lists as `Heads` and `NewHeads`, how about naming them `HeadsForShift` (or `ShiftHeads`) and `HeadsForReduce`? the code would look like

```
for ( ... ) {
  glrShift(HeadsForShift, ..., &HeadsForReduce);
  ....
  glrReduce(HeadsForReduce, ..., &HeadsForShift);
}
```

It looks clearer to me.

================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:69
+    std::swap(Heads, NextHeads);
+    SymbolID Lookahead = I + 1 == Terminals.size() ? tokenSymbol(tok::eof)
+                                                   : Terminals[I + 1].symbol();
----------------
I'd probably leave a blank line deliberately after glrShift, because the glrReduce work on the *next* I+1 token.

================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:111
+              const ForestNode &NewTok, const ParseParams &Params,
+              std::vector<const GSS::Node *> &NewHeads) {
   assert(NewTok.kind() == ForestNode::Terminal);
----------------
nit: add a `NewHeads.empty` assertion? 

================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:200
 //     0--5(pointer)       // 5 is goto(0, pointer)
-void glrReduce(std::vector<ParseStep> &PendingReduce, const ParseParams &Params,
-               NewHeadCallback NewHeadCB) {
+void glrReduce(std::vector<const GSS::Node *> &Heads, SymbolID Lookahead,
+               const ParseParams &Params) {
----------------
IIRC, in glrReduce, we only append newly-created GSS nodes in Heads, and never to do deleting, so the Heads will end up with lots of unnecessary heads (heads created for reduces), and we will call `getShiftState` on them. 

We know that after glrReduce, active heads are heads with available shift actions, so we can optimize it further, we could just use a vector<GSS::Nodes> which just contains active heads , this could be done in the popPending. I think it will increase the performance by reducing the number of unnecessary heads.

================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:264
+  // PoppedHeads is our position within it.
+  unsigned PoppedHeads = 0;
   // Pop walks up the parent chain(s) for a reduction from Head by to Rule.
----------------
might be name it PoppedHeadIndex?

================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:360

       // Invoking the callback for new heads, a real GLR parser may add new
       // reduces to the PendingReduce queue!
----------------
the comment is stale now.

================
Comment at: clang-tools-extra/pseudo/lib/grammar/LRTable.cpp:78
+  assert(pseudo::isToken(Terminal) && "expected terminal symbol!");
+  for (const auto &Result : find(State, Terminal))
+    if (Result.kind() == Action::Shift)
----------------
instead of using find directly, just use `getActions()`.

================
Comment at: clang-tools-extra/pseudo/lib/grammar/LRTable.cpp:79
+  for (const auto &Result : find(State, Terminal))
+    if (Result.kind() == Action::Shift)
+      return Result.getShiftState();
----------------
nit: it is worth a comment saying that if there is a shift action, it must be exactly 1, this is guaranteed by the LR parser (no shift-shift conflict)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128297/new/

https://reviews.llvm.org/D128297