[PATCH] D127284: [clang-repl] Support statements on global scope in incremental mode.

Richard Smith - zygoloid via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Nov 15 23:40:07 PST 2022


rsmith added inline comments.


================
Comment at: clang/lib/AST/Decl.cpp:5264
+
+FunctionDecl *TopLevelStmtDecl::getOrConvertToFunction() {
+  if (FD)
----------------
I would hope that we can remove this. Instead, I think we can teach `CodeGen` to emit a sequence of `TopLevelStmtDecl`s directly as an LLVM IR function: if it has not emitted anything else or flushed its IR output since it last emitted a `TopLevelStmtDecl`, it reuses and extends the previous `Function`; otherwise it creates a new one. That would also allow us to make `TopLevelStmtDecl` model exactly one `Stmt`, which seems cleaner.
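
To make the reuse-or-create rule concrete, here is a minimal sketch. It does not use the real Clang or LLVM APIs: `IRFunction`, `ToyCodeGen`, `emitOtherDecl`, `flush`, and the `__stmts__N` naming are all hypothetical stand-ins invented for illustration. The only thing it models is the decision described above: a statement is appended to the previous function unless something else was emitted or the output was flushed in between.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an LLVM IR function: a name plus a list of
// statements, kept as plain strings.
struct IRFunction {
  std::string Name;
  std::vector<std::string> Body;
};

// Hypothetical stand-in for the code generator's state. Not Clang's CodeGen.
class ToyCodeGen {
public:
  // Emit one top-level statement. Extend the previous function if nothing
  // else was emitted and no flush happened since the last statement;
  // otherwise start a new function.
  void emitTopLevelStmt(const std::string &Stmt) {
    if (!CanExtendLast) {
      Functions.push_back(
          IRFunction{"__stmts__" + std::to_string(Functions.size()), {}});
      CanExtendLast = true;
    }
    assert(!Functions.empty() && "an open function must exist here");
    Functions.back().Body.push_back(Stmt);
  }

  // Emitting anything other than a top-level statement closes the
  // currently open function.
  void emitOtherDecl(const std::string &Decl) {
    OtherDecls.push_back(Decl);
    CanExtendLast = false;
  }

  // Flushing the IR output (for example, handing it off for execution)
  // also closes the currently open function.
  void flush() { CanExtendLast = false; }

  const std::vector<IRFunction> &functions() const { return Functions; }

private:
  std::vector<IRFunction> Functions;
  std::vector<std::string> OtherDecls;
  bool CanExtendLast = false;
};
```

In this toy model, emitting `++i;` and then `i--;` back to back yields one function with two statements, while a declaration or a flush between them yields two functions. Each statement can then remain its own AST node, with the grouping decided at IR emission time.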


================
Comment at: clang/lib/Parse/ParseTentative.cpp:52-53
+  assert(getLangOpts().CPlusPlus && "Must be called for C++ only.");
+  if (DisambiguatingWithExpression) {
+    if (Tok.is(tok::identifier)) {
+      RevertingTentativeParsingAction TPA(*this);
----------------
Can we sink this into the `switch` on the token kind below?


================
Comment at: clang/lib/Parse/Parser.cpp:1033
+        !isDeclarationStatement(/*DisambiguatingWithExpression=*/true))
+      SingleDecl = ParseTopLevelStmtDecl();
+
----------------
v.g.vassilev wrote:
> There is a remaining challenge which probably could be addressed outside of this patch.
> 
> Consider this statement block:
> ```
> int i =  12;
> ++i; 
> i--;
> 
> template<typename T> struct A { };
> ```
> 
> Ideally we should model `++i; i--;` as a single `TopLevelStmtDecl`, as the statement block is contiguous. That would require the creation of 2 AST nodes per block (one for the `TopLevelStmtDecl` and one for its conversion to `FunctionDecl`). This would also give us a nice property on the REPL side, where the user could decide to squash multiple statements into a statement block to save on memory.
> 
> To do so, we will need to use `isDeclarationStatement` as a stop rule in `ParseTopLevelDecl`. In turn, this would mean that we should duplicate all of the switch cases described in the `ParseExternalDeclaration` function here. [We need to teach `isDeclarationStatement` everything we know about declarations, e.g. it must tell us to stop when we see the definition of `struct A`.]
> 
> The last version of this patch goes in the opposite direction, trying to minimize the code duplication (bloat?) by wrapping each global statement into a `TopLevelStmtDecl`, reusing the logic in `ParseExternalDeclaration`. However, we pay the price for 2 AST node allocations per global statement. That is a serious hit for people that want to control the parsing granularity of an interpreter.
> 
> I wonder if we can do something better hitting both requirements in some smart way I cannot see...
It seems to me that the big cost here is creating a `FunctionDecl` and all of its ancillary components; a `TopLevelStmtDecl` is pretty cheap. I don't think it should be necessary to create that `FunctionDecl` at all -- we should be able to go straight from `TopLevelStmtDecl` to an IR function, just as we go straight from a `VarDecl` for a global variable to its initializer IR function without creating a `FunctionDecl`.
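
As a small illustration of the `VarDecl` analogy, consider a namespace-scope variable with a dynamic initializer. The names `computeInitialValue`, `Counter`, and `readCounter` are hypothetical and chosen only for this example. The source contains no function for "initialize `Counter`", yet the compiler still produces one during IR generation; with Clang targeting the Itanium ABI it typically shows up as an internal function with a name along the lines of `__cxx_global_var_init`.

```cpp
// Hypothetical helper standing in for any non-constant initializer.
static int computeInitialValue() { return 6 * 7; }

// A namespace-scope variable with a dynamic initializer. The AST has a
// VarDecl for Counter but no FunctionDecl for the code that runs the
// initializer; that function is synthesized when IR is generated.
int Counter = computeInitialValue();

// Reads the variable after dynamic initialization has run.
int readCounter() { return Counter; }
```

The suggestion above is that a `TopLevelStmtDecl` could be lowered the same way, with the wrapping function existing only at the IR level.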


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127284/new/

https://reviews.llvm.org/D127284


