[PATCH] D127284: [clang-repl] Support statements on global scope in incremental mode.

Thu Sep 8 10:14:57 PDT 2022

aaron.ballman added a comment.

In D127284#3776805 <https://reviews.llvm.org/D127284#3776805>, @aaron.ballman wrote:

> In D127284#3776036 <https://reviews.llvm.org/D127284#3776036>, @rsmith wrote:
>
>> In terms of the high-level direction here, I think it would make sense to approach this by adding a full-fledged language extension to Clang to allow statements at the top level (with a `-f` flag to enable it), and then enable that extension in the interpreter. The major change that shift in viewpoint provides is that we should have explicit modeling of these top-level statements in the AST, rather than having them only transiently exist until they get passed off to the AST consumer. I think I'd want to see something like `TopLevelStmtDecl` added to explicitly model the case where we parse a statement at the top level. You can then extend the various parts of Clang that deal with global variable initializers to also handle `TopLevelStmtDecl`s.
>
> Thank you for the suggestion! I actually have a different view on this that I was thinking about last night, but it's somewhat similar to yours in that it involves a flag to opt into behavior.
>
> I don't think we should add a feature flag for this to Clang or support this functionality in Clang's AST -- it does not meet our language extension requirements (this won't be proposed to any standards body, etc). Further, I don't think Clang maintainers should have to play cognitive whack-a-mole as new statements and features are added to C and C++, wondering how they should behave if at the top level. Instead, I think it would make sense to add a flag to clang-repl so the user can decide whether they want their REPL experience to be "whole TU" or "pretend we're wrapped in a function". For users who want their experience to be both at the same time: that's an interesting idea, but it is not a C or C++ REPL experience; I think those users can go with the "whole TU" approach and write an extra line of code + some braces, as needed.
>
> The reason I'm so uncomfortable with putting this into Clang is because neither language is designed to expect this sort of behavior and the resulting code will misbehave in mysterious ways when we guess wrong and it will be very difficult to address them all. I expect some things to be easy to recognize (see the `template` keyword and you know you can't be within a function scope), other things will require a bit of token lookahead (`unsigned long long _Thread_local i = 12;` cannot appear at local scope), and still others will require looking ahead through a potentially arbitrary number of statements (like with a VLA). I think the user should know explicitly which context they're in given that it matters to the source languages (and this is before we start to think about REPL over something like ObjC/OpenCL/HLSL/etc which may have even more interesting situations to worry about).

I had a really great conversation with @v.g.vassilev off-list about my concerns and the clang-repl needs/desires (thank you for taking the time to have that chat with me!), and this is a summary of what we think makes sense as a way forward:

Technical:

- clang gets a -cc1 (only) flag (or some other kind of not-user-facing option) to enable a special mode where you can mix statements and declarations at TU scope.
- clang-repl uses that new flag.
- clang gets a new AST node, `TopLevelStmtDecl`, which is a Decl node wrapping a Stmt node; it's documented as existing only to support REPL. Add an assertion somewhere sensible that we never see one of these AST nodes outside of a clang-repl context.
- clang-repl uses that new AST node as needed.
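
To make the shape of that node a bit more concrete, here is a rough, self-contained sketch of the idea. Everything in it is a toy stand-in and an assumption on my part: `Stmt`, `Decl`, `VarDecl`, `LangOpts`, `incrementalExtensions`, `TranslationUnit`, and `buildExample` are hypothetical names for illustration, not Clang's real classes or options. It only shows a Decl that wraps a Stmt, plus an assertion that fires if the node is created outside the REPL mode.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Clang's real classes.
struct Stmt {
  std::string text;  // source text of the statement
};

struct LangOpts {
  // Hypothetical option that only the REPL would turn on.
  bool incrementalExtensions = false;
};

struct Decl {
  virtual ~Decl() = default;
  virtual std::string kindName() const = 0;
};

struct VarDecl : Decl {
  std::string name;
  explicit VarDecl(std::string n) : name(std::move(n)) {}
  std::string kindName() const override { return "VarDecl"; }
};

// A Decl whose only job is to carry a Stmt that was parsed at TU scope.
struct TopLevelStmtDecl : Decl {
  std::unique_ptr<Stmt> body;

  TopLevelStmtDecl(const LangOpts &opts, std::unique_ptr<Stmt> s)
      : body(std::move(s)) {
    // Guard: this node should never be built outside the REPL mode.
    assert(opts.incrementalExtensions &&
           "TopLevelStmtDecl created outside of REPL mode");
  }

  std::string kindName() const override { return "TopLevelStmtDecl"; }
};

// The TU keeps declarations and wrapped statements in one ordered list.
struct TranslationUnit {
  std::vector<std::unique_ptr<Decl>> decls;
};

// Models the REPL input `int i;` followed by `i = 12;` at TU scope.
TranslationUnit buildExample(const LangOpts &opts) {
  TranslationUnit tu;
  tu.decls.push_back(std::make_unique<VarDecl>("i"));
  tu.decls.push_back(std::make_unique<TopLevelStmtDecl>(
      opts, std::make_unique<Stmt>(Stmt{"i = 12;"})));
  return tu;
}
```

Because the statement lives in the same ordered list as ordinary declarations, anything that walks the TU sees it in source order, which is the property the explicit-modeling suggestion was after. Where the real assertion should live in Clang is still an open question.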

Administrative:

- clang-repl continues to push on standardization efforts in WG21 (and potentially WG14).
  - If those efforts lead to an explicit rejection of the idea, we should discuss how to proceed at that time, but I would envision that we'd remove the new AST node and the flag to enable this functionality, and introduce interfaces allowing us to synthesize code during codegen to try to break clang-repl users as little as possible. This way we aren't eating into standards body design space that will potentially give us problems in the future if the committees elect to do something *incompatible* in this space. (Basically: in a fight between clang-repl users and a new standard feature, the new standard feature "wins".) This will break clang-repl users, so we should consider documenting the potential for this up front (basically, call it an experimental feature of clang-repl rather than promising backwards compatibility for it).
- Any failures related to TopLevelStmtDecl will require collaboration between clang-repl and clang maintainers, but should mostly be driven by clang-repl maintainers.

https://reviews.llvm.org/D127284