[PATCH] D122179: Serialize PragmaAssumeNonNullLoc to support preambles
David Goldman via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Mar 24 11:01:00 PDT 2022
dgoldman added inline comments.
================
Comment at: clang/lib/Lex/PPLexerChange.cpp:436
+ if (PragmaAssumeNonNullLoc.isValid() && !this->PPOpts->GeneratePreamble &&
+ !(CurLexer && CurLexer->getFileID() == PredefinesFileID) &&
!isEndOfMacro && !(CurLexer && CurLexer->Is_PragmaLexer)) {
----------------
sammccall wrote:
> dgoldman wrote:
> > sammccall wrote:
> > > sammccall wrote:
> > > > dgoldman wrote:
> > > > > sammccall wrote:
> > > > > > dgoldman wrote:
> > > > > > > sammccall wrote:
> > > > > > > > what's the PredefinesFileID special case about?
> > > > > > > See ExitedFromPredefinesFile below, this is how I check if we're falling off the end of the preamble region into the main file (same as done for the conditional stack), LMK if there's a better way.
> > > > > > I don't understand how this checks if we're falling off the preamble region, and the code around ExitedFromPredefinesFile doesn't clarify this for me. Can you explain?
> > > > > >
> > > > > > The `if (ExitedFromPredefinesFile)` appears to be handling the logical *insertion* of preamble PP events when *consuming* a preamble, which is not what you're trying to do here.
> > > > > >
> > > > > > The condition here is of the form "if we have an open pragma, and we're not generating a preamble, and some other stuff, then diagnose". So if the baseline case is "diagnose" and the preamble case is an exception, the "other stuff" clauses don't limit the preamble exception, they add extra exceptions!
> > > > > `!(CurLexer && CurLexer->getFileID() == PredefinesFileID)` makes sure that if we're consuming the preamble, we don't emit the warning. AFAICT this is the way to tell that the preamble is terminated, since the current file ID being processed is the predefines file and the file is now terminated as of this method call (since !isEndOfMacro && !(CurLexer && CurLexer->Is_PragmaLexer)). `!this->PPOpts->GeneratePreamble` makes sure that if we're generating the preamble, we don't emit the warning. We need to special case both cases otherwise we get an error when generating the preamble or when we load the preamble before even processing the rest of the main file.
> > > > >
> > > > > Does that make sense?
> > > > > !(CurLexer && CurLexer->getFileID() == PredefinesFileID) makes sure that if we're consuming the preamble, we don't emit the warning
> > > >
> > > > 1) if we're consuming the preamble, what exactly is the sequence of events that would otherwise lead us to emit the warning?
> > > > 2) what if we're in the predefines file for some other reason instead?
> > > >
> > > > I'm hoping you'll explain to me what you think the predefines file is, and what its relationship to the preamble is, and so why this condition is correct :-)
> > > >
> > > > My understanding (which is pretty shaky!) is that the predefines file is a kind of catchall used to inject text into the preprocessor that doesn't appear in the source file - that it contains definitions of builtin integer types, macros defined by `-D` on the command line, and so on. If it has any relationship to the preamble, it's something subtle that's to do with the relative order in which the preprocessor sees entities like the predefines, the preprocesor, and the main file.
> > > > So according to my understanding, interpreting "reaching EOF in the predefines file" as "consuming a preamble" is either wrong or something very subtle requiring a significant comment.
> > > >
> > > > The code you refer to below is doing something very different: it's saying that if we have a preamble, then reaching the end of the predefines file is the *trigger* to inject state from it!
> > > >
> > > > > !this->PPOpts->GeneratePreamble makes sure that if we're generating the preamble, we don't emit the warning
> > > >
> > > > Sure, but that doesn't sound correct at all! A preamble mostly consists of parsing a lot of headers, and if any of those headers have an unpaired pragma, we should be warning on that at EOF. It's only if we hit the "pretend" EOF from the truncated main file that we want to suppress the warning.
> > > > the preprocessor sees entities like the predefines, the preprocesor, and the main file.
> > > Oops... sees the predefines, any injected events from the preamble, and the main file.
> > >
> > I think that definition is correct, but given that there is only one predefines file, when it ends, it must go into the main file, no? And if we have a pragma assume nonnull loc, it shouldn't be from the clang builtins, it should be from the preamble (I'd certainly hope!)
> >
> > re: headers, I think there might be a misunderstanding, as soon as an #include is seen the pragma is ended here: https://github.com/llvm/llvm-project/blob/7631c366c8589dda488cb7ff1df26cc134002208/clang/lib/Lex/PPDirectives.cpp#L2006, so if PragmaAssumeNonNullLoc is valid here we haven't had any includes since the pragma was seen.
> > I think that definition is correct, but given that there is only one predefines file, when it ends, it must go into the main file, no?
>
> I'm sorry, but I really can't follow what you're asking, or how it relates to the original problem, or what the problem even is. If *you're* not sure, it's probably worth tracing execution of the preprocessor and describing the precise sequence of events that cause a problem. And then trying to come up with a *direct* way of solving them.
>
> Here's my *guess* as to what's going on (but please check!):
> - the assume_nonnull state is very sensitive to shifts between files - it's supposed to be off before reaching EOF and before #include. (This is why it can appear to be file-local syntax but be implemented as a global flag).
> - therefore the AssumeNonnullLoc is not just some property of the preprocessor, it's a transient state that should be set at precisely the right point in the program. However you're setting it at the very beginning, as a side effect of ReadASTBlock().
> - when consuming a preamble, the preprocessor simulates a structure like:
> ```
> // AssumeNonnullLoc is being set here
> #include <predefines> // implicit
> #inject pp conditional stack
> // AssumeNonnullLoc *should* be set here
> // now continue with the post-preamble part of main file
> ```
> - because AssumeNonnullLoc is set too early, the PP events in between these locations mess with it. Fortunately that is probably only the predefines, although that's a brittle assumption to make. And we don't run into problems when entering the predefines file, because `HandleHeaderIncludeOrImport` isn't called, we just enter the file directly. However you do run into issues when exiting that file again, and so a special case works around those.
>
> If this is right, it's not a good solution. There are other observable effects of setting AssumeNonnullLoc in the wrong place. All the predefines will be parsed in "assume nonnull mode". And if any files are #included from the predefines file, then the enter/exit events will cause problems. Predefines may contain `#include` with the `-include` and `-imacros` clang flags, and due to clangd's preamble patching.
>
> In this case the right fix is to set AssumeNonnullLoc at the correct point in the file: triggered by exiting the predefines, along with the PP conditional stack. The way to do this is probably to have a second property on the preprocessor which stores the loc that will be restored when the trigger occurs. (Similar to how PP distinguishes between the *current* conditional stack and the replayable one set by ASTReader when it calls `PP.setReplayablePreambleConditionalStack(ConditionalStack, SkipInfo);`.)
>
> > re: headers, I think there might be a misunderstanding, as soon as an #include is seen the pragma is ended here
>
> The scenario is:
>
> We are building a preamble from main.c
> main.c:
> ```
> #include "foo.h"
> int x;
> ```
> foo.h:
> ```
> #pragma clang assume_nonnull begin
> void foo(int*);
> // missing end pragma
> ```
> We should issue a diagnostic when we hit the end of foo.h, but with the code you have here it will be suppressed instead. (There's no #include in scope of the pragma here, so I don't think that's relevant)
🤦 Thanks, that example is super helpful, not sure why I wasn't able to understand it earlier. Yeah, this doesn't properly handle EOF for headers during preamble generation, and we should set the assume_nonnull loc only when transitioning, not earlier. Like you said, we basically need to mirror the replay/record logic for the preamble conditional stack for it to work properly. I went ahead and implemented the approach you mentioned, restoring the value when exiting from the predefines, and added a test to clangd for the edge case you gave. PTAL. I kept the storage separate, but I wonder if it would be better to merge it in with the existing replay logic.
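
For reference, the record/replay idea can be modelled outside clang roughly as follows. This is only a minimal sketch under stated assumptions, not the code in this revision: `Loc`, `MiniPP`, `RecordedAssumeNonNullLoc`, `ReplayableAssumeNonNullLoc`, `exitFile`, and both scenario helpers are hypothetical names made up for illustration, and the real preprocessor tracks far more state. It shows the two behaviours discussed above: an unterminated pragma in a header is still diagnosed while generating a preamble, and a saved loc only becomes live once the predefines file has been exited.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a source location; 0 means "invalid".
struct Loc {
  int Raw;
  explicit Loc(int R = 0) : Raw(R) {}
  bool isValid() const { return Raw != 0; }
};

// Hypothetical, heavily simplified model of the preprocessor state.
struct MiniPP {
  Loc PragmaAssumeNonNullLoc;     // live state: an open assume_nonnull region
  Loc ReplayableAssumeNonNullLoc; // loaded from a preamble, not yet live
  Loc RecordedAssumeNonNullLoc;   // what a generated preamble would serialize
  bool GeneratingPreamble = false;
  std::vector<std::string> Diags;

  void pragmaBegin(Loc L) { PragmaAssumeNonNullLoc = L; }

  // Models reaching the end of a file. IsPreambleCutoff marks the "pretend"
  // EOF of the truncated main file during preamble generation.
  void exitFile(const std::string &Name, bool IsPredefines,
                bool IsPreambleCutoff) {
    if (PragmaAssumeNonNullLoc.isValid()) {
      if (GeneratingPreamble && IsPreambleCutoff)
        RecordedAssumeNonNullLoc = PragmaAssumeNonNullLoc; // record, no diag
      else
        Diags.push_back("unterminated assume_nonnull in " + Name);
      PragmaAssumeNonNullLoc = Loc();
    }
    // Replay only when leaving the predefines, alongside where the
    // conditional stack would be replayed.
    if (IsPredefines && ReplayableAssumeNonNullLoc.isValid()) {
      PragmaAssumeNonNullLoc = ReplayableAssumeNonNullLoc;
      ReplayableAssumeNonNullLoc = Loc();
    }
  }
};

// Scenario from the review: foo.h opens the pragma and never closes it.
std::vector<std::string> buildPreambleWithBadHeader() {
  MiniPP PP;
  PP.GeneratingPreamble = true;
  PP.exitFile("<predefines>", /*IsPredefines=*/true, /*IsPreambleCutoff=*/false);
  PP.pragmaBegin(Loc(1)); // inside foo.h
  PP.exitFile("foo.h", false, false);
  PP.exitFile("main.c", false, /*IsPreambleCutoff=*/true);
  return PP.Diags;
}

// Consuming a preamble: the saved loc must stay off while the predefines
// are lexed and become live only after they are exited.
bool replayAfterPredefines(Loc Saved) {
  MiniPP PP;
  PP.ReplayableAssumeNonNullLoc = Saved; // set by the (modelled) AST reader
  assert(!PP.PragmaAssumeNonNullLoc.isValid() && "off during predefines");
  PP.exitFile("<predefines>", /*IsPredefines=*/true, false);
  return PP.PragmaAssumeNonNullLoc.Raw == Saved.Raw && PP.Diags.empty();
}
```

In this model, `buildPreambleWithBadHeader()` reports the unterminated pragma at the end of foo.h, while `replayAfterPredefines` checks that no diagnostic is emitted for the predefines and that the loc is restored afterwards. Keeping the recorded and replayable locs as separate fields mirrors the "storage kept separate" choice mentioned above; merging them into the conditional-stack replay state would be an alternative.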
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D122179/new/
https://reviews.llvm.org/D122179