[PATCH] D136539: [Lex] Bring back the magic number 50 in updateConsecutiveMacroArgTokens.

Haojian Wu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Oct 25 12:54:48 PDT 2022


hokein added a comment.

In D136539#3882922 <https://reviews.llvm.org/D136539#3882922>, @sammccall wrote:

> In D136539#3882316 <https://reviews.llvm.org/D136539#3882316>, @hokein wrote:
>
>>> This is a subtle change that needs careful review, and honestly should have a test.
>>
>> I agree, and thanks for the nice review comments!
>>
>> I'd like to add a unittest, but I don't see a straight way (I manually test it by comparing the number of allocated SLocEntries and the used SourceLocation address space in `clang --print-stats` with/out this patch), some options:
>>
>> 1. construct a large TU that will make clang fails to compile without the 50 trick patch;
>> 2. create a small test, and verify the the source location is split when the two consecutive tokens distance > 50, and verify the number of  in `clang print-stats`;
>>
>> 1 feels better to reflect the real behavior, the only concern is that this is a stress test, running it is expensive (we also have a similar `SourceLocationsOverlow.c`, I guess it is ok to add it).
>
> I was all ready to say that option 2 is the only practical one, and was surprised that you had an actual testcase.
> I'm a bit nervous (it must be slow and eat tons of ram) but if we already have a similar test...

now I change my mind -- this test is very expensive, taking ~8s to run (which is 4x slower than the `SourceLocationsOverlow.c`), adding it is not a good idea. Switched to 2.



================
Comment at: clang/lib/Lex/TokenLexer.cpp:1010
+        unsigned distance =
+            T.getLocation().getRawEncoding() - LastLoc.getRawEncoding();
+        LastLoc = T.getLocation();
----------------
sammccall wrote:
> hokein wrote:
> > sammccall wrote:
> > > sammccall wrote:
> > > > This seems to be missing the same-sloc-address-space check: it can create a single file ID spanning a local and loaded sloc entry, which will be corrupted by saving+loading
> > > > 
> > > (getOffset() would be much clearly correct than getRawEncoding())
> > > missing the same-sloc-address-space check
> > 
> > oops, good spot. Indeed this is another behavior change in the previous-rewriting patch :( -- the original behavior was that if two consecutive tokens are not in the same address space, it split to two expansion SLocEntries.
> > 
> > Fixed.
> > oops, good spot. Indeed this is another behavior change in the previous-rewriting patch :( 
> 
> Argh, I'm so sorry, this prompted me to take another look and there is no bug/need for the check at all.
> 
> Before we do the NearLast check we've already established that the FileID is the same, which means the locations are necessarily in the same address space.
> 
> We can replace the isInSameSLocAddrSpace with a raw offset comparison again...
hmm, I think there was some misunderstanding here -- I thought you meant a corner case where Last and T are in the same file, but Last.offset < `CurrentLoadedOffset`, and T.offset > `CurrentLoadedOffset`. For this case, there is a behavior change between both versions. 

I'm not sure whether this case is possible or it is reasonable to handle it, since it is a running-out-of-sourcelocation-space case (clang's behavior is somewhat inconsistent, in CreateFileID, we emit a too-large diagnostic, in `createExpansionLocImpl`, it is just an assertion), maybe we should not worry about this case, treat it as UB.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136539/new/

https://reviews.llvm.org/D136539



More information about the cfe-commits mailing list