[PATCH] D134685: Fix SourceManager::isBeforeInTranslationUnit bug with token-pasting

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Sep 26 16:59:54 PDT 2022


sammccall created this revision.
sammccall added reviewers: ilya-biryukov, hokein.
Herald added subscribers: kadircet, arphaman.
Herald added a project: All.
sammccall requested review of this revision.
Herald added projects: clang, clang-tools-extra.
Herald added a subscriber: cfe-commits.

isBeforeInTranslationUnit compares SourceLocations across FileIDs by
mapping them onto a common ancestor file, following include/expansion edges.

It is possible to get a tie in the common ancestor, because multiple
"chunks" of a macro arg will expand to the same macro param token in the body:

  #define ID(X) X
  #define TWO 2
  ID(1 TWO)

Here two FileIDs both expand into `X` in ID's expansion:

- one containing `1` and spelled on line 3
- one containing `2` and spelled by the macro expansion of TWO

isBeforeInTranslationUnit breaks this tie by comparing the two FileIDs:
the one "on the left" is always created first and is numerically smaller.
This seems correct so far.

Prior to this patch it also takes a shortcut (unclear if intentionally).
Instead of comparing the two FileIDs that directly expand to the same location,
it compares the original FileIDs being compared. These may not be the
same if there are multiple macro expansions in between.
This *almost* always yields the right answer, because macro expansion
yields "trees" of FileIDs allocated in a contiguous range: when comparing tree A
to tree B, it doesn't matter what representative you pick.

However, the splitting of >> tokens is modeled as macro expansion (as if
the first '>' was a macro that expands to a '>' spelled a scratch buffer).
This splitting occurs retroactively when parsing, so the FileID allocated is
larger than expected if it were a real macro expansion performed during lexing.
As a result, macro tree A can be on the left of tree B, and yet contain
a token-split FileID whose numeric value is *greator* than those in B.
In this case the tiebreak gives the wrong answer.

Concretely:

  #define ID(X) X
  template <typename> class S{};
  ID(
    ID(S<S<int>> x);
    int y;
  )
  
  Given Greater = (typeloc of S<int>).getEndLoc();
        Y       = (decl of y).getLocation();
  isBeforeInTranslationUnit(Greater, Y) should return true, but returns false.

Here the common FileID of (Greater, Y) is the body of the outer ID
expansion, and they both expand to X within it.
With the current tiebreak rules, we compare the FileID of Greater (a split)
to the FileID of Y (a macro arg expansion into X of the outer ID).
The former is larger because the token split occurred relatively late.

This patch fixes the issue by removing the shortcut. It tracks the immediate
FileIDs used to reach the common file, and uses these IDs to break ties.
In the example, we now compare the macro arg expansion of the inner ID()
to the macro arg expansion of Y, and find that it is smaller.

This requires some changes to the InBeforeInTUCacheEntry (sic).
We store a little more data so it's probably slightly slower.
It was difficult to resist more invasive changes:

- performance: the sizing is very suspicious, and once the cache "fills up" we're thrashing a single entry
- API: the class seems to be needlessly complicated

However I tried to avoid mixing these with subtle behavior changes, and
will send a followup instead.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D134685

Files:
  clang-tools-extra/clangd/unittests/SelectionTests.cpp
  clang/include/clang/Basic/SourceManager.h
  clang/lib/Basic/SourceManager.cpp
  clang/unittests/Basic/SourceManagerTest.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D134685.463053.patch
Type: text/x-patch
Size: 12004 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20220926/37cbab32/attachment-0001.bin>


More information about the cfe-commits mailing list