[clang] [clang][modules] Remove preloaded SLocEntries from PCM files (PR #66962)

Jan Svoboda via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 28 12:19:24 PDT 2023


jansvoboda11 wrote:

> > I changed SourceManager::isInTheSameTranslationUnit() to check for that case explicitly instead of relying on the fact/bug that built-ins buffers happened to be assigned an untranslated include SourceLocation.
> 
> What is the translated location you get here? It's hard for me to tell if the comment in `isBeforeInTranslationUnit` is describing the desired behaviour or just a consequence of how it happened to work at the time.

When a module is built, its `SourceLocation` offsets go from 0 to N. When it's imported, they are remapped into the range from 2<sup>32</sup>-N to 2<sup>32</sup>.

So the fact that we didn't remap the offset of the built-ins buffer meant it was not considered part of the imported module. This made `isInTheSameTranslationUnit(<built-ins>, module.h)` return `[false, false]` and the special case in `isBeforeInTranslationUnit()` would make sure to return `true` in order to enforce that the `<built-ins>` buffer always compares `<` to other offsets in the module.

It took me a while to figure out why that is desired. My understanding is that whether one offset is before another can't be just simple comparison of the numeric values. (More details on this are in `isInTheSameTranslationUnit()`.)

Here's a little example of how this ends up working with modules:

| entry               | ID [t] | offset    | `isBefore(?, t(250))` | `isBefore(t(250), ?)` |
|---------------------|--------|-----------|-----------------------|-----------------------|
| dummy               | 0 [-6] | <0,1)     | yes                   | no                    |
| modulemap           | 1 [-5] | <1,10)    | yes                   | no                    |
| `<module-includes>` | 2 [-4] | <10,100)  | yes                   | no                    |
| `<built-in>`        | 3 [-3] | <100,200) | no                    | yes                   |
| header              | 4 [-2] | <200,300) | yes -> no             | no -> yes             |

The `<module-includes>` buffer is parent of the header, but the `<built-in>` buffer ends up being created in between them. This causes the `isBefore` relationships to not be sorted/partitioned, as one might expect.

This is the reason the binary search in `ASTReader::findPreprocessedEntitiesInRange()` broke.

So the special case in `isBeforeInTranslationUnit()` that moves `<built-in>` before all other entries makes sense, but now needs to happen earlier, before calling `isInTheSameTranslationUnit()`. That's what I've done in the latest commit, and it actually fixes "Index/annotate-module.m". (I guess we should also consider merging both of those functions, because `isInTheSameTranslationUnit()` doesn't really stand on its own now.) Unfortunately, "SemaCXX/warn-unsafe-buffer-usage-fixits-parm-span.cpp" started failing, so I'll look into that next.

@benlangmuir, let me know if this explanation makes sense.

Also adding @sam-mccall as a reviewer, since he's worked in this area recently.

https://github.com/llvm/llvm-project/pull/66962


More information about the cfe-commits mailing list