[PATCH] D135220: [clang] Update ModuleMap::getModuleMapFile* to use FileEntryRef
Hans Wennborg via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Oct 20 06:10:04 PDT 2022
hans added a comment.
In D135220#3862893 <https://reviews.llvm.org/D135220#3862893>, @dexonsmith wrote:
> In D135220#3862058 <https://reviews.llvm.org/D135220#3862058>, @hans wrote:
>
>> The build system folks replied saying they're not using symlinks, but hard links for compiler inputs. Dropping the `-s` flag in the repro above demonstrates this.
>
> I think this only happened to work before. If `/tmp/a.cc` changes to these file contents:
>
> int foo_table[] = {
> #include "/tmp/foo.h"
> };
> int bar_table[] = {
> #include "/tmp/bar.h"
> };
> int foo2_table[] = {
> #include "/tmp/foo.h"
> };
>
> then I'm getting the following with my system clang:
>
> # 1 "/tmp/a.cc"
> # 1 "<built-in>" 1
> # 1 "<built-in>" 3
> # 414 "<built-in>" 3
> # 1 "<command line>" 1
> # 1 "<built-in>" 2
> # 1 "/tmp/a.cc" 2
> int foo_table[] = {
> # 1 "/tmp/foo.h" 1
> 1, 2, 3
> # 3 "/tmp/a.cc" 2
> };
> int bar_table[] = {
> # 1 "/tmp/bar.h" 1
> 1, 2, 3
> # 6 "/tmp/a.cc" 2
> };
> int foo2_table[] = {
> # 1 "/tmp/bar.h" 1
> 1, 2, 3
> # 9 "/tmp/a.cc" 2
> };
>
> Note that `foo2_table` now points at `bar.h` but should point at `foo.h`. I presume after the present patch it'll point at `foo.h` (although I admit I didn't test).
Interesting, thanks!
> The root cause is the same, and matches @benlangmuir's analysis in response to the testcase failure on Windows:
>
> - The FileManager is merging files based on their observed inode (regardless of symlink or hardlink).
> - Previously, filenames were reported based on the most recent access path: "last-ref".
> - Now, they're reported based on the original access path: "first-ref".
> - In neither case is it sound.
Agreed.
> Here are a few solutions I can see:
>
> 1. Change FileManager to only merge files with matching "realpath", seeing through symlinks but not hardlinks (could use observed inode collisions as a hint to do an `lstat` to avoid running an `lstat` every time). Then SourceManager will give a different FileID to these files.
> 2. Change SourceManager to have different FileID to these files, even though they were merged by FileManager.
> 3. Change the `#include` stack to track the accessed filename (the `FileEntryRef`) in addition to the SLoc, even though they have the same FileID.
> 4. Require clients to keep contents distinct if they it matters (the (surprising) status quo).
>
> What's in tree (and has been for a long time) is (4). (1) seems like the right solution to me.
>
> As a heuristic on top of (4), "last-ref" (before this patch) probably gets the answer the user expects more often than "first-ref" (this patch) does, which I assume is why it was originally implemented. However, it's unpredictable and requires mutating a bunch of state where mutations are hard to reason about.
>
> IMO, "first-ref" is easier to make sound, since it's an immutable property, so this patch is an incremental improvement; it makes the fuzzy modeling easier to observe but also perhaps more predictable. If we care about fixing hardlink include names ("correct-ref") we should fix them in all cases (ideally with (1)), but I wonder how urgent it is.
>
> @hans, WDYT?
I feel I don't have enough background here to say whether (1) would work. Why is FileManager merging files in the first place?
This is not urgent (at least not for us), but it's a pretty surprising behavior.
One thought, which I'm not sure is relevant, is that this is only observable for headers which are included more than once, which is rare because normally there are include guards (or pragma once). `isFileMultipleIncludeGuarded` isn't tracked by the FileManager, but otherwise maybe that could have been a useful heuristic: don't merge header files without include guards?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D135220/new/
https://reviews.llvm.org/D135220
More information about the cfe-commits
mailing list