[PATCH] D154130: [lit][clang] Avoid realpath on Windows due to MAX_PATH limitations

Tom Honermann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Aug 15 11:18:32 PDT 2023


tahonermann added inline comments.


================
Comment at: clang/test/Lexer/case-insensitive-include-win.c:5-9
+// Note: We must use the real path here, because the logic to detect case
+// mismatch relies on resolving the real path and checking that casing differs.
+// If we use %t and we are on a substitute drive S: mapping to C:\subst,
+// then we will compare "S:\test.dir\FOO.h" to "C:\subst\test.dir\foo.h"
+// and avoid emitting the diagnostic because the structure is different.
----------------
MrTrillian wrote:
> tahonermann wrote:
> > MrTrillian wrote:
> > > tahonermann wrote:
> > > > MrTrillian wrote:
> > > > > tahonermann wrote:
> > > > > > Hmm, is it really desirable or necessary that the case mismatch logic resolve substituted drives? I wouldn't think so. This seems like another case where our common canonicalization would be desired.
> > > > > I think it is necessary as I don't see how else we can retrieve the original casing of the file path. Merely making the path absolute would not do that. Searching for solutions on the internet finds ideas that are worse like converting to a short path then back to a long path or rebuilding the path one component at a time with a series directory listing requests.
> > > > The warning this test checks for is diagnosed in `Preprocessor::HandleHeaderIncludeOrImport()`; search for `pp_nonportable_path` and/or `pp_nonportable_system_path`. I believe this warning will be issued if any component of the path does not match the case of the include directive. Since the file name differs in case, unless there is a bug in handling of the file name, this test isn't sensitive to case mismatches in `%t`.
> > > > 
> > > > From a user point of view, resolving a substitute drive doesn't seem desirableto me with respect to determining whether a non-portable path is being used; I don't think the behavior should be dependent on whether a substitute drive is being used or what its target is.
> > > @tahonermann I think the code is working by design and it would be an unrelated change to modify its logic. See `trySimplifyPath` in `PPDirectives.cpp`:
> > > 
> > > ```
> > >   // Below is a best-effort to handle ".." in paths. It is admittedly
> > >   // not 100% correct in the presence of symlinks.
> > > 
> > >         // If these path components differ by more than just case, then we
> > >         // may be looking at symlinked paths. Bail on this diagnostic to avoid
> > >         // noisy false positives.
> > > ```
> > > 
> > > The test was previously implicitly requiring getting the realpath, and it worked because `lit.py`'s logic of doing that. Now that requirement is explicit in the test.
> > I'm still not following here. Are you saying that `trySimplifyPath()` will replace substitute drives? If so, that doesn't sound desirable and I would expect it to be problematic for your use case. I think we should follow this up ... somewhere (now that this review is closed).
> @tahonermann . `trySimplifyPath()` does not replace substitute drives. It's a best-effort attempt to see if the included path mismatches the real file path only by case. It explicitly bails out without diagnostics if it finds that the included path has a different shape from the real file path, which will happen if the included path is on a substitute drive. It has to compare with the real file path because this is the only reasonable way to get the original path casing.
> 
> It was already the case that this diagnostic bailed out in the presence of symbolic links, so there are no behavioral differences. I needed to update the test because previously `lit.py` would enforce that `%t` was a real path, and now it doesn't, which means that we would hit the "bail out" code path and not produce the diagnostic.
I think `trySimplifyPath()` is not particularly relevant as it just performs a simple canonicalization (removal of `.` and `..` path components without regard for symlink traversal) and case insensitive comparison with the remaining components with a "real" path that is presumed to already be devoid of such components. It is therefore sensitive to structure, but only for the (non-`.` and non-`..`) components present in the (non-real) `Components` vector; the "real" path may have more leading components (the `Components` vector may represent a relative path). The presence of a substitute drive in the "real" path won't contribute to a structural difference unless the `Components` vector is absolute but with a drive other than the substitute drive or if it is relative but starting at a higher directory level than the substitute drive; neither of which should be the case for this test when `%t` is consistently used.

The only relevant user path provided to Clang is the argument to the `/FI` argument; previously `\\?\%t.dir\FOO.h`. In that case, `%t` should reflect use of a substitute drive. The question I have then is why/where Clang is replacing the substitute drive. I'm guessing (I haven't actually debugged) is that this is happening in `FileManager::fillRealPathName()`; this function assigns the path returned by `FileEntry::tryGetRealPathName()`. I'm curious if that only happens if `/FI` specifies a UNC path; does the test pass if the UNC prefix is dropped and the path with the substitute drive is used? Could you try that, please? `/FI %t.dir\FOO.h`. If that still passes, then I think there is a question of whether it is desirable for behavior to differ for `\\?\` prefixed paths.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154130/new/

https://reviews.llvm.org/D154130



More information about the cfe-commits mailing list