[PATCH] D76869: [Clang] Restore replace_path_prefix instead of startswith

Adrian McCarthy via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 30 17:11:18 PDT 2020


amccarth added a comment.

Thanks for including me in this discussion.

I think this can work and not interfere with hybrid VFS paths, but please make sure to test on the all VFS tests both in both LLVM and Clang.

The case blindness is incomplete as noted in the inline comments, but maybe it's "good enough."  I don't know how strictly other parts of LLVM handle this.  Doing better will be hard because you'd have to compare code point by code point rather than UTF-8 code unit by code unit.  Also note that Windows allows for case-sensitive folders as an experimental feature.  (NTFS actually is case sensitive, but the OS hides that detail.)

As noted inline, please add a path style parameter to `replace_path_prefix` and pass it along to `starts_with`.



================
Comment at: llvm/lib/Support/Path.cpp:510
+        return false;
+      if (!SepPath && toLower(Path[I]) != toLower(Prefix[I]))
+        return false;
----------------
MaskRay wrote:
> MaskRay wrote:
> > I am sure toLower works when Unicode is considered.
> > 
> > @amccarth has made many useful comments in this area before, so I added him as a blocking reviewer.
> s/am sure/am not sure/
If I recall correctly, the case insensitivity in Windows paths is not locale-aware.  In fact, NTFS has a hard-coded case table.  So that's not an issue.  Also not an issue is Unicode normalization.  I'm pretty sure you can name one file "Ä" and another "Ä" and they will remain distinct because one is a pre-composited code point and the other is a regular "A" with a combining mark.  So that's another thing you don't have to worry about.

But it looks like `toLower` only handles basic ASCII, so, yeah, Unicode path names won't benefit from this attempt at case-blind comparison.  I'm not sure what facilities LLVM has for case folding for the rest of the BMP.  Maybe this is good enough for now?  I don't have any data on how common it is for people to expect proper case-blind comparisons of non-ASCII characters in Windows paths.


================
Comment at: llvm/lib/Support/Path.cpp:524
   StringRef OrigPath(Path.begin(), Path.size());
-  if (!OrigPath.startswith(OldPrefix))
+  if (!starts_with(OrigPath, OldPrefix))
     return false;
----------------
I wonder if `replace_path_prefix` should take a path style (defaulted to native) and pass it along to starts_with.  That would give future users of `replace_path_prefix` the power to control whether they get the Windows treatment.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76869/new/

https://reviews.llvm.org/D76869





More information about the llvm-commits mailing list