[cfe-dev] case-insensitive #include warning
Michael Spencer via cfe-dev
cfe-dev at lists.llvm.org
Thu Apr 7 16:37:31 PDT 2016
On Thu, Apr 7, 2016 at 4:07 PM, John Sully via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
> A safer way is to use an OS API that gives you back the exact name of the
> file. On Windows, GetFinalPathNameByHandle should work; I don't know the Mac
> well enough to suggest an API there. At that point you could do a
> case-sensitive string compare on the name and warn if there are differences.
> This also avoids locale-specific problems, since tolower means different
> things in different locales. So your warning could work fine in en-US but
> not in some other locale. Anything relying on tolower/toupper is
> guaranteed to have internationalization bugs.
Internationalization isn't actually a problem here. The issue is that
some file systems/OSs change the paths you give them and some don't.
If the path you get from directory iteration is memcmp different from
the path used to #include the file, then it isn't going to work on a
"normal" Linux setup. An additional wrinkle is added because this
implementation is also resolving symlinks, and so has to handle that
- Michael Spencer
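
To make the byte-for-byte comparison above concrete, here is a minimal C++17
sketch. It is not the code from Eric's patch. It assumes the check is done one
directory at a time with std::filesystem, and the function names
spelledExactlyOnDisk and isNonPortableSpelling are hypothetical. It does no
case folding at all, and it does not attempt the symlink handling mentioned
above.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Returns true if `dir` lists an entry whose name is byte-for-byte equal to
// `spelled`. No tolower/toupper is involved, so locale does not matter.
bool spelledExactlyOnDisk(const fs::path &dir, const std::string &spelled) {
  std::error_code ec;
  for (fs::directory_iterator it(dir, ec), end; !ec && it != end;
       it.increment(ec)) {
    if (it->path().filename().string() == spelled)
      return true;
  }
  return false;
}

// Returns true if the file system resolves `spelled` inside `dir`, but the
// directory listing reports the entry under different bytes. On a
// case-insensitive file system this is the case that would fail to build on
// a case-sensitive one.
bool isNonPortableSpelling(const fs::path &dir, const std::string &spelled) {
  std::error_code ec;
  return fs::exists(dir / spelled, ec) && !spelledExactlyOnDisk(dir, spelled);
}
```

On a case-sensitive file system isNonPortableSpelling never fires, because a
misspelled name does not resolve in the first place. The cost is one directory
scan per lookup, which is exactly the kind of extra disk traffic a real
implementation would need to cache.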
> On Thu, Apr 7, 2016 at 2:54 PM, Eric Niebler <eniebler at fb.com> wrote:
>> I can say that this almost certainly does not work for non-ASCII since it
>> just uses StringRef::equals_lower. Are there any proper locale-sensitive
>> string comparison routines in llvm that I can use?
>> On 4/7/16, 11:49 AM, "John Sully" <john at csquare.ca> wrote:
>> Out of curiosity, have you tried this with some of the more interesting
>> upper/lower case pairs like the Turkish 'İ'?
>> It sounds like the way you're achieving this should allow this to work,
>> but it's worthwhile to try it.
>> On Thu, Apr 7, 2016 at 11:37 AM, Chris Lattner via cfe-dev
>> <cfe-dev at lists.llvm.org> wrote:
>>> On Apr 5, 2016, at 4:03 PM, Eric Niebler via cfe-dev
>>> <cfe-dev at lists.llvm.org> wrote:
>>> Hi all,
>>> I have an initial cut at a patch that issues a warning when a file is
>>> #included on a case-insensitive file system that would not be found on a
>>> case-sensitive system. Is there interest?
>>> Since this is a hard problem to solve perfectly, I have opted for a
>>> strategy that is efficient and conservative about issuing diagnostics. That
>>> is, it shouldn't give false positives, but it will fail to diagnose some
>>> non-portable paths. On *nix systems, the low-level APIs that stat and open
>>> files get an extra call to ::realpath, and the result is cached along with
>>> the rest of the file metadata. On Windows, I use a combination of
>>> GetFullPathName and GetLongPathName to get the same effect. (I don't believe
>>> that's guaranteed to get the physical name including case, but it seems to
>>> mostly work in my testing.)
>>> Due to how I compare path components, a relative path like
>>> "NoTtHeRiGhTcAsE/../correctly-cased.h" will not be diagnosed, but
>>> "../NoTtHeRiGhTcAsE/correctly-cased.h" will be. Catching more cases requires
>>> many more round trips to the disk, which I wanted to avoid.
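
The component comparison described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the logic of the patch
itself: it assumes the spelled path and the ::realpath result are compared
component by component from the end, that the walk stops at the first "." or
".." in the spelled path, and that case is folded for ASCII only (in the
spirit of StringRef::equals_lower). The names splitPath, equalsLowerAscii and
hasCaseMismatch are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a path on '/', dropping empty components.
std::vector<std::string> splitPath(const std::string &p) {
  std::vector<std::string> out;
  std::string cur;
  for (char c : p) {
    if (c == '/') {
      if (!cur.empty()) out.push_back(cur);
      cur.clear();
    } else {
      cur.push_back(c);
    }
  }
  if (!cur.empty()) out.push_back(cur);
  return out;
}

// ASCII-only case-insensitive equality; deliberately ignores the locale.
bool equalsLowerAscii(const std::string &a, const std::string &b) {
  auto lower = [](char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
  };
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (lower(a[i]) != lower(b[i])) return false;
  return true;
}

// Walk both paths from the last component backwards. Report a mismatch only
// when a spelled component differs from the real one in case alone. Give up
// (no diagnostic) at "." or "..", or when the components are unrelated, for
// example because a symlink was resolved.
bool hasCaseMismatch(const std::string &spelled, const std::string &real) {
  const auto s = splitPath(spelled);
  const auto r = splitPath(real);
  auto si = s.rbegin();
  auto ri = r.rbegin();
  for (; si != s.rend() && ri != r.rend(); ++si, ++ri) {
    if (*si == "." || *si == "..") return false;
    if (*si == *ri) continue;
    return equalsLowerAscii(*si, *ri);
  }
  return false;
}
```

With this sketch, "NoTtHeRiGhTcAsE/../correctly-cased.h" is not flagged because
the walk stops at "..", while "../NoTtHeRiGhTcAsE/correctly-cased.h" is flagged
when the real path ends in "nottherightcase/correctly-cased.h". Resolving what
lies behind a ".." would need the extra disk round trips mentioned above.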
>>> Hi Eric,
>>> This would be a hugely welcomed feature, but have you done any
>>> performance analysis of this? The preprocessor and the data structures you
>>> are touching are very sensitive.
>>> You can stress test the preprocessor by using the "clang -cc1 -Eonly"
>>> mode. If you’re on a Mac, I’d recommend timing <Cocoa/Cocoa.h>.
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org