[cfe-dev] case-insensitive #include warning

Michael Spencer via cfe-dev cfe-dev at lists.llvm.org
Thu Apr 7 16:38:30 PDT 2016


On Thu, Apr 7, 2016 at 4:32 PM, Eric Niebler via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
> John,
>
> The strategy you describe is basically the strategy I'm using. I get the
> actual name of the file on disk when opening it for the first type and stuff
> the result in the File cache. Then when processing the #include, I check the
> components of the user's #include path with the actual path on disk. If any
> path component differs from the corresponding component in the "real" path,
> the warning is triggered -- unless they differ by more than case. In that
> case, I assume it's a simlink and conservatively don't issue the diagnostic.
> That last bit is the only place where I use equals_lower. This would yield a
> false positives only when (a) two path components are different according to
> StringRef::equal and (b) they are the same according to
> StringRef::equal_lower. That /seems/ unlikely to me, but I'm not positive.
> It seems more likely that this will fail to diagnose some incorrectly cased
> paths. Regardless, a locale-sensitive string compare routine sure would be a
> nice thing to have here.
>
> A #include like <NoTtHeRiGhTcAsE/../correctly-cased.h> is not flagged since
> NoTtHeRiGhTcAsE would probably not show up in the "real" path to
> correctly-cased.h as reported by the OS. So I have nothing to compare
> NoTtHeRiGhTcAsE against, and the error is not diagnosed.
>
> Re GetFinalPathNameByHandle, that's a new(er) Windows API. Vista and 2008
> Server only. I was reluctant to use it since I don't know if LLVM targets
> older OSes. I'm happy to be wrong about that.
>
> Eric
>

LLVM has dropped XP support for running the compiler (not for targeting).

- Michael Spencer

>
>
> On 4/7/16, 4:07 PM, "John Sully" <john at csquare.ca> wrote:
>
> A safer way is to use an OS API that gives you back the exact name of the
> file.  On windows GetFinalPathNameByHandle should work, I don't know mac
> well enough to suggest an API there.  At that point you could do a case
> sensitive string compare on the name and warn if there are differences.
>
> This also prevents all the problems of locale specifics since tolower means
> different things in different locales.  So your warning could work fine in
> en-us but not in some other country.  Anything relying on tolower/toupper is
> guaranteed to have internationalization bugs.
>
> On Thu, Apr 7, 2016 at 2:54 PM, Eric Niebler <eniebler at fb.com> wrote:
>>
>> I can say that this almost certainly does not work for non-ascii since it
>> just uses StringRef::equals_lower. Is there any proper locale-sensitive
>> string comparison routines in llvm that I can use?
>>
>> Eric
>>
>>
>> On 4/7/16, 11:49 AM, "John Sully" <john at csquare.ca> wrote:
>>
>> Out of curiosity have you tried this with some of the more interesting
>> upper/lower case pairs like the turkish 'İ'?
>>
>> It sounds like the way you're achieving this should allow this to work,
>> but its worthwhile to try it.
>>
>> On Thu, Apr 7, 2016 at 11:37 AM, Chris Lattner via cfe-dev
>> <cfe-dev at lists.llvm.org> wrote:
>>>
>>>
>>> On Apr 5, 2016, at 4:03 PM, Eric Niebler via cfe-dev
>>> <cfe-dev at lists.llvm.org> wrote:
>>>
>>> Hi all,
>>>
>>> I have an initial cut at patch that issues a warning when a file is
>>> #included on a case-insensitive file system that would not be found on a
>>> case-sensitive system. Is there interest?
>>>
>>> Since this is a hard problem to solve perfectly, I have opted for a
>>> strategy that is efficient and conservative about issuing diagnostics. That
>>> is, it shouldn't give false positives, but it will fail to diagnose some
>>> non-portable paths. On *nix systems, the low-level APIs that stat and open
>>> files get an extra call to ::realpath, and the result is cached along with
>>> the rest of the file metadata. On Windows, I use a combination of
>>> GetFullPathName and GetLongPathName to get the same effect. (I don't believe
>>> that's guaranteed to get the physical name including case, but it seems to
>>> mostly work in my testing.)
>>>
>>> Due to how I compare path components, a relative path like
>>> "NoTtHeRiGhTcAsE/../correctly-cased.h" will not be diagnosed, but
>>> "../NoTtHeRiGhTcAsE/correctly-cased.h" will be. Catching more cases requires
>>> many more round trips to the disk, which I wanted to avoid.
>>>
>>>
>>> Hi Eric,
>>>
>>> This would be a hugely welcomed feature, but have you done any
>>> performance analysis of this?  The preprocessor and the data structures you
>>> are touching are very sensitive.
>>>
>>> You can stress test the preprocessor by using the "clang -cc1 -Eonly”
>>> mode.  If you’re on a mac, I’d recommend timing <Cocoa/Cocoa.h>
>>>
>>> -Chris
>>>
>>>
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>
>>
>>
>
>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>



More information about the cfe-dev mailing list