<div dir="ltr"><div>It sounds like you've hit >99% accuracy which should be good
enough
for this feature in my opinion. GetProcAddress will let you know at
runtime if the API is available or not, assuming this method is faster than older equivalents.<br><br></div>I just wanted to force a discussion on possible internationalization issues.</div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Apr 7, 2016 at 4:32 PM, Eric Niebler <span dir="ltr"><<a href="mailto:eniebler@fb.com" target="_blank">eniebler@fb.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif">
<div>John,</div>
<div><br>
</div>
<div>The strategy you describe is basically the strategy I'm using. I get the <span style="font-weight:bold">
actual</span> name of the file on disk when opening it for the first type and stuff the result in the File cache. Then when processing the #include, I check the components of the user's #include path with the actual path on disk. If any path component differs
from the corresponding component in the "real" path, the warning is triggered -- unless they differ by more than case. In that case, I assume it's a simlink and conservatively don't issue the diagnostic. That last bit is the only place where I use equals_lower.
This would yield a false positives only when (a) two path components are different according to StringRef::equal
<span style="font-weight:bold">and</span> (b) they are the same according to StringRef::equal_lower. That /seems/ unlikely to me, but I'm not positive. It seems more likely that this will fail to diagnose some incorrectly cased paths. Regardless, a locale-sensitive
string compare routine sure would be a nice thing to have here.</div>
<div><br>
</div>
<div>A #include like <NoTtHeRiGhTcAsE/../correctly-cased.h> is not flagged since NoTtHeRiGhTcAsE would probably not show up in the "real" path to correctly-cased.h as reported by the OS. So I have nothing to compare NoTtHeRiGhTcAsE against, and the error is
not diagnosed.</div>
<div><br>
</div>
<div>Re GetFinalPathNameByHandle, that's a new(er) Windows API. Vista and 2008 Server only. I was reluctant to use it since I don't know if LLVM targets older OSes. I'm happy to be wrong about that.</div><span class="HOEnZb"><font color="#888888">
<div><br>
</div>
<div>Eric</div></font></span><div><div class="h5">
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<span>
<div>
<div>On 4/7/16, 4:07 PM, "John Sully" <<a href="mailto:john@csquare.ca" target="_blank">john@csquare.ca</a>> wrote:</div>
</div>
<div><br>
</div>
<blockquote style="BORDER-LEFT:#b5c4df 5 solid;PADDING:0 0 0 5;MARGIN:0 0 0 5">
<div>
<div>
<div dir="ltr">
<div>A safer way is to use an OS API that gives you back the exact name of the file. On windows GetFinalPathNameByHandle should work, I don't know mac well enough to suggest an API there. At that point you could do a case sensitive string compare on the name
and warn if there are differences.<br>
<br>
</div>
This also prevents all the problems of locale specifics since tolower means different things in different locales. So your warning could work fine in en-us but not in some other country. Anything relying on tolower/toupper is guaranteed to have internationalization
bugs.<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Apr 7, 2016 at 2:54 PM, Eric Niebler <span dir="ltr">
<<a href="mailto:eniebler@fb.com" target="_blank">eniebler@fb.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif">
<div>I can say that this almost certainly does <span style="font-weight:bold">not</span> work for non-ascii since it just uses StringRef::equals_lower. Is there any proper locale-sensitive string comparison routines in llvm that I can use?</div>
<span><font color="#888888">
<div><br>
</div>
<div>Eric</div>
</font></span>
<div>
<div>
<div><br>
</div>
<div><br>
</div>
<span>
<div>
<div>On 4/7/16, 11:49 AM, "John Sully" <<a href="mailto:john@csquare.ca" target="_blank">john@csquare.ca</a>> wrote:</div>
</div>
<div><br>
</div>
<blockquote style="BORDER-LEFT:#b5c4df 5 solid;PADDING:0 0 0 5;MARGIN:0 0 0 5">
<div>
<div>
<div dir="ltr">Out of curiosity have you tried this with some of the more interesting upper/lower case pairs like the turkish '<code>İ</code>'?
<br>
<br>
It sounds like the way you're achieving this should allow this to work, but its worthwhile to try it.<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Apr 7, 2016 at 11:37 AM, Chris Lattner via cfe-dev
<span dir="ltr"><<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><span><br>
<div>
<blockquote type="cite">
<div>On Apr 5, 2016, at 4:03 PM, Eric Niebler via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>> wrote:</div>
<br>
<div>
<div style="word-wrap:break-word;font-size:14px;font-family:Calibri,sans-serif">
<div>
<div>
<div>Hi all,</div>
<div><br>
</div>
<div>I have an initial cut at patch that issues a warning when a file is #included on a case-insensitive file system that would not be found on a case-sensitive system. Is there interest?</div>
<div><br>
</div>
<div>Since this is a hard problem to solve perfectly, I have opted for a strategy that is efficient and conservative about issuing diagnostics. That is, it shouldn't give false positives, but it will fail to diagnose some non-portable paths. On *nix systems,
the low-level APIs that stat and open files get an extra call to ::realpath, and the result is cached along with the rest of the file metadata. On Windows, I use a combination of GetFullPathName and GetLongPathName to get the same effect. (I don't believe
that's guaranteed to get the physical name including case, but it seems to mostly work in my testing.)</div>
<div><br>
</div>
<div>Due to how I compare path components, a relative path like "NoTtHeRiGhTcAsE/../correctly-cased.h" will not be diagnosed, but "../NoTtHeRiGhTcAsE/correctly-cased.h" will be. Catching more cases requires many more round trips to the disk, which I wanted
to avoid.</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</span>
<div>Hi Eric,</div>
<div><br>
</div>
<div>This would be a hugely welcomed feature, but have you done any performance analysis of this? The preprocessor and the data structures you are touching are very sensitive.</div>
<div><br>
</div>
<div>You can stress test the preprocessor by using the "clang -cc1 -Eonly” mode. If you’re on a mac, I’d recommend timing <Cocoa/Cocoa.h></div>
<div><br>
</div>
<div>-Chris</div>
<div><br>
</div>
<br>
</div>
<br>
_______________________________________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>
<a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_cfe-2Ddev&d=CwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=t4UJXwtMcIwGvgOaMqGzrQ&m=C2mKu_MBUI1HzjevAcSU6IjDRrCvvuRlV6WO_AQdd0M&s=S56yajwDNOEAetztMRmXHZ5r6RT5ibkc9mbhcOEXGU0&e=" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</span>
<div><br>
</div>
<div>
<div>
<div><br>
</div>
<div>
<div></div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</span>
<div><br>
</div>
<div>
<div>
<div><br>
</div>
<div>
<div></div>
</div>
</div>
</div>
</div></div></div>
</blockquote></div><br></div>