<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jul 21, 2017, at 10:55 PM, Mehdi AMINI <<a href="mailto:joker.eph@gmail.com" class="">joker.eph@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><br class=""><div class="gmail_extra"><br class=""><div class="gmail_quote">2017-07-21 22:44 GMT-07:00 Peter Lawrence <span dir="ltr" class=""><<a href="mailto:peterl95124@sbcglobal.net" target="_blank" class="">peterl95124@sbcglobal.net</a>></span>:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><span style="font-size:14px" class="">Mehdi,</span><div class=""><span style="font-size:14px" class=""> Hal’s transformation only kicks in in the *presence* of UB</span></div></div></blockquote><div class=""><br class=""></div><div class="">No, sorry I entirely disagree with this assertion: I believe we optimize program where there is no UB. 
We delete dead code, code that never runs, so it is code that does not exercise UB.</div></div></div></div></div></blockquote><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class=""><br class=""></div></div></div></div></blockquote><div><br class=""></div><div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures;" class="">Mehdi,</span></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures;" class=""> I had to read that sentence several times to figure out what the problem</span></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures;" class="">is, which is sloppy terminology on my part</span></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255); min-height: 16px;" class=""><span style="font-variant-ligatures: no-common-ligatures;" class=""></span><br class=""></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures;" class="">Strictly speaking the C standard uses “undefined behavior” to describe what</span></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures;" class="">happens at runtime when an “illegal” construct is executed. 
I have been using</span></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures;" class="">“undefined behavior” and UB to describe the “illegal” construct whether it is</span></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures;" class="">executed or not.</span></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255); min-height: 16px;" class=""><span style="font-variant-ligatures: no-common-ligatures;" class=""> </span><br class="webkit-block-placeholder"></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures;" class="">Hence I say “Hal’s transform is triggered by UB”, when I should be saying</span></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures;" class="">“Hal’s transformation is triggered by illegal IR”.</span></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255); min-height: 16px;" class=""><span style="font-variant-ligatures: no-common-ligatures;" class=""></span><br class=""></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures;" class="">All I can say is I’m not the only one being sloppy, what started this entire </span></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures;" class="">conversation is the paper titled “Taming Undefined Behavior in LLVM”, while</span></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class=""><span 
style="font-variant-ligatures: no-common-ligatures;" class="">the correct title would be “Taming Illegal IR in LLVM”. (I think we are all</span></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255);" class=""><span style="font-variant-ligatures: no-common-ligatures;" class="">pretty confident that LLVM itself is UB-free, or at least we all hope so :-).</span></div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255); min-height: 16px;" class=""><span style="font-variant-ligatures: no-common-ligatures;" class=""></span></div></div><div>I believe you are being sloppy when you say “we optimize program </div><div>where there is no UB”, because I believe you mean “we optimize program </div><div>under the assumption that there is no UB”. In other words, we recognize</div><div>“illegal” constructs and then assume they are unreachable, and delete </div><div>them, even when we can’t prove by any other means that they are</div><div>unreachable. 
We don’t know that there is no (runtime) UB, we just assume it.</div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class="">The example Hal showed does not exhibit UB, it is perfectly valid according to the standard.</div><div class=""><br class=""></div></div></div></div></div></blockquote><div><br class=""></div><div>Whether it exhibits UB at runtime or not is not the issue, the issue is what </div><div>a static analyzer or compiler can tell before runtime, see below</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class=""> <br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><span style="font-size:14px" class="">, and</span></div><div class=""><span style="font-size:14px" class="">it does not matter how that UB got there, whether by function inlining</span></div><div class=""><span style="font-size:14px" class="">or without function inlining.</span></div><div class=""><span style="font-size:14px" class=""><br class=""></span></div><div class=""><span style="font-size:14px" class="">The problem with Hal’s argument is that the compiler does not have</span></div><div class=""><span style="font-size:14px" class="">a built in ouija board with which it can conjure up the spirit of the</span></div><div class=""><span style="font-size:14px" class="">author of the source code and find out if the UB was intentional</span></div><div class=""><span style="font-size:14px" class="">with the expectation of it being deleted, or is simply a bug.</span></div><div class=""><span style="font-size:14px" class="">Function inlining does not magically turn a bug into not-a-bug, nor</span></div><div class=""><span style="font-size:14px" class="">does 
post-inlining simplification magically turn a bug into not-a-bug.</span></div><div class=""><span style="font-size:14px" class=""><br class=""></span></div><div class=""><span style="font-size:14px" class="">Let me say it again: if the compiler can find this UB (after whatever</span></div><div class=""><span style="font-size:14px" class="">optimizations it takes to get there) then the static analyzer must</span></div><div class=""><span style="font-size:14px" class="">be able to do the same thing, forcing the programmer to fix it</span></div><div class=""><span style="font-size:14px" class="">rather than have the compiler optimize it.</span></div></div></blockquote><div class=""><br class=""></div><div class="">This is again incorrect: there is no UB in the program, there is nothing the static analyzer should report.</div></div></div></div></div></blockquote><div><br class=""></div><div><br class=""></div><div>Hal’s example starts with this template</div><div><br class=""></div><div><div class=""><blockquote cite="mid:56CF6936-E6DC-445C-AAEA-FB51E4FB89D5@sbcglobal.net" type="cite" style="background-color: rgb(255, 255, 255);" class=""><div class=""><div class="">template <typename T></div><div class="">int do_something(T mask, bool cond) {</div><div class=""> if (mask & 2)</div><div class=""> return 42;</div><div class=""><br class=""></div><div class=""> if (cond) {</div><div class=""> T high_mask = mask >> 48; // UB if sizeof(T) < 8, and cond true</div><div class=""> if (high_mask > 5)</div><div class=""> do_something_1(high_mask);</div><div class=""> else</div><div class=""> do_something_2();</div><div class=""> }</div><div class=""><br class=""></div><div class=""> return 0;</div><div class="">}</div></div></blockquote></div><div class=""><div class=""></div></div></div><div><br class=""></div><div>Which is then instantiated with T = char,</div><div>and where it is impossible for either a static analyzer or a </div><div>compiler to figure out and prove that 
‘cond’ is always false.</div><div><br class=""></div><div>Hence a static analyzer issues a warning about the shift,</div><div>while LLVM gives no warning and instead optimizes the entire</div><div>if-statement away on the assumption that it is unreachable.</div><div><br class=""></div><div>Yes, a static analyzer does issue a warning in this case.</div><div><br class=""></div><div><br class=""></div><div>This is not the only optimization to be based on assumption</div><div>rather than fact; for example, type-based alias analysis (TBAA) is</div><div>based on the assumption that the program is free of this sort</div><div>of aliasing. The difference is that a user can disable TBAA</div><div>and only TBAA if a program seems to be running incorrectly </div><div>when optimized and thereby possibly track down a bug, but</div><div>so far there is no command line option to disable UB-based-</div><div>analysis (or “illegal-IR-based” :-), and there really needs to be.</div><div><br class=""></div><div>Do we at least agree on that last paragraph?</div><div><br class=""></div><div><br class=""></div><div>Peter Lawrence.</div><div><br class=""></div><div><br class=""></div><div><br class=""></div></div><div><br class=""></div><div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class=""><br class=""></div><div class="">The compiler is still able to delete some code, because of breaking the abstraction through inlining or template instantiation for example (cf. Hal's example).</div><div class=""><br class=""></div><div class="">-- </div><div class="">Mehdi</div><div class=""><br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><span style="font-size:14px" class=""><br class=""></span></div><div class=""><span style="font-size:14px" class="">Or, to put it 
another way: there is no difference between a compiler</span></div><div class=""><span style="font-size:14px" class="">and a static analyzer [*]. So regardless of whether it is the compiler or</span></div><div class=""><span style="font-size:14px" class="">the static analyzer that finds any UB, the only rational thing to do with</span></div><div class=""><span style="font-size:14px" class="">it is to report it as a bug.</span></div><div class=""><span style="font-size:14px" class=""><br class=""></span></div><div class=""><span style="font-size:14px" class=""><br class=""></span></div><div class=""><span style="font-size:14px" class="">Peter Lawrence.</span></div><div class=""><span style="font-size:14px" class=""><br class=""></span></div><div class=""><span style="font-size:14px" class=""><br class=""></span></div><div class=""><span style="font-size:14px" class="">[* in fact that’s one of the primary reasons Apple adopted LLVM, to use</span></div><div class=""><span style="font-size:14px" class="">    it as a base for static analysis]</span></div><div class=""><div class="h5"><div class=""><span style="font-size:14px" class=""><br class=""></span></div><div class=""><br class=""></div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Jul 21, 2017, at 10:03 PM, Mehdi AMINI <<a href="mailto:joker.eph@gmail.com" target="_blank" class="">joker.eph@gmail.com</a>> wrote:</div><br class="m_794137231564833888Apple-interchange-newline"><div class=""><br class="m_794137231564833888Apple-interchange-newline"><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><div class="gmail_quote" 
style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">2017-07-21 21:27 GMT-07:00 Peter Lawrence<span class="m_794137231564833888Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:peterl95124@sbcglobal.net" target="_blank" class="">peterl95124@<wbr class="">sbcglobal.net</a>></span>:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><font face="Menlo" class="">Sean,</font><div class=""><font face="Menlo" class=""> Let me re-phrase a couple words to make it perfectly clear</font></div><div class=""><br class=""><div class=""><span class=""><blockquote type="cite" class=""><div class=""><font face="Menlo" class="">On Jul 21, 2017, at 6:29 PM, Peter Lawrence <<a href="mailto:peterl95124@sbcglobal.net" target="_blank" class="">peterl95124@sbcglobal.net</a>> wrote:</font></div><font face="Menlo" class=""><br class="m_794137231564833888m_2150444843056015504Apple-interchange-newline"></font><div class=""><div style="word-wrap:break-word" class=""><font face="Menlo" class="">Sean,</font><div class=""><font face="Menlo" class=""><br class=""></font></div><div class=""><font face="Menlo" class="">Dan Gohman’s “transform” changes a loop induction variable, but does not change the CFG,</font></div><div class=""><font face="Menlo" class=""><br class=""></font></div><div class=""><font face="Menlo" class="">Hal’s “transform” deletes blocks out of the CFG, fundamentally altering it.</font></div><div class=""><font face="Menlo" class=""><br class=""></font></div><div class=""><span style="font-family:Menlo" class="">These are two totally different transforms.</span></div></div></div></blockquote><blockquote type="cite" class=""><div 
style="word-wrap:break-word" class=""><div class=""><br class=""></div><div class=""><span style="font-family:Menlo" class=""><br class=""></span></div></div></blockquote><blockquote type="cite" class=""><div class=""><div style="word-wrap:break-word" class=""><div class=""><font face="Menlo" class="">And even the analysis is different,</font></div><div class=""><font face="Menlo" class=""><br class=""></font></div><div class=""><font face="Menlo" class="">The first is based on an *assumption* of non-UB (actually there is no analysis to perform)</font></div></div></div></blockquote></span><font face="Menlo" class=""> the *absence* of UB<br class=""></font><span class=""><blockquote type="cite" class=""><div style="word-wrap:break-word" class=""><div class=""><font face="Menlo" class=""><br class=""></font></div><div class=""><font face="Menlo" class="">the second Is based on a *proof* of existence of UB (here typically some non-trivial analysis is required)</font></div></div></blockquote></span><font face="Menlo" class=""> <span class="m_794137231564833888Apple-converted-space"> </span>the *presence* of UB<br class=""></font><span class=""><br class=""><blockquote type="cite" class=""><div style="word-wrap:break-word" class=""><div class=""><font face="Menlo" class="">These have, practically speaking, nothing in common.</font></div><div class=""><font face="Menlo" class=""><br class=""></font></div></div></blockquote><div class=""><br class=""></div><div class=""><br class=""></div></span><div class=""><font face="Menlo" class="">In particular, the first is an optimization, while the second is a transformation that</font></div><div class=""><font face="Menlo" class="">fails to be an optimization because the opportunity for it happening in real world</font></div><div class=""><font face="Menlo" class="">code that is expected to pass compilation without warnings, static analysis without</font></div><div class=""><font face="Menlo" class="">warnings, and dynamic 
sanitizers without warnings, is zero.</font></div><div class=""><font face="Menlo" class=""><br class=""></font></div><div class=""><font face="Menlo" class="">Or to put it another way, if llvm manages to find some UB that no analyzer or</font></div><div class=""><font face="Menlo" class="">sanitizer does, and then deletes the UB, then the author of that part of llvm</font></div><div class=""><font face="Menlo" class="">is in the wrong group, and belongs over in the analyzer and/or sanitizer group.</font></div></div></div></div></blockquote><div class=""><br class=""></div><div class="">I don't understand your claim; it does not match at all my understanding of what we managed to reach agreement on in the past.</div><div class=""><br class=""></div><div class="">The second transformation (dead code elimination to simplify) is based on the assumption that there is no UB.</div><div class=""><br class=""></div><div class="">I.e., after inlining for example, the extra context of the calling function allows us to deduce the value of some conditional branches in the inlined body based on the impossibility of one of the paths *in the context of this particular caller*.</div><div class=""><br class=""></div><div class="">This does not mean that the program written by the programmer has any UB inside.</div><div class=""><br class=""></div><div class="">This is exactly the example that Hal gave.</div><div class=""><br class=""></div><div class="">This can't be used to expose any meaningful information to the programmer, because it would be full of false positives. Basically, a program could be clean of any static analyzer error, of any UBSan error, and totally UB-free, and still exhibit tons and tons of such issues.</div><div class=""><br class=""></div><div class="">-- </div><div class="">Mehdi</div></div></div></blockquote></div><br class=""></div></div></div></div></blockquote></div><br class=""></div></div>
</div></blockquote></div><br class=""></body></html>