<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jul 6, 2017, at 3:07 PM, Chris Lattner &lt;<a href="mailto:clattner@nondot.org" class="">clattner@nondot.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Jul 6, 2017, at 2:05 PM, Peter Lawrence via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Jul 6, 2017, at 1:00 PM, via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class="">   So far, so good.  The problem is that while LLVM seems to consider<br class="">   the above IR to be valid, we officially do not allow dereferencing<br class="">   a pointer constructed in this way (if I’m reading the rules<br class="">   correctly).  
Consequently, if this GEP ever gets close enough to a<br class="">   load using the pointer, InstCombine will eliminate the GEP and the<br class="">   load.</blockquote></div></blockquote></div><br class=""><div class="">This is the part that confuses me: why would such code be eliminated?</div><div class="">If it is illegal then this should be a compilation failure,</div></div></div></blockquote><br class=""></div><div class="">This is illegal code, and if we only cared about the C spec, we could at least warn about it if not reject it outright.</div><div class=""><br class=""></div><div class="">That said, the purpose of clang is to build real code, and real code contains some amount of invalid constructs in important code bases.  We care about building a pragmatic compiler that gets the job done, so we sometimes “make things work” even though we don’t have to.  There are numerous patterns in old-style “offsetof” macros that do similar things.  Instead of fighting to make all the world’s code be theoretically ideal, it is better to just eat it and “do what they meant”.</div><div class=""><br class=""></div></div></div></blockquote><div><br class=""></div><div><br class=""></div><div>Chris,</div><div>          The issue the original poster brought up is that, instead of a compiler </div><div>that as you say “makes things work” and “gets the job done”, we have a compiler</div><div>that intentionally deletes “undefined behavior”, on the assumption that, since it </div><div>is the user’s responsibility to avoid UB, this code must be unreachable and </div><div>is therefore safe to delete.</div><div><br class=""></div><div>It seems like there are three things the compiler could do with undefined behavior:</div><div>1)   let the code go through (perhaps with a warning)</div><div>2)   replace the code with a trap</div><div>3)   optimize the code as unreachable (no warning because we’re assuming this is the user’s intention)</div><div><br class=""></div><div>It looks like 3 is 
the LLVM default, but IMHO it is the least desirable choice:</div><div>real-world examples showing the benefit are practically non-existent,</div><div>and it can mask a real source code bug.</div><div><br class=""></div><div>In spite of option 3 being (IMHO) the least desirable choice, considerable</div><div>resources are being devoted to implementing it, and the work does not seem</div><div>to follow good software engineering practice.</div><div><br class=""></div><div>This optimization seems to fit the “compiler design pattern” of a separate</div><div>analysis and transform pass, where “poison” is an attribute that gets forward-</div><div>propagated through expressions and assignments until it reaches some</div><div>instruction that turns “poison” into “undefined behavior”, after which the</div><div>block containing the UB can be deleted.</div><div><br class=""></div><div>Putting this analysis and transform into a separate pass means that the</div><div>LangRef and IR can be cleaned up: there is no reason to have “poison”</div><div>and “freeze” in the IR, nor for any other passes to have to deal with them.</div><div><br class=""></div><div>Some folks are saying “damn the torpedoes, full speed ahead” on option 3</div><div>in its least software-engineering-friendly form; others are saying wait a minute,</div><div>let’s slow down, take a deep breath, and consider the big picture first.</div><div><br class=""></div><div>Thoughts?</div><div>Comments?</div><div>Questions?</div><div><br class=""></div><div><br class=""></div><div>Peter Lawrence.</div><div><br class=""></div><div><br class=""></div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">-Chris</div><div class=""><br class=""></div><br class=""></div></div></blockquote></div><br class=""></body></html>