<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">On Jul 7, 2015, at 7:18 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com" class="">chisophugis@gmail.com</a>> wrote:<br class=""><div><blockquote type="cite" class=""><br class="Apple-interchange-newline"><div class=""><br class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><div class="">You can easily get a 2x or more speedup due to auto-vectorization if you can assume -fstrict-aliasing. Of course usually you wouldn’t write this code, you’d get this because doLoopThing is a template, and N is passed in as a reference.</div></div></blockquote><div class=""><br class=""></div><div class=""><br class=""></div><div class="">You're absolutely right that it's complicated, but I don't think this is the best example.<br class=""></div><div class=""><br class=""></div><div class="">The 2x speedup optimizations you're talking about can be done even without strict aliasing or signed overflow. You just have to emit runtime checks. It's analogous to emitting the "remainder" loop when doing autovectorization. 
Imagine if the standard made it undefined for a loop over an array of more than 4 floats to not be suitable for vectorization: sure, that would be nice and make the vectorizer's life easier, but we can mostly get the "big bang" speedup without it, in the sense that not having the standard say it is UB probably results in closer to 2% "slowdown" in the aggregate across all these loops (factoring in icache, etc.) than 2x.</div></div></div></blockquote><br class=""></div><div>As Hal points out, this is only true for the simplest cases. By the same line of argument, the compiler should only “use UB information” when it knows it will get a “major speedup”. This approach would at least introduce the risk of UB only when there is some benefit to be had.</div><div><br class=""></div><div>That, unfortunately, is an extremely difficult problem. Except in simple cases, it is very hard in a multi-layered compiler to know how much speedup a client will get from (e.g.) an alias query returning NoAlias instead of MayAlias.</div><div><br class=""></div><div>-Chris</div><div><br class=""></div><br class=""></body></html>