<div dir="ltr">Hey again,<div><br></div><div>Thank you for your opinions. I will take them into consideration. A few comments...<div class="gmail_extra"><br><div class="gmail_quote">On Sun, Apr 7, 2013 at 1:39 PM, Jeffrey Yasskin <span dir="ltr"><<a href="mailto:jyasskin@google.com" target="_blank">jyasskin@google.com</a>></span> wrote:</div>
<div class="gmail_quote">...<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">If the performance penalty is unclear to you, that means you haven't<br>
measured it. Until you measure, you have absolutely no business<br>
complaining about a potential performance problem. Measure, and then<br>
come back with numbers.</blockquote><div><br></div><div style>Unfortunately, I am restricted from publicly sharing performance results without going through an extensive, expensive legal process. Not fun! </div><div style>
<br></div><div style>Some thoughts, though...</div><div style><br></div><div style>In order to test the performance of this Clang feature, I would have to build it into my frontend. That is not cost-effective for me, for the following reason.</div>
<div><br></div><div style>It seems to me, a priori, that the code currently generated by Clang would indeed carry a performance penalty, albeit a small one, on an in-order processor without branch prediction. Take Xeon Phi, for example. Please correct me if my assumptions are wrong.</div>
<div> </div><div style>Our team's culture dictates that "an instruction is an instruction", so any extra instruction counts as a performance problem. I understand that "performance problem" means different things to different tribes.</div>
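<div style><br></div><div style>To make the extra-instruction concern concrete, here is a minimal C sketch of the two lowerings being compared. It is an illustration only, not Clang's actual output, and it assumes the software approach places an explicit zero test before each division; checked_div, plain_div, and div_error_handler are hypothetical names. The compare and branch in checked_div run on every call, even when the divisor is never zero, while plain_div leaves the zero case to the hardware.</div>

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error handler standing in for whatever a software
   check would jump to; here it just reports and aborts. */
static void div_error_handler(void) {
    fprintf(stderr, "integer division by zero\n");
    abort();
}

/* Software-checked division: an extra compare and conditional
   branch precede every divide, even when x is never zero. */
int checked_div(int y, int x) {
    if (x == 0)
        div_error_handler();
    return y / x;
}

/* Plain division: on x86-64 this typically compiles to an idiv,
   which raises a divide error (delivered as SIGFPE on Linux) when
   x == 0. In ISO C that case is undefined behaviour, so a trap is
   not guaranteed by the language. */
int plain_div(int y, int x) {
    return y / x;
}
```

<div style>Both functions return the same quotient for a nonzero divisor (for example, 7 for 42 and 6); they differ only in the extra test and in what happens when the divisor is zero.</div>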
<div style><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im">
> Although, I've been contemplating x86-64's behaviour for this case when<br>
> floating point traps are disabled. Ideally, the compiler should preserve<br>
> that behaviour, which might make this software implementation messy.<br>
> Especially if different processors have different implementations. The<br>
> simplest solution... let the hardware behave as it should.<br>
<br>
</div>To be clear, you're asking to turn off a set of optimizations. That<br>
is, you're asking to make code in general run slower, so that you can<br>
get a particular behavior on some CPUs in unusual cases.<br></blockquote><div><br></div><div style>I respectfully disagree. I am asking for an *option* to turn off a set of optimizations, not to turn off optimizations in general. I would like to make it easy for a compiler implementor to choose the desired behaviour. I wholeheartedly believe that both behaviours (undefined and trap) have merit.</div>
<div style><br></div><div style>On a lighter note, this reminds me of the old joke: "My program's performance improved 20x! But the results aren't correct." :)</div><div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im">
>> You might need to<br>
>> do this in the processor-specific backend to avoid other<br>
>> undefined-behavior-based optimizations—that is, recognize "if (x == 0)<br>
>> goto err_handler; else y/x;" and replace it with<br>
>> "register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;".<br>
><br>
><br>
> I believe that the constant folder would remove the constant division by<br>
> zero and conditional before the backend could have its say. We would be left<br>
> with only the jump to the error handler. That may complicate things.<br>
<br>
</div>If the compiler can prove x==0, then yes, you'd be left with just a<br>
jump to the error handler. That's more efficient than handling a<br>
hardware trap, so it's what you ought to want. <br></blockquote><div><br></div><div style>I would like a trap, i.e., x86-64's expected behaviour.</div><div style><br></div><div style>I also would not like a branch inserted before non-constant integer divisions. As a reminder, this discussion originated in the constant folder; the non-constant behaviour works just fine.</div>
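<div style><br></div><div style>A minimal C sketch of the constant versus non-constant distinction, assuming an optimizing build on x86-64 (the function names and kZero are hypothetical). In the first function the divisor is known to be zero once constants are propagated, so the constant folder sees a division by zero and, because that is undefined behaviour, may fold it away without emitting the divide that would trap. In the second, the divisor is only known at run time, so a real divide is emitted and the hardware decides what happens.</div>

```c
/* Hypothetical constant zero divisor. It is not an integer constant
   expression in C, but an optimizing compiler can propagate its
   value, so the constant folder still sees a division by zero. */
static const int kZero = 0;

/* Constant case: division by zero is undefined behaviour, so the
   folder may replace the result with any value or drop the divide,
   and no divide instruction (hence no hardware trap) is guaranteed. */
int div_by_constant_zero(int y) {
    return y / kZero;
}

/* Non-constant case: the divisor is unknown at compile time, so the
   compiler emits a real divide (idiv on x86-64), which raises a
   divide error at run time on x86-64 if x == 0. */
int div_by_variable(int y, int x) {
    return y / x;
}
```

<div style>Only the first shape is affected by the folding discussed in this thread; calls such as div_by_variable(10, 3) behave the same under either policy.</div>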
<div style><br></div><div style>Thanks again,</div><div style>Cameron</div></div><br></div></div></div>