<div dir="ltr">Hey again,<div><br></div><div>Thank you for your opinions. I will take them into consideration. A few comments...<div class="gmail_extra"><br><div class="gmail_quote">On Sun, Apr 7, 2013 at 1:39 PM, Jeffrey Yasskin <span dir="ltr"><<a href="mailto:jyasskin@google.com" target="_blank">jyasskin@google.com</a>></span> wrote:</div>

<div class="gmail_quote">...<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">If the performance penalty is unclear to you, that means you haven't<br>


measured it. Until you measure, you have absolutely no business<br>

complaining about a potential performance problem. Measure, and then<br>

come back with numbers.</blockquote><div><br></div><div style>Unfortunately, I am restricted from publicly sharing performance results without going through an extensive, expensive legal process. Not fun! </div><div style>

<br></div><div style>Some thoughts, though...</div><div style><br></div><div style>To measure the performance of this Clang feature, I would have to build it into my frontend. That isn't cost-effective for me, for the following reason.</div>

<div><br></div><div style>It seems to me, a priori, that the code Clang currently generates would carry a performance penalty, albeit a small one, on an in-order processor without branch prediction; take Xeon Phi, for example. Please correct me if my assumptions are wrong.</div>

<div> </div><div style>Our team's culture dictates that "an instruction is an instruction," so any extra instruction counts as a performance problem. I understand that "performance problem" means different things to different tribes. </div>

<div style><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im">

> Although, I've been contemplating x86-64's behaviour for this case when<br>

> floating point traps are disabled. Ideally, the compiler should preserve<br>

> that behaviour, which might make this software implementation messy.<br>

> Especially if different processors have different implementations. The<br>

> simplest solution... let the hardware behave as it should.<br>

<br>

</div>To be clear, you're asking to turn off a set of optimizations. That<br>

is, you're asking to make code in general run slower, so that you can<br>

get a particular behavior on some CPUs in unusual cases.<br></blockquote><div><br></div><div style>I respectfully disagree. I am asking for an *option* to turn off a set of optimizations, not to turn off optimizations in general. I would like to make it easy for a compiler implementor to choose the desired behaviour. I wholeheartedly believe that both behaviours (undefined and trap) have merit.</div>

<div style><br></div><div style>To digress in the interest of light-heartedness, this reminds me of the old joke: "My program's performance improved 20x, but the results aren't correct!" :)</div><div> </div>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im">

>> You might need to<br>

>> do this in the processor-specific backend to avoid other<br>

>> undefined-behavior-based optimizations—that is, recognize "if (x == 0)<br>

>> goto err_handler; else y/x;" and replace it with<br>

>> "register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;".<br>

><br>

><br>

> I believe that the constant folder would remove the constant division by<br>

> zero and conditional before the backend could have its say. We would be left<br>

> with only the jump to the error handler. That may complicate things.<br>

<br>

</div>If the compiler can prove x==0, then yes, you'd be left with just a<br>

jump to the error handler. That's more efficient than handling a<br>

hardware trap, so it's what you ought to want. <br></blockquote><div><br></div><div style>I would like a trap, i.e., x86-64's expected behaviour. </div><div style><br></div><div style>I would also prefer not to have a branch on non-constant integer divisions. As a reminder, this discussion originated with the constant folder. The non-constant behaviour works just fine.</div>
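<div style><br></div><div style>For concreteness, a minimal C sketch of the branch-based check from the quoted pseudocode ("if (x == 0) goto err_handler; else y/x;"). It is illustrative only and is not Clang's actual output: checked_div and div_by_zero_handler are hypothetical names, and the handler simply calls abort(). The compare and conditional jump ahead of each divide are the extra instructions at issue; on x86-64, a plain idiv with a zero divisor already raises a divide-error exception (#DE), which Linux delivers as SIGFPE.</div>

```c
/* Declared directly; this matches the standard prototype in stdlib.h. */
void abort(void);

/* Hypothetical error handler standing in for "err_handler" in the
   quoted pseudocode; here it just terminates the program. */
static void div_by_zero_handler(void) {
    abort();
}

/* Branch-based lowering: an explicit compare and conditional jump guard
   the division. Without this check, x86-64's idiv would trap (#DE) on a
   zero divisor by itself, at no cost when the divisor is non-zero. */
static int checked_div(int y, int x) {
    if (x == 0)
        div_by_zero_handler();   /* does not return */
    return y / x;
}
```

<div style>For example, checked_div(7, 2) returns 3, while checked_div(1, 0) takes the handler branch and never reaches the divide.</div>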

<div style><br></div><div style>Thanks again,</div><div style>Cameron</div></div><br></div></div></div>