[LLVMdev] Integer divide by zero

Sun Apr 7 11:50:53 PDT 2013

Hey again,

Thank you for your opinions. I will take them into consideration. A few
comments...

On Sun, Apr 7, 2013 at 1:39 PM, Jeffrey Yasskin <jyasskin at google.com> wrote:
...

> If the performance penalty is unclear to you, that means you haven't
> measured it. Until you measure, you have absolutely no business
> complaining about a potential performance problem. Measure, and then
> come back with numbers.

Unfortunately, I am restricted from publicly sharing performance results
without going through an extensive, expensive legal process. Not fun!

Some thoughts though...

In order to test the performance of this Clang feature, I would have to
build it into my frontend. That's not cost effective for me for the
following reason.

It seems to me, a priori, that the code currently generated by Clang would
indeed have a performance penalty on an inorder processor, without branch
prediction. Take Xeon Phi for example. Albeit, a small penalty. Please
correct me if my assumptions are incorrect.

Our team's culture dictates that "an instruction is an instruction", hence
a performance problem. I understand that "performance problem" will have
different definitions among different tribes.

> Although, I've been contemplating x86-64's behaviour for this case when
> > floating point traps are disabled. Ideally, the compiler should preserve
> > that behaviour, which might make this software implementation messy.
> > Especially if different processors have different implementations. The
> > simplest solution... let the hardware behave as it should.
>
> To be clear, you're asking to turn off a set of optimizations. That
> is, you're asking to make code in general run slower, so that you can
> get a particular behavior on some CPUs in unusual cases.
>

I respectfully disagree. I am asking for an *option* to turn off a set of
optimizations; not turn off optimizations in general. I would like to make
it easy for a compiler implementor to choose the desired behaviour. I
whole-heartedly believe that both behaviours (undefined and trap) have
merit.

To digress in the interest of light-heartedness, this reminds me of the old
joke "my program's performance improved 20x!, but the results aren't
correct". :)

> >> You might need to
> >> do this in the processor-specific backend to avoid other
> >> undefined-behavior-based optimizations—that is, recognize "if (x == 0)
> >> goto err_handler; else y/x;" and replace it with
> >> "register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;".
> >
> >
> > I believe that the constant folder would remove the constant division by
> > zero and conditional before the backend could have its say. We would be
> left
> > with only the jump to the error handler. That may complicate things.
>
> If the compiler can prove x==0, then yes, you'd be left with just a
> jump to the error handler. That's more efficient than handling a
> hardware trap, so it's what you ought to want.
>

I would like a trap. I.e. x86-64's expected behaviour.

I would also not like a branch on non-constant integer divisions. As a
reminder, this discussion originated in the constant folder. The
non-constant behaviour works just fine.

Thanks again,
Cameron
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130407/06d19ff4/attachment.html>