[PATCH] D41944: [LLVM][IR][LIT] support of 'no-overflow' flag for sdiv\udiv instructions

Wed Jan 17 10:25:55 PST 2018

magabari added a comment.

In https://reviews.llvm.org/D41944#978770, @nlopes wrote:

> In https://reviews.llvm.org/D41944#978753, @magabari wrote:
>
> > In https://reviews.llvm.org/D41944#978691, @nlopes wrote:
> >
> > > Before accepting this patch, we really need to see benchmark results. I'm not going to change clang to start emitting non-UB divs if the perf is going to be horrible. We need data.
> > >  Otherwise I don't see the need for this poison version of division. Could you elaborate if your plan is to expose this somehow to the application developer?
> > >
> > > I'm sorry if this questions have been properly answered in the past. If so, could you please link them here?
> >
> >
> > In general the proposed feature allows compiler to start speculating div without worrying too much of div-by-zero etc. so for example you can do instruction hoisting or vectorizing predicated sdiv.
> >  We are currently focused on vectorizing predicated div instruction and our implementation shows around 20-30% improvements on several tests of coremark-pro and denbench.
>
>
> I believe that in micro benchmarks that can be vectorized you can get nice speedups. The question is what happens end-to-end to regular applications?  Do I have a slowdown? Code size increase because now all my divisions are guarded?
>  Also, you could also guard those vectorizations around checks to ensure sdiv doesn't trap. This increases code size.

Not all divisions need to be guarded. In fact, every division that comes from C/C++ source should carry "nof", which means it can be lowered *without* guards. From now on, Clang will emit the "nof" attribute on each div that comes from the user. This matches the C/C++ specification, which leaves divide-by-zero undefined.
If an optimization wants to speculate a div, it should remove this attribute (which means the compiler may now have introduced an overflow). In that case the div has to be guarded whenever the specific target doesn't support lowering this kind of div. (I assume that when you decide to apply an optimization, you make sure it is good for your target and doesn't just end up increasing code size with guards.) In fact, guarding the div is only the default implementation for targets that have no support for a div that may overflow. On X86 we chose to simulate that div using FP div, which seems to be more efficient in some cases.
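
To make the "default implementation" concrete, here is a rough C sketch of what a guarded lowering of a div without "nof" could look like, and how a predicated div could then be computed unconditionally. The names guarded_sdiv and div_predicated are hypothetical and are not taken from the patch, and returning 0 on the guarded path is only a placeholder for a result the caller must not rely on.

```c
#include <limits.h>

/* Hypothetical helper: signed division that never traps.
 * The two trapping cases (divisor of zero, INT_MIN / -1) are filtered
 * out by the guard. The value returned on that path is an assumption
 * of this sketch; callers are not expected to use it. */
int guarded_sdiv(int n, int d) {
    if (d == 0 || (n == INT_MIN && d == -1)) {
        return 0; /* placeholder result, division skipped */
    }
    return n / d;
}

/* Speculated form of:  if (b[i] != 0) c[i] = a[i] / b[i];
 * The division is hoisted above the predicate, which is safe only
 * because guarded_sdiv cannot trap. The store stays predicated. */
void div_predicated(const int *a, const int *b, int *c, int len) {
    for (int i = 0; i < len; ++i) {
        int q = guarded_sdiv(a[i], b[i]); /* computed unconditionally */
        if (b[i] != 0) {
            c[i] = q;
        }
    }
}
```

A div that carries "nof" would correspond to the plain `n / d` with no guard at all, which is why divisions coming directly from user code pay nothing extra.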

https://reviews.llvm.org/D41944