[PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

Tue May 17 16:30:23 PDT 2016

hfinkel added a comment.

In http://reviews.llvm.org/D20341#432525, @tra wrote:

> In http://reviews.llvm.org/D20341#432494, @hfinkel wrote:
>
> > That having been said, is this change the equivalent of -ffp-contract=fast or -ffp-contract=on? I think it is the latter and we want the former (i.e. where we let the backend be as aggressive as possible *after* inlining).
>
> It is -ffp-contract=on. As it happens, it appears to produce better code than -ffp-contract=fast, at least on some benchmarks. Apparently the smaller IR (a single intrinsic call instead of separate mul and add instructions) makes the job easier for the straight-line strength reduction pass, which is then able to remove more redundant calculations in unrolled loops.

That's certainly interesting, and frankly, something I don't immediately understand. Given that, at that level, the IR for -ffp-contract=fast is the same as for -ffp-contract=off, this seems to point to a more general problem that we should likely fix anyway.

I will say that, once templated C++ libraries become involved, the per-statement C rules for fusion often don't apply in enough places to be useful. You really need to perform the fusion after inlining. Obviously, however, for more-directly-programmed expressions, this concern does not apply.

http://reviews.llvm.org/D20341