[PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.

Tue May 17 16:17:31 PDT 2016

tra added a subscriber: scanon.
tra added a comment.

Things are even more interesting. -ffp-contract=fast is *not* what this change does. :-)

We have two places where we can fuse FP instructions: in clang and in the LLVM back-end.
Clang fuses add+mul into the llvm.fmuladd intrinsic if -ffp-contract=on (the default) and DefaultFPContract=1 (which is only set for OpenCL for some reason). The back-end then decides whether it's profitable to emit a fused operation or not. NVPTX does emit fmad.
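
For a concrete sense of why fusing is observable at all: a fused multiply-add rounds once, while a separate multiply and add round twice. Below is a minimal sketch in plain host C++ (not CUDA, and not what clang actually emits). The function names are made up for illustration, and the volatile temporary is only there to keep the compiler from contracting the unfused version on its own.

```cpp
#include <cmath>

// Unfused: the product is rounded to double first, then the sum is rounded again.
// The volatile temporary forces the intermediate rounding, so the compiler
// cannot contract this into a single fused operation.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused: a * b + c is computed as if exactly, then rounded once.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

Assuming IEEE double with round-to-nearest-even, take a = 1 + 2^-27, b = 1 - 2^-27, c = -1. The exact product is 1 - 2^-54, which rounds to 1.0, so mul_then_add returns 0 while fused_mul_add returns -2^-54.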

Compare this to -ffp-contract=fast, which actually *disables* fusing in clang and instead allows the LLVM back-end to do fusing wherever it sees fit (as opposed to 'fuse intrinsics only'). It may potentially fuse any suitable multiply/add pair, not only those vetted by the front-end.
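
To make the difference in scope concrete, here is a rough model of which multiply/add pairs each mode is allowed to fuse. This is an emulation in plain C++, not compiler output: the Contract enum and the eval_* helpers are hypothetical names, it assumes the front-end only fuses a multiply and add written in a single expression, and it assumes fusing always happens whenever it is permitted (a real back-end may decline).

```cpp
#include <cmath>

// Hypothetical stand-ins for -ffp-contract=off / on / fast.
enum class Contract { Off, On, Fast };

// Reference unfused evaluation: round the product, then round the sum.
double unfused(double a, double b, double c) {
    volatile double product = a * b;  // forces the intermediate rounding
    return product + c;
}

// Models source written as one expression:  r = a * b + c;
// On:   the front-end may emit a fused intrinsic for it.
// Fast: the back-end may fuse the pair itself.
double eval_single_expression(Contract mode, double a, double b, double c) {
    return mode == Contract::Off ? unfused(a, b, c) : std::fma(a, b, c);
}

// Models source split across statements:  t = a * b;  r = t + c;
// Only Fast lets the back-end fuse a pair the front-end never marked.
double eval_split_statements(Contract mode, double a, double b, double c) {
    return mode == Contract::Fast ? std::fma(a, b, c) : unfused(a, b, c);
}
```

In this model the two source forms agree under Off and Fast, and differ only under On, which is the "controlled in scope" behaviour.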

Currently there's no way to enable front-end fusing via the command line unless you compile OpenCL source. With this patch in place, for CUDA compilation we can pick either no fusing, controlled fusing by the front-end, or more aggressive fusing by the back-end.

Setting DefaultFPContract=1 for CUDA seems to be the least evil -- it's somewhat controlled in scope and gives us a way to disable fusing completely or make it more aggressive if it's needed.

Perhaps @scanon and @hfinkel can weigh in.

http://reviews.llvm.org/D20341