[PATCH] D20341: [CUDA] Enable fusing FP ops for CUDA by default.
Artem Belevich via cfe-commits
cfe-commits at lists.llvm.org
Tue May 17 16:17:31 PDT 2016
tra added a subscriber: scanon.
tra added a comment.
Things are even more interesting. -ffp-contract=fast is *not* what this change does. :-)
We have two places where we can fuse FP instructions -- in clang and in the LLVM back-end.
Clang fuses add+mul into the llvm.fmuladd intrinsic if -ffp-contract=on (the default) and DefaultFPContract=1 (which is only set for OpenCL for some reason). The back-end then decides whether it's profitable to emit a fused operation or not. NVPTX does emit fmad.
Compare this to -ffp-contract=fast, which actually *disables* fusing in clang and instead allows the LLVM back-end to do fusing wherever it sees fit (as opposed to 'fuse intrinsics only'). It may potentially fuse any suitable multiply/add pair, not only those vetted by the front-end.
Currently there's no way to enable front-end fusing via the command line, unless you compile OpenCL source. With this patch in place, for CUDA compilation we can pick either no fusing, controlled fusing by the front-end, or more aggressive fusing by the back-end.
Setting DefaultFPContract=1 for CUDA seems to be the least evil -- it's somewhat controlled in scope and gives us a way to disable fusing completely or make it more aggressive if it's needed.
Perhaps @scanon and @hfinkel can weigh in.