[LLVMdev] Aggressive FMA fusion for NVPTX
Olivier H Sallenave
ohsallen at us.ibm.com
Tue Jan 13 14:14:28 PST 2015
Hi,
I propose to override the TLI callback enableAggressiveFMAFusion for the
NVPTX backend and return true instead of false. The reason is the same as
for PPC: fmul, fma and fadd nodes have the same throughput (see
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions),
so we can enable more combining heuristics to produce more FMAs. For
instance, the following pattern would then be considered:
// fold (fadd (fma x, y, (fmul u, v)), z) -> (fma x, y, (fma u, v, z))
cf. commits:
http://llvm.org/viewvc/llvm-project?view=revision&revision=218120
http://llvm.org/viewvc/llvm-project?view=revision&revision=225380
Please tell me what you think.
Olivier