[PATCH] D34844: [AMDGPU] Always use rcp + mul with fast math

Thu Jun 29 16:16:29 PDT 2017

rampitec created this revision.
Herald added subscribers: t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, kzhuravl.

Regardless of relaxation options such as -cl-fast-relaxed-math,
we produce rather long code for fdiv via the amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace an fdiv that carries
2.5ulp fpmath metadata; it does not handle denormals and is thus
believed to be fast.

An fdiv instruction can also carry a fast math flag, either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both the metadata and the fast flag:

  %div = fdiv fast float %v, %0, !fpmath !12
  !12 = !{float 2.500000e+00}

The current implementation ignores the fast flag and favors the
metadata. An instruction with just the fast flag would be lowered to
the fastest form, rcp + mul, but that never happens in practice
because of the combined clang and BE behavior described above.
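
For illustration, the rcp + mul form computes a / b as a * (1 / b).
The sketch below models that on the host in C++. The names divRcpMul
and rcpMulError are hypothetical, and an ordinary float division
stands in for the hardware reciprocal instruction, which may be less
precise than this model.

```cpp
// Hypothetical host-side model of the rcp + mul lowering: a / b is
// computed as a * (1 / b). On the GPU the reciprocal would be a single
// rcp instruction; here an ordinary float division stands in for it.
constexpr float divRcpMul(float a, float b) {
  return a * (1.0f / b);
}

// Difference between the rcp + mul result and a plain division.
// The two can differ slightly because rcp + mul rounds twice.
constexpr float rcpMulError(float a, float b) {
  return divRcpMul(a, b) - a / b;
}
```

When 1 / b is exactly representable (b = 2, for example) the two forms
agree; otherwise the extra rounding step can change the last bits of
the result.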

This change allows an "fdiv fast" to always be lowered as rcp + mul.
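
The difference in priority can be sketched as a small C++ model. This
is a simplified illustration, not the code in the patch: all names
(FDivLowering, FDivInfo, lowerBefore, lowerAfter) are hypothetical,
denormal modes and other subtarget conditions are ignored, and the
"Full" fallback for an fdiv with neither the flag nor the metadata is
an assumption.

```cpp
// Simplified, hypothetical model of the lowering choice; the names are
// illustrative and are not taken from the LLVM sources.
enum class FDivLowering {
  RcpMul,    // rcp + mul, the shortest sequence
  FDivFast,  // amdgcn_fdiv_fast intrinsic
  Full       // assumed fallback: full-precision division
};

struct FDivInfo {
  bool HasFastFlag;  // "fast" flag on the fdiv instruction
  bool HasFPMath;    // !fpmath metadata permitting 2.5 ulp error
};

// Behavior described as current: the metadata is checked first, so the
// fast flag only matters when no metadata is present.
constexpr FDivLowering lowerBefore(FDivInfo I) {
  return I.HasFPMath     ? FDivLowering::FDivFast
         : I.HasFastFlag ? FDivLowering::RcpMul
                         : FDivLowering::Full;
}

// Behavior after the change: the fast flag is checked first, so an
// "fdiv fast" becomes rcp + mul whether or not metadata is attached.
constexpr FDivLowering lowerAfter(FDivInfo I) {
  return I.HasFastFlag ? FDivLowering::RcpMul
         : I.HasFPMath ? FDivLowering::FDivFast
                       : FDivLowering::Full;
}
```

In this model, the clang output shown earlier (fast flag plus fpmath
metadata) maps to FDivFast before the change and to RcpMul after it,
while an fdiv with only the metadata is unaffected.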

https://reviews.llvm.org/D34844

Files:
  lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
  lib/Target/AMDGPU/SIISelLowering.cpp
  test/CodeGen/AMDGPU/amdgpu-codegenprepare-fdiv.ll
  test/CodeGen/AMDGPU/fdiv.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D34844.104770.patch
Type: text/x-patch
Size: 8917 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170629/1a460266/attachment.bin>