[LLVMbugs] [Bug 16218] New: Fast math - turn multiple fdiv's with the same denominator into fmul by the reciprocal

Mon Jun 3 23:03:48 PDT 2013

http://llvm.org/bugs/show_bug.cgi?id=16218

            Bug ID: 16218
           Summary: Fast math - turn multiple fdiv's with the same
                    denominator into fmul by the reciprocal
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: baldrick at free.fr
                CC: llvmbugs at cs.uiuc.edu, shuxin.llvm at gmail.com
    Classification: Unclassified

The fatigue2 benchmark from the Polyhedron suite runs twice as fast when
compiled with GCC as when compiled with LLVM.  The reason is that the main loop
contains a series of fast-math fdiv's with the same denominator:

  %d1 = fdiv fast double %x1, %y
...
  %d2 = fdiv fast double %x2, %y
...
  %d3 = fdiv fast double %x3, %y
...
  %d4 = fdiv fast double %x4, %y
...

GCC replaces the fdiv's with fmul's by the reciprocal:

  %r = fdiv fast double 1.000000e+00, %y
  %d1 = fmul fast double %x1, %r
...
  %d2 = fmul fast double %x2, %r
...
  %d3 = fmul fast double %x3, %r
...
  %d4 = fmul fast double %x4, %r
...

This is a big win in this case (it replaces 9 fdiv's with 1 reciprocal and 9
fmul's).  We should do this transform too.
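
As a rough illustration of why this rewrite needs the fast-math flag: x * (1/y)
rounds twice (once for the reciprocal, once for the product) while x / y rounds
once, so the two can differ in the last bits.  The sketch below measures that
difference for Python floats (IEEE doubles); the function name is hypothetical
and not part of LLVM or GCC.

```python
def max_relative_difference(numerators, denominator):
    """Largest relative gap between x / y and x * (1 / y) over the numerators."""
    reciprocal = 1.0 / denominator  # computed once, like %r in the IR above
    worst = 0.0
    for x in numerators:
        exact = x / denominator          # one rounding
        via_reciprocal = x * reciprocal  # two roundings in total
        if exact != 0.0:
            gap = abs(via_reciprocal - exact) / abs(exact)
            worst = max(worst, gap)
    return worst
```

For example, max_relative_difference([float(i) for i in range(1, 101)], 49.0)
returns the largest relative gap over those numerators.  Any gap is on the
order of a few units in the last place, which is an error that fast-math
explicitly permits.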

One way, suggested by Shuxin, is to always normalize every fast-math fdiv into
multiplication by the reciprocal, e.g. in instcombine:

  %c = fdiv fast double %a, %b
->
  %r = fdiv fast double 1.000000e+00, %b
  %c = fmul fast double %a, %r

In the above fatigue2 example this would result in:

  %r1 = fdiv fast double 1.000000e+00, %y
  %d1 = fmul fast double %x1, %r1
...
  %r2 = fdiv fast double 1.000000e+00, %y
  %d2 = fmul fast double %x2, %r2
...
  %r3 = fdiv fast double 1.000000e+00, %y
  %d3 = fmul fast double %x3, %r3
...
  %r4 = fdiv fast double 1.000000e+00, %y
  %d4 = fmul fast double %x4, %r4
...

GVN should then eliminate the duplicated reciprocals, giving the desired
result:

  %r1 = fdiv fast double 1.000000e+00, %y
  %d1 = fmul fast double %x1, %r1
...
  %d2 = fmul fast double %x2, %r1
...
  %d3 = fmul fast double %x3, %r1
...
  %d4 = fmul fast double %x4, %r1
...
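
The two steps above (normalize every fdiv, then let GVN merge the duplicated
reciprocals) can be sketched on a toy instruction list.  This is only a model
under stated assumptions: a single basic block in SSA form, a hypothetical Inst
tuple rather than real LLVM data structures, and a simple common-subexpression
pass standing in for GVN.

```python
from typing import NamedTuple


class Inst(NamedTuple):
    """Hypothetical toy instruction: dest = op lhs, rhs."""
    dest: str
    op: str   # "fdiv" or "fmul"
    lhs: str
    rhs: str


def normalize_fdivs(insts):
    """Rewrite every a / b as a * (1 / b), with a fresh reciprocal per fdiv."""
    out = []
    count = 0
    for inst in insts:
        if inst.op == "fdiv" and inst.lhs != "1.0":
            count += 1
            recip = f"%r{count}"
            out.append(Inst(recip, "fdiv", "1.0", inst.rhs))
            out.append(Inst(inst.dest, "fmul", inst.lhs, recip))
        else:
            out.append(inst)
    return out


def eliminate_common_subexpressions(insts):
    """Stand-in for GVN: keep the first instruction for each (op, lhs, rhs)."""
    seen = {}     # (op, lhs, rhs) -> dest of the first occurrence
    renamed = {}  # removed dest -> surviving dest
    out = []
    for inst in insts:
        lhs = renamed.get(inst.lhs, inst.lhs)
        rhs = renamed.get(inst.rhs, inst.rhs)
        key = (inst.op, lhs, rhs)
        if key in seen:
            renamed[inst.dest] = seen[key]  # duplicate: redirect later uses
        else:
            seen[key] = inst.dest
            out.append(Inst(inst.dest, inst.op, lhs, rhs))
    return out


def optimize(insts):
    return eliminate_common_subexpressions(normalize_fdivs(insts))
```

Feeding in four fdiv's by %y, as in the fatigue2 example, leaves a single
reciprocal %r1 followed by four fmul's that all use it.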

The problem with this approach is that it's not a win if the reciprocal is only
used once: a single fdiv becomes an fdiv plus an fmul.  This can be fixed up
during codegen: if during codegen we see a reciprocal
  %r = fdiv fast double 1.000000e+00, %b
with only one use, and that use is an fmul
  %c = fmul fast double %a, %r
then this can be transformed into an fdiv
  %c = fdiv fast double %a, %b

The fixup could be done at the DAG combiner level, but this would miss cases
where the reciprocal and the fmul are in different basic blocks.  It's not
clear whether this matters much in practice: the suggested splitting of fdiv
into a reciprocal and an fmul will always put them in the same basic block,
though in theory other optimizers might move things around.  The fixup could
also be done in codegen but at the IR level, for example in CodeGenPrepare.
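
The single-use fixup described above can be sketched on a toy instruction list.
As before this is only a model: a hypothetical Inst tuple, one basic block, and
no relation to the actual DAG combiner or codegen prepare code.

```python
from typing import NamedTuple


class Inst(NamedTuple):
    """Hypothetical toy instruction: dest = op lhs, rhs."""
    dest: str
    op: str   # "fdiv" or "fmul"
    lhs: str
    rhs: str


def fold_single_use_reciprocals(insts):
    """Turn r = 1 / b; c = a * r back into c = a / b when r has one use."""
    uses = {}
    for inst in insts:
        for operand in (inst.lhs, inst.rhs):
            uses[operand] = uses.get(operand, 0) + 1

    # Map each reciprocal that is used exactly once to its denominator.
    recip_denominator = {
        inst.dest: inst.rhs
        for inst in insts
        if inst.op == "fdiv" and inst.lhs == "1.0"
        and uses.get(inst.dest, 0) == 1
    }

    folded = set()  # reciprocals whose only use was an fmul
    rewritten = []
    for inst in insts:
        if inst.op == "fmul" and inst.rhs in recip_denominator:
            denom = recip_denominator[inst.rhs]
            rewritten.append(Inst(inst.dest, "fdiv", inst.lhs, denom))
            folded.add(inst.rhs)
        elif inst.op == "fmul" and inst.lhs in recip_denominator:
            # fmul is commutative, so the reciprocal may be the left operand.
            denom = recip_denominator[inst.lhs]
            rewritten.append(Inst(inst.dest, "fdiv", inst.rhs, denom))
            folded.add(inst.lhs)
        else:
            rewritten.append(inst)

    # Drop the reciprocals that were folded into an fdiv.
    return [inst for inst in rewritten if inst.dest not in folded]
```

A reciprocal with two or more uses is left alone, so the shared %r from the
fatigue2 example survives, while an isolated reciprocal and fmul pair collapses
back into one fdiv.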
