[llvm-bugs] [Bug 32157] New: combineRepeatedFPDivisors may not generate a reciprocal outside of a loop

via llvm-bugs llvm-bugs at lists.llvm.org
Mon Mar 6 13:23:16 PST 2017


https://bugs.llvm.org/show_bug.cgi?id=32157

            Bug ID: 32157
           Summary: combineRepeatedFPDivisors may not generate a
                    reciprocal outside of a loop
           Product: libraries
           Version: 4.0
          Hardware: Macintosh
                OS: MacOS X
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: vit9696 at avp.su
                CC: llvm-bugs at lists.llvm.org

Created attachment 18059
  --> https://bugs.llvm.org/attachment.cgi?id=18059&action=edit
fdiv replacement failure showcase

As discussed in http://lists.llvm.org/pipermail/llvm-dev/2017-March/110736.html,
LLVM sometimes fails to replace a division inside a loop with a multiplication
by a reciprocal computed outside the loop.

I have attached all three C samples together with their generated IR.

The first one is a basic example of the issue. There are two fdivs in two
separate basic blocks, and this seems to be the reason no reciprocal is
generated outside the loop. The C code for it is as follows:

#include <stdbool.h>
#include <stdint.h>

__attribute__ ((noinline)) bool calculate(double c) {
        uint32_t b = 0;
        uint32_t a = 0;
        while (b < 0xFFFFFFFF) {
                /* b / c is the fdiv that should become b * (1.0 / c) */
                if ((uint64_t)(b / c) % 2 == 0)
                        a++;
                b++;
        }
        return a;
}
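
For comparison, the rewrite the report expects can be done by hand in the
source. The sketch below is not taken from the attachment: the function names
and the `limit` parameter are hypothetical, added so that the loop can be run
over a short range, and c is assumed to be positive. It computes the same count
once with a division per iteration and once with a reciprocal computed before
the loop.

```c
#include <stdint.h>

/* Hypothetical helper: one division per iteration, as in the sample above.
 * Assumes c > 0 so that the cast to uint64_t is well defined. */
uint32_t count_even_div(double c, uint32_t limit) {
        uint32_t a = 0;
        for (uint32_t b = 0; b < limit; b++) {
                if ((uint64_t)(b / c) % 2 == 0)
                        a++;
        }
        return a;
}

/* Hypothetical helper: the reciprocal is computed once, outside the loop,
 * and each iteration multiplies instead of dividing. */
uint32_t count_even_recip(double c, uint32_t limit) {
        const double r = 1.0 / c;
        uint32_t a = 0;
        for (uint32_t b = 0; b < limit; b++) {
                if ((uint64_t)(b * r) % 2 == 0)
                        a++;
        }
        return a;
}
```

The two functions agree exactly when 1.0 / c is representable (for example
c = 2.0). For other values b * r may differ from b / c in the last bit, which
is why the compiler only performs this rewrite under relaxed floating-point
flags such as -ffast-math.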

The second one is a basic example where the issue does not happen. Here
0xFFFFFFFF is replaced by 0xFFFFFFF0, so that only one basic block is created.

The third one is an example with only one fdiv inside the loop, for which no
reciprocal is generated because of an x86-64-specific threshold of 2. However,
gcc and icc still seem to generate a reciprocal in this case, so perhaps LLVM
should take the same approach.
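
The decision described by the three samples can be modelled roughly as
follows. This is a simplified sketch, not LLVM's code: the function name and
parameters are hypothetical, and it assumes that fdivs sharing a divisor are
counted per basic block and compared against the target's threshold (2 on
x86-64 according to this report).

```c
#include <stdbool.h>

/* Hypothetical model: returns true when the fdivs that share one divisor in a
 * single basic block would be rewritten as multiplications by a reciprocal.
 * min_uses is the target threshold; 0 is assumed to disable the rewrite. */
bool would_use_reciprocal(unsigned fdivs_in_block, unsigned min_uses) {
        if (min_uses == 0)
                return false;
        return fdivs_in_block >= min_uses;
}
```

Under this model the first sample (one fdiv in each of two blocks) and the
third sample (a single fdiv) both fall below a threshold of 2, which matches
the behaviour reported here.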

This matters because, without a reciprocal, the code generated for these
samples is approximately twice as slow as the code generated by gcc/icc on
Intel Haswell. This is currently reproducible with 3.9.1/4.0.0rc2, provided
the source code is compiled with the -O3 -ffast-math flags, which enable the
optimisation.
