[llvm-bugs] [Bug 25108] New: 30% performance deficiency vs GCC because of excessive vectorization on -march=slm

via llvm-bugs llvm-bugs at lists.llvm.org
Thu Oct 8 07:27:20 PDT 2015


https://llvm.org/bugs/show_bug.cgi?id=25108

            Bug ID: 25108
           Summary: 30% performance deficiency vs GCC because of excessive
                    vectorization on -march=slm
           Product: clang
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: LLVM Codegen
          Assignee: unassignedclangbugs at nondot.org
          Reporter: egor.kochetov at intel.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

Created attachment 15033
  --> https://llvm.org/bugs/attachment.cgi?id=15033&action=edit
c++ reproducer, Makefile, two disassembled functions.

The function 'compute' below, taken from the attached cpp file, is vectorized
by clang but not by gcc.

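// v must point to at least 8 elements (indices 0..7 are accessed).
template <typename T>
void compute (T* v) {
    // outer loop renamed from 'i' to 'iter' so the inner loops do not shadow it
    for (int iter = 0; iter < 10000; ++iter) {
        // fw pass
        for (int i = 0; i < 6; ++i) {
            v [i] += v [i + 1];
            v [i] *= v [i + 2];
        }
        // back pass
        for (int i = 7; i > 1; --i) {
            v [i] += v [i - 1];
            v [i] *= v [i - 2];
        }
    }
}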
The vectorized version runs slower on an Intel Silvermont machine, both for
float and for double. The test run timings are as follows:

clang:
float time: 0.000043 ms
double time: 0.000059 ms

gcc:
float time: 0.000029 ms
double time: 0.000043 ms
The source was compiled with -Ofast -m64 -g -march=slm -mfpmath=sse -fPIE. A
32-bit build shows the same result.

This difference reproduces a real slowdown, measured in seconds, in one of the
programs I work with.
It seems that the vectorization costs in clang are not set up correctly for
-march=slm. When the code is compiled with clang and -fno-slp-vectorize, the
resulting performance matches gcc.

The compilers compared are gcc 6.0 trunk and clang 3.8 trunk.

The attachment is an archive containing the C++ reproducer source, a Makefile,
and the disassembly of the float version produced by each compiler.

To run the reproducer with the attached Makefile, fix the compiler paths in
the Makefile and then run `make clean run; make clean run COMPILER=gcc`.
