[llvm-bugs] [Bug 37819] New: Poor vectorization with -march=skylake compared to -march=haswell

via llvm-bugs llvm-bugs at lists.llvm.org
Fri Jun 15 22:20:21 PDT 2018


https://bugs.llvm.org/show_bug.cgi?id=37819

            Bug ID: 37819
           Summary: Poor vectorization with -march=skylake compared to
                    -march=haswell
           Product: clang
           Version: 6.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: -New Bugs
          Assignee: unassignedclangbugs at nondot.org
          Reporter: jed at 59a2.org
                CC: llvm-bugs at lists.llvm.org

Created attachment 20432
  --> https://bugs.llvm.org/attachment.cgi?id=20432&action=edit
Source exhibiting optimizer oddity.

The attached code vectorizes well when compiled with -march=haswell, and the
resulting binary runs nearly optimally on both Haswell and Skylake hardware.

$ clang -Wall -O3 -march=haswell -ffast-math -c mm-clang.c

00000000000000e0 <mult+0xe0> vmovapd ymm9,ymm6
00000000000000e4 <mult+0xe4> vbroadcastsd ymm10,QWORD PTR [rdi+rbx*8-0x800]
00000000000000ee <mult+0xee> vmovupd ymm6,YMMWORD PTR [rax-0x20]
00000000000000f3 <mult+0xf3> vmovupd ymm11,YMMWORD PTR [rax]
00000000000000f7 <mult+0xf7> vfmadd231pd ymm1,ymm6,ymm10
00000000000000fc <mult+0xfc> vfmadd231pd ymm7,ymm11,ymm10
0000000000000101 <mult+0x101> vbroadcastsd ymm10,QWORD PTR [rdi+rbx*8-0x400]
000000000000010b <mult+0x10b> vfmadd231pd ymm8,ymm6,ymm10
0000000000000110 <mult+0x110> vfmadd231pd ymm5,ymm11,ymm10
0000000000000115 <mult+0x115> vbroadcastsd ymm10,QWORD PTR [rdi+rbx*8]
000000000000011b <mult+0x11b> vfmadd231pd ymm2,ymm6,ymm10
0000000000000120 <mult+0x120> vfmadd231pd ymm3,ymm11,ymm10
0000000000000125 <mult+0x125> vbroadcastsd ymm10,QWORD PTR [rdi+rbx*8+0x400]
000000000000012f <mult+0x12f> vfmadd213pd ymm6,ymm10,ymm9
0000000000000134 <mult+0x134> vfmadd231pd ymm4,ymm11,ymm10
0000000000000139 <mult+0x139> add    rax,0x400
000000000000013f <mult+0x13f> add    rbx,0x1
0000000000000143 <mult+0x143> jne    00000000000000e0 <mult+0xe0>
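
For readers without the attachment, the loop above has the shape of a 4x8
register-blocked micro-kernel: four broadcasts from one operand, two 4-wide
loads from the other, and eight accumulators. The C below is only a sketch
inferred from the listing, not the attached source. The function name, the
4x8 tile, and the leading dimension of 128 doubles (taken from the 0x400-byte
strides) are all assumptions.

```c
#include <stddef.h>

/* Assumed leading dimension: 0x400 bytes / 8 bytes per double. */
enum { LD = 128 };

/* Hypothetical 4x8 micro-kernel (not the attached code):
 * C[i][j] += sum over p of A[i][p] * B[p][j], for i < 4, j < 8, p < k. */
void micro_4x8(size_t k, const double *A, const double *B, double *C)
{
    double acc[4][8] = {{0.0}};   /* 32 doubles = eight 4-wide accumulators */

    for (size_t p = 0; p < k; p++) {
        for (int i = 0; i < 4; i++) {
            double a = A[i * LD + p];            /* one scalar per row, cf. vbroadcastsd */
            for (int j = 0; j < 8; j++)
                acc[i][j] += a * B[p * LD + j];  /* multiply-add, cf. vfmadd231pd */
        }
    }

    /* Write the tile back once, after the reduction over p. */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 8; j++)
            C[i * LD + j] += acc[i][j];
}
```

If the accumulators stay in registers for the whole p loop, each iteration
needs only the two loads and four broadcasts seen above.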

The generated code is much worse with -march=skylake: the accumulators live on
the stack and are loaded before and stored after each FMA.

$ clang -Wall -O3 -march=skylake -ffast-math -c mm-clang.c

0000000000000caf <mult+0xcaf> vmovapd YMMWORD PTR [rsp],ymm2           
0000000000000cb4 <mult+0xcb4> vmovapd ymm2,YMMWORD PTR [rsp+0x20]            
0000000000000cba <mult+0xcba> vmovapd ymm3,YMMWORD PTR [rsp+0x400]           
0000000000000cc3 <mult+0xcc3> vfmadd231pd ymm2,ymm3,ymm0                     
0000000000000cc8 <mult+0xcc8> vmovapd YMMWORD PTR [rsp+0x20],ymm2        
0000000000000cce <mult+0xcce> vmovapd ymm2,YMMWORD PTR [rsp+0x40]             
0000000000000cd4 <mult+0xcd4> vfmadd231pd ymm2,ymm7,ymm0                      
0000000000000cd9 <mult+0xcd9> vmovapd YMMWORD PTR [rsp+0x40],ymm2      
0000000000000cdf <mult+0xcdf> vmovapd ymm2,YMMWORD PTR [rsp+0x60]            
0000000000000ce5 <mult+0xce5> vfmadd231pd ymm2,ymm5,ymm0                    
0000000000000cea <mult+0xcea> vmovapd YMMWORD PTR [rsp+0x60],ymm2      
0000000000000cf0 <mult+0xcf0> vmovapd ymm2,YMMWORD PTR [rsp+0x80]             
0000000000000cf9 <mult+0xcf9> vfmadd231pd ymm2,ymm4,ymm0                     
0000000000000cfe <mult+0xcfe> vmovapd YMMWORD PTR [rsp+0x80],ymm2         
0000000000000d07 <mult+0xd07> vmovapd ymm2,YMMWORD PTR [rsp+0xa0]            
0000000000000d10 <mult+0xd10> vfmadd231pd ymm2,ymm15,ymm0                    


If we drop -ffast-math, FMA instructions are no longer used (for either
-march=haswell or -march=skylake).

0000000000000107 <mult+0x107> vbroadcastsd ymm9,QWORD PTR [rdi+rbx*8-0x400]
0000000000000111 <mult+0x111> vmulpd ymm12,ymm9,ymm10
0000000000000116 <mult+0x116> vaddpd ymm8,ymm8,ymm12
000000000000011b <mult+0x11b> vmulpd ymm9,ymm9,ymm11
0000000000000120 <mult+0x120> vaddpd ymm6,ymm6,ymm9

I don't think -ffast-math should be needed to use FMA instructions here. It
certainly isn't needed for this code with GCC or Intel compilers.
