[llvm-bugs] [Bug 32564] New: AVX2 - aggressive broadcast generation instead of memory operands
via llvm-bugs
llvm-bugs at lists.llvm.org
Fri Apr 7 04:16:51 PDT 2017
https://bugs.llvm.org/show_bug.cgi?id=32564
Bug ID: 32564
Summary: AVX2 - aggressive broadcast generation instead of memory operands
Product: clang
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: normal
Priority: P
Component: LLVM Codegen
Assignee: unassignedclangbugs at nondot.org
Reporter: regis.portalez at gmail.com
CC: llvm-bugs at lists.llvm.org
Created attachment 18247
--> https://bugs.llvm.org/attachment.cgi?id=18247&action=edit
open zip to see code reproducer (clang vs gcc and icc)
Following the resolution of bug #20054
(https://bugs.llvm.org/show_bug.cgi?id=20054),
LLVM codegen (x86, AVX2) now always generates broadcast instructions for splat
values instead of using memory operands.
See this reproducer:
#include <immintrin.h>
__m256d mulconst(__m256d x) {
  const __m256d a = { 15.0, 15.0, 15.0, 15.0 };
  return _mm256_mul_pd(x, a);
}
which generates, with [ -O3 -g -S -mavx2 -mavx -mfma ]:
.LCPI0_0:
        .quad 4624633867356078080 # double 15
mulconst(double __vector(4)): # @mulconst(double __vector(4))
        vbroadcastsd ymm1, qword ptr [rip + .LCPI0_0]
        vmulpd ymm0, ymm0, ymm1
        ret
This is legitimate when optimizing for code size, but not for speed.
Indeed:
- vbroadcastsd is an additional instruction,
- its result consumes an extra register (which can in turn cause spilling),
- it prevents any use of memory operands, even with inline assembly.
See the attached larger reproducer to spot the unnecessary spills (and to
compare the assembly generated by gcc 6.2 and clang 4.0).
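For readers without the attachment, here is a minimal sketch of the same pattern, written with the GCC/Clang vector extensions so that it builds without <immintrin.h>. It is not the attached reproducer: the names v4d, mulconst_ext and poly_splat, and the constants in poly_splat, are made up for illustration. Compiling it with the flags above and -S should let you check whether each splat constant is loaded through a separate broadcast into its own register or folded into a memory operand; the actual output depends on the compiler version.

```c
/* Illustrative sketch only; not the code from attachment 18247. */

/* 256-bit vector of four doubles (GCC/Clang vector extension). */
typedef double v4d __attribute__((vector_size(32)));

/* Same shape as mulconst() in the report: one splat constant. */
v4d mulconst_ext(v4d x) {
  const v4d a = { 15.0, 15.0, 15.0, 15.0 };
  return x * a;
}

/* Hypothetical variant with four distinct splat constants (Horner form).
 * If each constant is broadcast into a register, four registers are used
 * for constants alone, which is the register pressure the report describes. */
v4d poly_splat(v4d x) {
  const v4d c0 = { 1.0, 1.0, 1.0, 1.0 };
  const v4d c1 = { 2.0, 2.0, 2.0, 2.0 };
  const v4d c2 = { 3.0, 3.0, 3.0, 3.0 };
  const v4d c3 = { 4.0, 4.0, 4.0, 4.0 };
  return ((c3 * x + c2) * x + c1) * x + c0;
}
```

Without -mavx the code still compiles and computes the same values, but the vectors are no longer passed in ymm registers, so the generated assembly is only comparable to the listing above when the AVX flags are set.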