[llvm-bugs] [Bug 31202] New: PMULLD should be avoided if possible on Silvermont
via llvm-bugs
llvm-bugs at lists.llvm.org
Tue Nov 29 04:33:33 PST 2016
https://llvm.org/bugs/show_bug.cgi?id=31202
Bug ID: 31202
Summary: PMULLD should be avoided if possible on Silvermont
Product: libraries
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: normal
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: zvi.rackover at intel.com
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
For the following case:
define <4 x i32> @foo(<4 x i8> %A) {
%z = zext <4 x i8> %A to <4 x i32>
%m = mul nuw nsw <4 x i32> %z, <i32 18778, i32 18778, i32 18778, i32 18778>
ret <4 x i32> %m
}
The following code is generated for Silvermont:
pand .LCPI1_0, %xmm0
pmulld .LCPI1_1, %xmm0
retl
On Silvermont:
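PMULLD has a throughput of 1/11 [instructions/cycle], i.e. one instruction every 11 cycles.
PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instructions/cycle], i.e. one instruction every 2 cycles.
Note that both multiplicands fit in 16 bits: the zero-extended i8 lanes are at most 255, and the constant is 18778.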
We would achieve a higher throughput with the following sequence:
pshufb
pmullw
pmulhw
punpcklwd
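
As a rough illustration of why the sequence above is equivalent to the 32-bit
multiply, here is a scalar C model of a single lane. It is only a sketch: the
model_* functions and mul_v4i8_by_const are hypothetical names invented for
this example, and the code models the arithmetic of each instruction rather
than using the real SIMD instructions or reproducing any LLVM lowering. It
assumes, as in the test case, that the inputs are zero-extended bytes and the
constant is a non-negative 16-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of PMULLW:
 * low 16 bits of the signed 16x16 product. */
static uint16_t model_pmullw(int16_t a, int16_t b) {
    return (uint16_t)((uint32_t)((int32_t)a * (int32_t)b) & 0xFFFFu);
}

/* Scalar model of one 16-bit lane of PMULHW:
 * high 16 bits of the signed 16x16 product. */
static uint16_t model_pmulhw(int16_t a, int16_t b) {
    return (uint16_t)((uint32_t)((int32_t)a * (int32_t)b) >> 16);
}

/* Scalar model of one output dword of PUNPCKLWD:
 * interleave a low word and a high word into 32 bits. */
static uint32_t model_punpcklwd(uint16_t lo, uint16_t hi) {
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Hypothetical helper: multiply four zero-extended bytes by a constant k
 * using only the modelled 16-bit operations. */
static void mul_v4i8_by_const(const uint8_t in[4], int16_t k, uint32_t out[4]) {
    assert(k >= 0); /* assumption: constant is non-negative, as in the bug */
    for (int i = 0; i < 4; ++i) {
        int16_t a = (int16_t)in[i]; /* zero-extension (the PSHUFB step) */
        out[i] = model_punpcklwd(model_pmullw(a, k), model_pmulhw(a, k));
    }
}
```

Because both operands are non-negative when read as signed 16-bit values, the
signed high half from PMULHW matches the unsigned one, so concatenating the
two halves gives the full 32-bit product.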
This issue was root-caused by Farhana Aleen during analysis of internal
workloads that would regress if interleaving were enabled for Silvermont
in X86TTI (which is why commit 284779 did not enable interleaving for some
subtargets). It turns out that, with interleaving, the vectorized IR prior to
codegen is decent for the chosen vectorization width. The issue reported here
is one of the major reasons for the slowdown, but fixing it alone only reduces
the regression.