[llvm-bugs] [Bug 31492] New: [PPC] slower vsx instructions generated for vmac
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Dec 28 14:54:15 PST 2016
https://llvm.org/bugs/show_bug.cgi?id=31492
Bug ID: 31492
Summary: [PPC] slower vsx instructions generated for vmac
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: Backend: PowerPC
Assignee: unassignedbugs at nondot.org
Reporter: carrot at google.com
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
Created attachment 17789
--> https://llvm.org/bugs/attachment.cgi?id=17789&action=edit
testcase
The attached test case is simplified from vmac. Compile it with the options
-m64 -O2 -mvsx -mcpu=power8
LLVM generates the following code for the while loop:
.LBB0_2: # %while.body
# =>This Inner Loop Header: Depth=1
lxvd2x 0, 0, 7 // *
lxvd2x 1, 0, 6 // *
xxswapd 34, 0 // *
xxswapd 35, 1 // *
vaddudm 2, 3, 2 // *
xxswapd 10, 34 // *
mfvsrd 9, 34 // *
mfvsrd 10, 10 // *
#APP
mulhdu 11, 10, 9
#NO_APP
lxvd2x 11, 7, 8
lxvd2x 12, 0, 5
mulld 9, 9, 10
addi 7, 7, 64
xxswapd 50, 11
xxswapd 51, 12
vaddudm 2, 19, 18
xxswapd 13, 34
mfvsrd 0, 34
mfvsrd 12, 13
mulld 10, 0, 12
#APP
mulhdu 12, 12, 0
#NO_APP
#APP
addc 3, 9, 10
adde 4, 11, 12
#NO_APP
bdnz .LBB0_2
There are two problems:
1. (kp)[i] is loop invariant, so its load can be hoisted out of the loop.
2. LLVM generates the VSX code marked * for the expression
get64PE((mp) + i) + (kp)[i]
Using plain integer load and add instructions instead would be shorter and
faster. For large inputs, it can be 35% faster.
Could this be a cost model problem in vectorization?
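
For illustration only, here is a minimal C sketch of the scalar shape described in the two points above. It is not the attached test case: the function name mac_blocks, the two-word stride, and the plain array reads standing in for get64PE are all assumptions. It relies on unsigned __int128, a GCC/Clang extension available on 64-bit targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the attached test case.
 * Accumulates (mp[0] + kp[0]) * (mp[1] + kp[1]) per block into a
 * 128-bit sum, returned as high and low 64-bit halves. */
static void mac_blocks(const uint64_t *mp, const uint64_t *kp,
                       size_t nblocks, uint64_t *rh, uint64_t *rl)
{
    /* Point 1: the key words do not change across iterations,
     * so they are loaded once, before the loop. */
    const uint64_t k0 = kp[0];
    const uint64_t k1 = kp[1];

    unsigned __int128 acc = 0;
    while (nblocks--) {
        /* Point 2: plain 64-bit integer loads and adds, with no
         * round trip through vector registers. */
        uint64_t a = mp[0] + k0;
        uint64_t b = mp[1] + k1;

        /* 64x64 -> 128-bit product, added with carry into acc. */
        acc += (unsigned __int128)a * b;
        mp += 2;
    }
    *rh = (uint64_t)(acc >> 64);
    *rl = (uint64_t)acc;
}
```

With this shape, a compiler would be expected to emit scalar loads (ld), add, mulld/mulhdu and addc/adde inside the loop, although the exact output depends on the compiler version and options.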