[llvm-bugs] [Bug 39709] New: [X86] Suboptimal code in vXi8 vector multiply reduction
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon Nov 19 10:46:23 PST 2018
https://bugs.llvm.org/show_bug.cgi?id=39709
Bug ID: 39709
Summary: [X86] Suboptimal code in vXi8 vector multiply
reduction
Product: libraries
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: craig.topper at gmail.com
CC: craig.topper at gmail.com, llvm-bugs at lists.llvm.org,
llvm-dev at redking.me.uk, spatel+llvm at rotateright.com
Multiplying vXi8 vectors requires widening the elements to 16 bits to use the
vXi16 pmullw, then shrinking back to i8. As of r347240 we use
punpcklbw/punpckhbw to do the expansion, which creates undef upper elements,
and we use an AND+PACKUS to merge the high and low unpacked values back
together after the two pmullw.
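As a rough illustration of the sequence described above, here is a scalar C model of one v16i8 multiply. It is only a sketch, not the code LLVM emits: the helper names (unpack_half, packus_lane, mul_v16i8) are hypothetical, and the "undef" upper byte of each 16-bit lane is modeled as a copy of the low byte, which is an assumption made to show that the garbage does not affect the result.

```c
#include <stdint.h>

/* Models punpcklbw/punpckhbw: each byte becomes the low byte of a 16-bit
   lane. The upper byte is a don't-care; here it is a copy of the low byte
   (an assumption for illustration). */
static void unpack_half(const uint8_t in[16], int hi, uint16_t out[8]) {
    for (int i = 0; i < 8; i++) {
        uint8_t b = in[i + (hi ? 8 : 0)];
        out[i] = (uint16_t)(b | (b << 8));
    }
}

/* Models one lane of packuswb: the 16-bit lane is read as signed and
   saturated to the unsigned 8-bit range. */
static uint8_t packus_lane(uint16_t w) {
    if (w & 0x8000) return 0;   /* negative as a signed word */
    if (w > 255) return 255;    /* too large for a byte */
    return (uint8_t)w;
}

/* v16i8 multiply: unpack, two pmullw, AND, PACKUS. */
static void mul_v16i8(const uint8_t a[16], const uint8_t b[16], uint8_t r[16]) {
    uint16_t alo[8], ahi[8], blo[8], bhi[8];
    unpack_half(a, 0, alo);
    unpack_half(a, 1, ahi);
    unpack_half(b, 0, blo);
    unpack_half(b, 1, bhi);
    for (int i = 0; i < 8; i++) {
        /* pmullw keeps the low 16 bits of each product */
        uint16_t plo = (uint16_t)((uint32_t)alo[i] * blo[i]);
        uint16_t phi = (uint16_t)((uint32_t)ahi[i] * bhi[i]);
        /* the AND clears the upper byte so PACKUS does not saturate */
        plo &= 0x00FF;
        phi &= 0x00FF;
        r[i] = packus_lane(plo);
        r[i + 8] = packus_lane(phi);
    }
}
```

The low 8 bits of a product depend only on the low 8 bits of its operands, which is why the upper bytes left by the unpack can be arbitrary.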
When we're doing a horizontal reduction we end up packing after each step and
then unpacking at the start of the next step. It would be great if we could
combine these size changes away.
Some of the packs and unpacks are separated by the shuffles that move data
from higher elements to lower elements to do the reduction. We should see if
we can handle widening those element movement shuffles as well.
These patterns can be seen in vector-reduce-mul.ll.
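To make the redundancy concrete, the following scalar C sketch compares a 16-element i8 multiply reduction that narrows back to 8 bits after every step with one that widens once, stays in 16-bit lanes (including the element-moving step), and truncates once at the end. Both function names are hypothetical and the code only models the arithmetic; it is not the lowering LLVM performs.

```c
#include <stdint.h>

/* Narrow after every step: models pack after each pmullw and unpack
   again at the start of the next step. */
static uint8_t reduce_mul_narrow_each_step(const uint8_t v[16]) {
    uint8_t cur[16];
    for (int i = 0; i < 16; i++) cur[i] = v[i];
    for (int width = 8; width >= 1; width /= 2) {
        for (int i = 0; i < width; i++) {
            /* widen, multiply (pmullw keeps the low 16 bits) */
            uint16_t wide = (uint16_t)((uint32_t)cur[i] * cur[i + width]);
            /* AND + PACKUS: back to 8 bits */
            cur[i] = (uint8_t)(wide & 0xFF);
        }
    }
    return cur[0];
}

/* Widen once, keep 16-bit lanes across all steps, truncate once. */
static uint8_t reduce_mul_stay_wide(const uint8_t v[16]) {
    uint16_t cur[16];
    for (int i = 0; i < 16; i++) cur[i] = v[i];
    for (int width = 8; width >= 1; width /= 2) {
        for (int i = 0; i < width; i++) {
            /* the "shuffle" here is just reading lane i + width as i16 */
            cur[i] = (uint16_t)((uint32_t)cur[i] * cur[i + width]);
        }
    }
    return (uint8_t)(cur[0] & 0xFF);
}
```

Both functions return the same value for any input, because truncating to 8 bits commutes with wrapping multiplication. That equivalence is what would let the intermediate size changes be combined away.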