[llvm-bugs] [Bug 39709] New: [X86] Suboptimal code in vXi8 vector multiply reduction

Mon Nov 19 10:46:23 PST 2018

https://bugs.llvm.org/show_bug.cgi?id=39709

            Bug ID: 39709
           Summary: [X86] Suboptimal code in vXi8 vector multiply
                    reduction
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: craig.topper at gmail.com
                CC: craig.topper at gmail.com, llvm-bugs at lists.llvm.org,
                    llvm-dev at redking.me.uk, spatel+llvm at rotateright.com

Multiplying vXi8 vectors requires widening the elements to 16 bits so that we
can use the vXi16 pmullw, then shrinking the results back to i8. As of r347240
we use punpcklbw/punpckhbw to do the expansion, which leaves the upper byte of
each widened element undef, and we use an AND+PACKUS to merge the high and low
unpacked values back together after the two pmullw.
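
To make the sequence above concrete, here is a rough scalar model of it in
portable C. It is only a sketch: the helper names (unpack_half, pmullw_model,
and_packus, mul_v16i8) are made up for illustration and are not LLVM or
intrinsic APIs, and the "undef" upper byte is modelled by an arbitrary fill
value to show that it cannot affect the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a v16i8 multiply done as two v8i16 multiplies.
   None of these helpers are real LLVM or intrinsic APIs. */

/* punpcklbw/punpckhbw with an undef second operand: byte i of the selected
   half becomes the low byte of word i. The high byte is undef in the real
   lowering; here it is set to undef_fill to show that it does not matter. */
static void unpack_half(const uint8_t src[16], int high, uint8_t undef_fill,
                        uint16_t dst[8]) {
    for (int i = 0; i < 8; ++i)
        dst[i] = (uint16_t)(src[i + (high ? 8 : 0)] | (undef_fill << 8));
}

/* pmullw: keep the low 16 bits of each 16x16 product. */
static void pmullw_model(const uint16_t a[8], const uint16_t b[8],
                         uint16_t out[8]) {
    for (int i = 0; i < 8; ++i)
        out[i] = (uint16_t)((uint32_t)a[i] * b[i]);
}

/* AND with 0x00FF, then packuswb. The AND clears the garbage high byte, so
   the unsigned saturation performed by PACKUS can never trigger. */
static void and_packus(const uint16_t lo[8], const uint16_t hi[8],
                       uint8_t out[16]) {
    for (int i = 0; i < 8; ++i) {
        uint16_t l = lo[i] & 0x00FF;
        uint16_t h = hi[i] & 0x00FF;
        assert(l <= 0xFF && h <= 0xFF); /* already in u8 range */
        out[i] = (uint8_t)l;
        out[i + 8] = (uint8_t)h;
    }
}

/* Full v16i8 multiply: unpack both halves, two pmullw, AND+PACKUS. */
static void mul_v16i8(const uint8_t a[16], const uint8_t b[16],
                      uint8_t out[16]) {
    uint16_t alo[8], ahi[8], blo[8], bhi[8], plo[8], phi[8];
    unpack_half(a, 0, 0xAB, alo); /* 0xAB and 0xCD stand in for undef */
    unpack_half(a, 1, 0xAB, ahi);
    unpack_half(b, 0, 0xCD, blo);
    unpack_half(b, 1, 0xCD, bhi);
    pmullw_model(alo, blo, plo);
    pmullw_model(ahi, bhi, phi);
    and_packus(plo, phi, out);
}
```

Calling mul_v16i8(a, b, out) should give out[i] == (uint8_t)(a[i] * b[i]) for
every lane, because the low 8 bits of a product depend only on the low 8 bits
of its operands.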

When we're doing a horizontal reduction we end up packing after each step and
then unpacking at the start of the next step. It would be great if we could
combine these size changes away.

Some of the packs and unpacks are separated by the shuffles that move values
from the higher elements down to the lower elements for the next reduction
step. We should see if we can handle widening those element movement shuffles
as well.
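
As a rough illustration of why removing the intermediate size changes should
be safe, the scalar C sketch below compares a reduction that truncates to i8
after every step with one that stays in 16-bit lanes (moving 16-bit elements
instead of bytes) and truncates once at the end. The function names are
hypothetical, and the model ignores that a widened v16i8 needs two vector
registers; it only shows the arithmetic.

```c
#include <stdint.h>

/* Scalar model of a v16i8 multiply reduction. Names are illustrative only. */

/* Current shape: each step multiplies in 16 bits, then truncates back to i8
   (the pack), and the next step widens again (the unpack). */
static uint8_t reduce_mul_pack_each_step(const uint8_t v[16]) {
    uint8_t cur[16];
    for (int i = 0; i < 16; ++i)
        cur[i] = v[i];
    for (int width = 8; width >= 1; width /= 2) {
        for (int i = 0; i < width; ++i) {
            /* cur[i + width] models the shuffle of high elements to low. */
            uint16_t wide = (uint16_t)(cur[i] * cur[i + width]);
            cur[i] = (uint8_t)(wide & 0xFF);
        }
    }
    return cur[0];
}

/* Desired shape: widen once, shuffle and multiply 16-bit elements at every
   step, and truncate a single time at the end. */
static uint8_t reduce_mul_stay_wide(const uint8_t v[16]) {
    uint16_t cur[16];
    for (int i = 0; i < 16; ++i)
        cur[i] = v[i];
    for (int width = 8; width >= 1; width /= 2) {
        for (int i = 0; i < width; ++i)
            cur[i] = (uint16_t)((uint32_t)cur[i] * cur[i + width]);
    }
    return (uint8_t)(cur[0] & 0xFF);
}
```

Both functions should return the same value for any input, since the bits
above bit 7 of an intermediate product never influence the low 8 bits of later
products.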

Both issues can be seen in vector-reduce-mul.ll.
