[llvm-bugs] [Bug 31880] New: shuffle and vectorize repeated scalar ops on extracted elements

via llvm-bugs llvm-bugs at lists.llvm.org
Mon Feb 6 08:53:13 PST 2017


https://llvm.org/bugs/show_bug.cgi?id=31880

            Bug ID: 31880
           Summary: shuffle and vectorize repeated scalar ops on extracted
                    elements
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Transformation Utilities
          Assignee: unassignedbugs at nondot.org
          Reporter: spatel+llvm at rotateright.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

Forking this off from the (possibly more specific and less useful) bug 31879
and the motivating (complex numbers) bug 31866:

; For vectors <x0, x1> and <y0, y1>, return <x0*x0, y1*y1>

define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) {
  %x0 = extractelement <2 x i8> %x, i32 0
  %y1 = extractelement <2 x i8> %y, i32 1
  %x0x0 = mul i8 %x0, %x0
  %y1y1 = mul i8 %y1, %y1
  %ins1 = insertelement <2 x i8> undef, i8 %x0x0, i32 0
  %ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1
  ret <2 x i8> %ins2
}

The canonical IR should be:

define <2 x i8> @h(<2 x i8> %x, <2 x i8> %y) {
  %x0y1 = shufflevector <2 x i8> %x, <2 x i8> %y, <2 x i32> <i32 0, i32 3>
  %x0x0y1y1 = mul <2 x i8> %x0y1, %x0y1
  ret <2 x i8> %x0x0y1y1
}

This is clearly fewer instructions and does no extra computational work. I.e.,
there are still just two 8-bit multiplies, and we're not operating on vector
elements that the original code never touched (x1 and y0). Therefore, this is
safe even for FP vectors that might contain perf bombs like denorms because
we're still not going to touch them.
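
As a rough sanity check of the equivalence claimed above, here is a small
Python model of @g and @h. It is only a sketch: the helper names (mul_i8,
shufflevector) are made up for illustration, vectors are modeled as plain
lists, and `mul i8` is assumed to wrap modulo 256 (no nsw/nuw flags).

```python
def mul_i8(a, b):
    # Wrapping 8-bit multiply (assumed model of `mul i8` without nsw/nuw).
    return (a * b) & 0xFF

def shufflevector(x, y, mask):
    # Mask indices 0..n-1 select from x; indices n..2n-1 select from y.
    n = len(x)
    return [x[i] if i < n else y[i - n] for i in mask]

def g(x, y):
    # Scalar form: extract x0 and y1, square each, insert into the result.
    x0, y1 = x[0], y[1]
    return [mul_i8(x0, x0), mul_i8(y1, y1)]

def h(x, y):
    # Vector form: blend <x0, y1>, then one 2-lane multiply.
    x0y1 = shufflevector(x, y, [0, 3])
    return [mul_i8(a, a) for a in x0y1]

# Both forms do exactly two 8-bit multiplies and never read x1 or y0.
for x0 in range(256):
    for y1 in range(256):
        assert g([x0, 0xAA], [0x55, y1]) == h([x0, 0xAA], [0x55, y1])
```

The unused lanes (x1 and y0) are filled with arbitrary values here; since
neither form reads them, their contents cannot affect the result.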

If the vector op is not supported, the backend will scalarize it, effectively
undoing this transform. That should also eliminate the shuffle.

Note that we don't create arbitrary shuffles in IR because it could be
disastrous for the backend, but this is a special case: it's a "blend" (x86
terminology). I.e., we're taking elements from the two source vectors without
crossing vector lanes. There's precedent for this type of shuffle because we
canonicalize vector selects to this form:

define <2 x i8> @h(<2 x i8> %x, <2 x i8> %y) {
  %x0y1 = select <2 x i1> <i1 true, i1 false>, <2 x i8> %x, <2 x i8> %y
  %mul = mul <2 x i8> %x0y1, %x0y1
  ret <2 x i8> %mul
}

$ ./opt  -instcombine scalarizedmath.ll -S

define <2 x i8> @h(<2 x i8> %x, <2 x i8> %y) {
  %x0y1 = shufflevector <2 x i8> %x, <2 x i8> %y, <2 x i32> <i32 0, i32 3>
  %mul = mul <2 x i8> %x0y1, %x0y1
  ret <2 x i8> %mul
}
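
To make the "blend" restriction concrete, the following Python sketch maps a
constant select condition to the corresponding shuffle mask and checks whether
a mask stays within lanes. Both function names are hypothetical (they are not
LLVM APIs), and the mask is assumed to have the same length as each source
vector.

```python
def select_to_shuffle_mask(cond):
    # Lane i takes x[i] (mask index i) when cond[i] is true,
    # otherwise y[i] (mask index i + n).
    n = len(cond)
    return [i if c else i + n for i, c in enumerate(cond)]

def is_blend_mask(mask):
    # A blend reads output lane i from lane i of one of the two sources,
    # so no element crosses lanes.
    n = len(mask)
    return all(m in (i, i + n) for i, m in enumerate(mask))

# select <i1 true, i1 false>, %x, %y  corresponds to mask <0, 3>
assert select_to_shuffle_mask([True, False]) == [0, 3]
assert is_blend_mask([0, 3])
# <1, 0> swaps the lanes of x, so it is not a blend.
assert not is_blend_mask([1, 0])
```

Under this model, every mask produced from a constant select condition is a
blend, which is why the shuffle in @h is the same kind that instcombine
already creates.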
