[llvm-bugs] [Bug 31018] New: [X86] Widening of elements in shuffle lowering can prevent masking operation from being combined into the final instruction.

Mon Nov 14 21:20:27 PST 2016

https://llvm.org/bugs/show_bug.cgi?id=31018

            Bug ID: 31018
           Summary: [X86] Widening of elements in shuffle lowering can
                    prevent masking operation from being combined into the
                    final instruction.
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: craig.topper at gmail.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

One of the first things shuffle lowering does is widen the elements when
adjacent mask entries select adjacent source elements. This is great for
choosing the best shuffle given the limited availability of shuffles for
smaller element sizes. But it loses the original element type information.
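
The widening step can be modeled roughly as below. This is a simplified
sketch, not the code in the X86 backend: the names UNDEF and widen_mask are
made up for illustration, and it ignores zeroable elements.

```python
UNDEF = -1  # hypothetical stand-in for an undefined mask entry

def widen_mask(mask):
    """Return the mask for elements twice as wide, or None if not widenable.

    Assumes len(mask) is even. Each pair of entries must select an aligned,
    adjacent pair of source elements (2k, 2k+1), or be (partly) undefined.
    """
    widened = []
    for lo, hi in zip(mask[0::2], mask[1::2]):
        if lo == UNDEF and hi == UNDEF:
            widened.append(UNDEF)
        elif lo == UNDEF and hi % 2 == 1:
            widened.append(hi // 2)      # only the high half is known
        elif hi == UNDEF and lo % 2 == 0:
            widened.append(lo // 2)      # only the low half is known
        elif lo % 2 == 0 and hi == lo + 1:
            widened.append(lo // 2)      # fully specified aligned pair
        else:
            return None                  # pair straddles two wide elements
    return widened

# The v16i32 mask from this report widens to an 8 x i64 rotate by one.
v16i32_mask = list(range(2, 16)) + [0, 1]
v8i64_mask = widen_mask(v16i32_mask)
```

Once the mask has been widened, nothing in the result records that the
shuffle started out with sixteen 32-bit elements.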

Some shuffles can be implemented equally well with different element sizes.
And sometimes the element size, and consequently the number of elements, is
important for enabling AVX-512 masking operations to be folded into the final
instruction.

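For example, the following sequence could be better implemented with a
VALIGND, which would allow the masking to be folded. The i16 mask has one bit
per i32 element, but the VALIGNQ chosen after widening operates on eight
64-bit elements, so the select is left behind as a separate VPBLENDMD.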

define <16 x i32> @mask_shuffle_v16i32_02_03_04_05_06_07_08_09_10_11_12_13_14_15_00_01(<16 x i32> %a, <16 x i32> %passthru, i16 %mask) {
; ALL-LABEL: mask_shuffle_v16i32_02_03_04_05_06_07_08_09_10_11_12_13_14_15_00_01:
; ALL:       # BB#0:
; ALL-NEXT:    valignq {{.*#+}} zmm0 = zmm0[1,2,3,4,5,6,7,0]
; ALL-NEXT:    kmovw %edi, %k1
; ALL-NEXT:    vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; ALL-NEXT:    retq
  %shuffle = shufflevector <16 x i32> %a, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 0, i32 1>
  %mask.cast = bitcast i16 %mask to <16 x i1>
  %res = select <16 x i1> %mask.cast, <16 x i32> %shuffle, <16 x i32> %passthru
  ret <16 x i32> %res
}
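
To see why the 32-bit form is a drop-in replacement here, the small model
below may help. It is only an illustration: lanes are plain Python lists, and
rotate_lanes, as_qwords, as_dwords and merge_mask are hypothetical helpers,
not intrinsics or LLVM functions. It compares the current two-instruction
sequence (64-bit rotate, then 32-bit blend) with a single 32-bit rotate that
is merge-masked directly.

```python
def rotate_lanes(vec, imm):
    """Single-input rotate: result lane i takes source lane (i + imm) mod n."""
    n = len(vec)
    return [vec[(i + imm) % n] for i in range(n)]

def as_qwords(dwords):
    """View 2n dword lanes as n qword lanes (each a pair of dwords)."""
    return [(dwords[2 * i], dwords[2 * i + 1]) for i in range(len(dwords) // 2)]

def as_dwords(qwords):
    """View qword lanes (pairs) as a flat list of dword lanes."""
    return [d for q in qwords for d in q]

def merge_mask(mask, result, passthru):
    """Per-lane select: bit i of mask picks result[i], else passthru[i]."""
    return [r if (mask >> i) & 1 else p
            for i, (r, p) in enumerate(zip(result, passthru))]

a = list(range(16))                      # sixteen i32 lanes
passthru = [100 + i for i in range(16)]
mask = 0xA5C3                            # arbitrary i16 mask, one bit per i32

# Current codegen: rotate eight qwords by 1, then blend sixteen dwords.
two_step = merge_mask(mask, as_dwords(rotate_lanes(as_qwords(a), 1)), passthru)

# Desired codegen: rotate sixteen dwords by 2 with the mask applied in place.
folded = merge_mask(mask, rotate_lanes(a, 2), passthru)
```

Both paths produce the same lanes for any mask, but only the second has a
lane count that matches the sixteen mask bits.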

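We should add a DAGCombine after shuffle lowering that detects a select being
fed by a bitcast from a shuffle and, when that shuffle can be implemented
equally well with the select's element size, re-emits it at that size. That
removes the bitcast and lets the masking fold into the shuffle instruction.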

Example shuffles that this could apply to:
VALIGNQ->VALIGND
128-bit PALIGNR->VALIGND/VALIGNQ
SHUFF64x2->SHUFF32x4
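
For the rotate-style cases, the check such a combine would need might look
like the sketch below. It is a model under assumptions, not a proposed patch:
narrow_mask and rotation_amount are hypothetical names, and the rotation
amount is assumed to correspond to the element-count immediate of a
single-input VALIGND/VALIGNQ.

```python
UNDEF = -1  # hypothetical stand-in for an undefined mask entry

def narrow_mask(mask, scale):
    """Re-express a mask over wide elements as one over `scale`x more,
    narrower elements (the inverse of widening)."""
    out = []
    for m in mask:
        for j in range(scale):
            out.append(UNDEF if m == UNDEF else m * scale + j)
    return out

def rotation_amount(mask):
    """Return r if every defined lane i selects source lane (i + r) mod n,
    or None if the mask is not a single-input rotate."""
    n = len(mask)
    amounts = {(m - i) % n for i, m in enumerate(mask) if m != UNDEF}
    if len(amounts) > 1:
        return None
    return amounts.pop() if amounts else 0

# The 8 x i64 rotate chosen by lowering, re-expressed at the select's
# 16 x i32 granularity.
qword_mask = [1, 2, 3, 4, 5, 6, 7, 0]
dword_mask = narrow_mask(qword_mask, 2)
```

Here the 64-bit rotate by one element becomes a 32-bit rotate by two, so the
narrowed shuffle has sixteen lanes and can take the i16 mask directly.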
