[llvm-bugs] [Bug 31018] New: [X86] Widening of elements in shuffle lowering can prevent masking operation from being combined into the final instruction.
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon Nov 14 21:20:27 PST 2016
https://llvm.org/bugs/show_bug.cgi?id=31018
Bug ID: 31018
Summary: [X86] Widening of elements in shuffle lowering can
prevent masking operation from being combined into the
final instruction.
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: craig.topper at gmail.com
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
One of the first things shuffle lowering does is widen elements if the shuffle
elements are adjacent. This is great for choosing the best shuffle given the
limited availability of shuffles for smaller element sizes. But it loses the
original type information.
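To make the widening step concrete, here is a minimal C++ sketch of the idea. It is not the code LLVM uses: `widenShuffleMask` is a hypothetical helper, and it assumes a single-source mask with no undef or zeroable elements, which the real lowering also has to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): try to view a shuffle mask over N
// narrow elements as a mask over N/2 elements of twice the width.
// Returns false if any pair of narrow elements is not an aligned, adjacent
// pair in the source. Undef (-1) indices are rejected for simplicity.
bool widenShuffleMask(const std::vector<int> &Mask, std::vector<int> &Wide) {
  assert(Mask.size() % 2 == 0 && "mask must have an even number of elements");
  std::vector<int> Result;
  for (std::size_t I = 0; I < Mask.size(); I += 2) {
    int Lo = Mask[I];
    int Hi = Mask[I + 1];
    // The pair must start on an even source index and be adjacent.
    if (Lo < 0 || Lo % 2 != 0 || Hi != Lo + 1)
      return false;
    Result.push_back(Lo / 2);
  }
  Wide = Result;
  return true;
}
```

Applied to the v16i32 mask <2,3,...,15,0,1> from the test case in this report, this yields the v8i64 mask <1,2,3,4,5,6,7,0>, which is the valignq pattern shown in the generated code. After this step the lowering only sees 8 elements, so the 16-element view is gone.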
Some shuffles can be implemented equally well with different element sizes, and
sometimes the element size, and consequently the number of elements, is important
for enabling AVX-512 masking operations to be folded into the final instruction.
For example, the following sequence could be better implemented with a VALIGND,
which would allow the masking to be folded.
define <16 x i32> @mask_shuffle_v16i32_02_03_04_05_06_07_08_09_10_11_12_13_14_15_00_01(<16 x i32> %a, <16 x i32> %passthru, i16 %mask) {
; ALL-LABEL: mask_shuffle_v16i32_02_03_04_05_06_07_08_09_10_11_12_13_14_15_00_01:
; ALL: # BB#0:
; ALL-NEXT: valignq {{.*#+}} zmm0 = zmm0[1,2,3,4,5,6,7,0]
; ALL-NEXT: kmovw %edi, %k1
; ALL-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; ALL-NEXT: retq
  %shuffle = shufflevector <16 x i32> %a, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 0, i32 1>
  %mask.cast = bitcast i16 %mask to <16 x i1>
  %res = select <16 x i1> %mask.cast, <16 x i32> %shuffle, <16 x i32> %passthru
  ret <16 x i32> %res
}
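The equivalence this test case relies on can be checked with a scalar model. The C++ sketch below is only an illustration: both function names are hypothetical, and the model covers just the single-source rotation used here. It shows that a 32-bit rotation by 2 with a 16-bit merge mask computes the same result as the 64-bit rotation by 1 followed by a separate 32-bit blend. The mask cannot be folded into the 64-bit form because a masked valignq on zmm takes only 8 mask bits, one per 64-bit element, while the select needs 16 independent bits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V16 = std::array<std::uint32_t, 16>;

// Scalar model (assumption: single-source rotation only) of a 32-bit element
// rotation with merge masking folded in: bit I of K picks the rotated
// element, otherwise the passthru element is kept.
V16 maskedRotate32(const V16 &A, const V16 &Passthru, std::uint16_t K,
                   unsigned Imm) {
  assert(Imm < 16 && "rotation amount out of range");
  V16 R{};
  for (unsigned I = 0; I < 16; ++I)
    R[I] = ((K >> I) & 1) ? A[(I + Imm) % 16] : Passthru[I];
  return R;
}

// The same data movement done at 64-bit granularity (pairs of 32-bit
// elements), followed by a separate 32-bit blend. This mirrors the
// two-instruction sequence in the generated code above.
V16 rotate64ThenBlend32(const V16 &A, const V16 &Passthru, std::uint16_t K,
                        unsigned Imm64) {
  assert(Imm64 < 8 && "rotation amount out of range");
  V16 Rot{};
  for (unsigned Q = 0; Q < 8; ++Q) {
    unsigned Src = (Q + Imm64) % 8;
    Rot[2 * Q] = A[2 * Src];
    Rot[2 * Q + 1] = A[2 * Src + 1];
  }
  V16 R{};
  for (unsigned I = 0; I < 16; ++I)
    R[I] = ((K >> I) & 1) ? Rot[I] : Passthru[I];
  return R;
}
```

For any 16-bit mask, `maskedRotate32(A, P, K, 2)` and `rotate64ThenBlend32(A, P, K, 1)` agree, so choosing the 32-bit form loses nothing and saves the blend.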
We should add a DAGCombine after shuffle lowering that detects a select fed by
a bitcast of a shuffle, where the shuffle could equally be implemented at the
select's element size, and rewrites the shuffle to remove the bitcast.
Example shuffles that this could apply to:
VALIGNQ->VALIGND
128-bit PALIGNR->VALIGND/VALIGNQ
SHUFF64x2->SHUFF32x4
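
As a rough illustration of the check such a combine might perform for the VALIGNQ->VALIGND case, the C++ sketch below re-expresses a wide mask at the narrower element size and then tests whether the result is still a rotation. Both helper names are hypothetical and the logic is an assumption about one possible approach, not a description of existing LLVM code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: expand each wide mask index into Scale consecutive
// narrow indices. Undef (-1) stays undef in every narrow slot.
std::vector<int> narrowShuffleMask(const std::vector<int> &Wide, int Scale) {
  assert(Scale > 0 && "scale must be positive");
  std::vector<int> Narrow;
  for (int W : Wide)
    for (int J = 0; J < Scale; ++J)
      Narrow.push_back(W < 0 ? -1 : W * Scale + J);
  return Narrow;
}

// Hypothetical helper: if Mask is a single-source rotation, store the
// rotation amount (in elements) in Imm and return true.
bool matchRotation(const std::vector<int> &Mask, int &Imm) {
  const int N = static_cast<int>(Mask.size());
  if (N == 0 || Mask[0] < 0 || Mask[0] >= N)
    return false;
  const int Amount = Mask[0];
  for (int I = 0; I < N; ++I)
    if (Mask[I] != (I + Amount) % N)
      return false;
  Imm = Amount;
  return true;
}
```

Narrowing the v8i64 mask <1,2,3,4,5,6,7,0> by a factor of 2 gives back the v16i32 rotation by 2, so the element count matches the 16-bit select mask again and the masking could be folded.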