[PATCH] D22114: [InstCombine] extend vector select matching for non-splat constants
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 12 15:23:30 PDT 2016
spatel added inline comments.
================
Comment at: test/Transforms/InstCombine/logical-select.ll:374
@@ -378,1 +373,3 @@
+; CHECK-NEXT: [[TMP1:%.*]] = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b
+; CHECK-NEXT: ret <4 x i32> [[TMP1]]
;
----------------
eli.friedman wrote:
> Is a select actually the canonical form for this? It seems like we should prefer a shufflevector.
I saw that x86 with SSE or AVX didn't care, e.g.:
  vpblendw $60, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3,4,5],xmm0[6,7]
...so I figured one form was as good as the other, but this is not true in general. AArch64 and PPC+VSX generate different code for the two forms:
define <4 x i32> @foo(<4 x i32> %a, <4 x i32> %b) {
  %sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b
  ret <4 x i32> %sel
}

define <4 x i32> @goo(<4 x i32> %a, <4 x i32> %b) {
  %shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
  ret <4 x i32> %shuf
}
$ ./llc shufsel.ll -o - -mtriple=aarch64
```
adrp x8, .LCPI0_0
ldr q2, [x8, :lo12:.LCPI0_0]
bsl v2.16b, v0.16b, v1.16b
mov v0.16b, v2.16b
ret
ext v1.16b, v0.16b, v1.16b, #12
ext v0.16b, v1.16b, v0.16b, #4
ext v1.16b, v1.16b, v1.16b, #8
ext v0.16b, v0.16b, v1.16b, #12
ret
```
$ ./llc shufsel.ll -o - -mtriple=powerpc64 -mattr=vsx
```
addis 3, 2, .LCPI0_0@toc@ha
addi 3, 3, .LCPI0_0@toc@l
lxvw4x 0, 0, 3
xxsel 34, 35, 34, 0
blr
addis 3, 2, .LCPI1_0@toc@ha
addi 3, 3, .LCPI1_0@toc@l
lxvw4x 36, 0, 3
vperm 2, 2, 3, 4
blr
```
I don't know AArch64 at all, but a single bsl seems better than 4 exts. Is there a better lowering than either of those? On PPC, xxsel and vperm seem equivalent, but I have no knowledge of any recent PPC HW.
Does the current codegen affect what we do here in IR?
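For reference, here is a minimal sketch (not part of the patch) of how the constant select mask in @foo corresponds to the shuffle mask in @goo and to the vpblendw immediate above. The function names are hypothetical, and the blend helper assumes that a set immediate bit picks the 16-bit word from the second source (%b), which is consistent with the asm comment on the vpblendw line:

```python
def select_mask_to_shuffle_mask(select_mask):
    """Shuffle indices equivalent to `select <N x i1> mask, %a, %b`.

    shufflevector numbers the lanes of its first operand 0..N-1 and the
    lanes of its second operand N..2N-1, so a true lane i maps to index i
    (taken from %a) and a false lane i maps to index N + i (taken from %b).
    """
    n = len(select_mask)
    return [i if take_a else n + i for i, take_a in enumerate(select_mask)]


def blend_immediate(select_mask, words_per_lane=2):
    """Word-blend immediate for the same mask (hypothetical helper).

    Assumes a set bit selects the word from %b; each i32 lane covers
    `words_per_lane` consecutive 16-bit words.
    """
    imm = 0
    for lane, take_a in enumerate(select_mask):
        if not take_a:
            for word in range(words_per_lane):
                imm |= 1 << (lane * words_per_lane + word)
    return imm


mask = [True, False, False, True]
print(select_mask_to_shuffle_mask(mask))  # [0, 5, 6, 3]
print(blend_immediate(mask))              # 60
```

So the two IR forms carry the same information; the question is only which one the backends lower better.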
http://reviews.llvm.org/D22114