[PATCH] D22114: [InstCombine] extend vector select matching for non-splat constants

Tue Jul 12 15:23:30 PDT 2016

spatel added inline comments.

================
Comment at: test/Transforms/InstCombine/logical-select.ll:374
@@ -378,1 +373,3 @@
+; CHECK-NEXT:    [[TMP1:%.*]] = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b
+; CHECK-NEXT:    ret <4 x i32> [[TMP1]]
 ;
----------------
eli.friedman wrote:
> Is a select actually the canonical form for this?  It seems like we should prefer a shufflevector.
I saw that x86 with SSE or AVX didn't care (both forms produce the same blend), e.g.:
  vpblendw	$60, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3,4,5],xmm0[6,7]

...so I figured one form was as good as the other, but that is not true in general. AArch64 and PPC+VSX generate different code for the two forms:

  define <4 x i32> @foo(<4 x i32> %a, <4 x i32> %b) {
    %sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b
    ret <4 x i32> %sel
  }

  define <4 x i32> @goo(<4 x i32> %a, <4 x i32> %b) {
    %shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
    ret <4 x i32> %shuf
  }
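The equivalence between @foo and @goo is mechanical: for a constant condition, lane i of the select maps to shuffle index i when the condition is true, and to i + N (element i of the second operand) when it is false. The same condition also determines the x86 blend immediate: vpblendw works on 16-bit words, so each i32 lane covers two immediate bits, and a set bit takes that word from %xmm1. Below is a small standalone C++ sketch of both mappings. The helper names are hypothetical, it assumes the usual x86-64 calling convention (%a in %xmm0, %b in %xmm1), and it is not the InstCombine code from this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the InstCombine implementation: map a constant
// select condition to an equivalent two-operand shufflevector mask.
// Lane i takes %a[i] (index i) when the condition is true, otherwise
// %b[i], which shufflevector numbers as i + N.
std::vector<int> selectMaskToShuffleMask(const std::vector<bool>& cond) {
  const int n = static_cast<int>(cond.size());
  std::vector<int> mask(n);
  for (int i = 0; i < n; ++i)
    mask[i] = cond[i] ? i : i + n;
  return mask;
}

// Hypothetical helper: compute the vpblendw immediate for a select on up to
// four i32 lanes, assuming %a is in %xmm0 and %b is in %xmm1. Each i32 lane
// spans two 16-bit words, and a set immediate bit takes that word from %xmm1.
std::uint8_t selectMaskToPblendwImm(const std::vector<bool>& cond) {
  assert(cond.size() <= 4 && "an 8-bit immediate covers four i32 lanes");
  std::uint8_t imm = 0;
  for (std::size_t lane = 0; lane < cond.size(); ++lane)
    if (!cond[lane])  // this lane comes from %b, so set both of its word bits
      imm |= static_cast<std::uint8_t>(0x3u << (2 * lane));
  return imm;
}
```

For the condition in @foo, <true, false, false, true>, the first helper gives <0, 5, 6, 3> (the mask in @goo) and the second gives 60, the vpblendw immediate quoted above.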
```
$ ./llc shufsel.ll -o - -mtriple=aarch64
foo:
adrp	x8, .LCPI0_0
ldr	q2, [x8, :lo12:.LCPI0_0]
bsl	v2.16b, v0.16b, v1.16b
mov	v0.16b, v2.16b
ret

goo:
ext	v1.16b, v0.16b, v1.16b, #12
ext	v0.16b, v1.16b, v0.16b, #4
ext	v1.16b, v1.16b, v1.16b, #8
ext	v0.16b, v0.16b, v1.16b, #12
ret
```

```
$ ./llc shufsel.ll -o - -mtriple=powerpc64 -mattr=vsx
foo:
addis 3, 2, .LCPI0_0@toc@ha
addi 3, 3, .LCPI0_0@toc@l
lxvw4x 0, 0, 3
xxsel 34, 35, 34, 0
blr

goo:
addis 3, 2, .LCPI1_0@toc@ha
addi 3, 3, .LCPI1_0@toc@l
lxvw4x 36, 0, 3
vperm 2, 2, 3, 4
blr
```
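Both bsl and xxsel are bitwise selects driven by a mask register, which is why each select costs just one constant-pool load plus one instruction: the constant <4 x i1> condition only has to be widened into an all-ones/all-zeros lane mask. (For bsl the destination register supplies the mask, hence the extra mov into v0.) The sketch below is a plain C++ model of that arithmetic, not LLVM or target code; expandCondition is a hypothetical name, and reading .LCPI0_0 as the mask is an inference from the instruction sequence.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4i32 = std::array<std::uint32_t, 4>;

// Model of a bitwise select such as AArch64 BSL or VSX xxsel: each result
// bit comes from `a` where the mask bit is 1 and from `b` where it is 0.
// The real instructions differ in operand order and in which register holds
// the mask; this only models the bitwise arithmetic.
V4i32 bitwiseSelect(const V4i32& mask, const V4i32& a, const V4i32& b) {
  V4i32 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (mask[i] & a[i]) | (~mask[i] & b[i]);
  return r;
}

// Hypothetical helper: widen a constant <4 x i1> condition into the
// all-ones / all-zeros lane mask that has to be materialized, presumably
// the .LCPI0_0 constant-pool entry loaded before bsl/xxsel above.
V4i32 expandCondition(const std::array<bool, 4>& cond) {
  V4i32 m{};
  for (std::size_t i = 0; i < m.size(); ++i)
    m[i] = cond[i] ? 0xFFFFFFFFu : 0u;
  return m;
}
```

With %a = <1, 2, 3, 4> and %b = <5, 6, 7, 8>, selecting under the expanded <true, false, false, true> mask gives <1, 6, 7, 4>, the same result as the shuffle in @goo.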

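I don't know AArch64 at all, but a single bsl (plus the constant-pool load) seems better than four exts. Is there a better sequence than either of those? On PPC, xxsel and vperm seem equivalent (each needs a constant-pool load plus one instruction), but I have no knowledge of how they compare on recent PPC hardware.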

Does the current codegen affect what we do here in IR?


http://reviews.llvm.org/D22114