[PATCH] D40633: [PCG] Poor shuffle lane tracking (PR35454 )

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 30 12:52:00 PST 2017


spatel added a reviewer: efriedma.
spatel added inline comments.


================
Comment at: lib/Transforms/InstCombine/InstructionCombining.cpp:1460-1462
+  // result, and one for shuffle result. That looks like a loot of bitcast
+  // instructions, but they will be all eliminated during the subsequent
+  // instructions combine phases.
----------------
This is not correct. You can't assume the surrounding instructions when creating an instcombine fold. Your test cases should be minimal and show what happens with those patterns. For this transform, that would be something like this:

```
define <8 x i16> @shuffle_add(<4 x i32> %v1, <4 x i32> %v2) {
  %shuffle1 = shufflevector <4 x i32> %v1, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %shuffle2 = shufflevector <4 x i32> %v2, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %bc1 = bitcast <4 x i32> %shuffle1 to <8 x i16>
  %bc2 = bitcast <4 x i32> %shuffle2 to <8 x i16>
  %add = add <8 x i16> %bc1, %bc2
  ret <8 x i16> %add
}

```

With this patch, we go from 5 instructions to 6:


```
define <8 x i16> @shuffle_add(<4 x i32> %v1, <4 x i32> %v2) {
  %1 = bitcast <4 x i32> %v1 to <8 x i16>
  %2 = bitcast <4 x i32> %v2 to <8 x i16>
  %3 = add <8 x i16> %1, %2
  %4 = bitcast <8 x i16> %3 to <4 x i32>
  %5 = shufflevector <4 x i32> %4, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %6 = bitcast <4 x i32> %5 to <8 x i16>
  ret <8 x i16> %6
}

```
Maybe this makes sense because we traded a shuffle for a couple of bitcasts? 

If we used the narrower element type for the shuffle, then we'd have a reduction in instruction count by eliminating the last 2 bitcasts, but I don't know if that is allowed as a target-independent transform.


https://reviews.llvm.org/D40633





More information about the llvm-commits mailing list