[PATCH] D40633: [PCG] Poor shuffle lane tracking (PR35454 )

Thu Nov 30 13:12:18 PST 2017

spatel added inline comments.

================
Comment at: lib/Transforms/InstCombine/InstructionCombining.cpp:1460-1462
+  // result, and one for shuffle result. That looks like a loot of bitcast
+  // instructions, but they will be all eliminated during the subsequent
+  // instructions combine phases.
----------------
efriedma wrote:
> spatel wrote:
> > This is not correct. You can't assume the surrounding instructions when creating an instcombine fold. Your test cases should be minimal and show what happens with those patterns. For this transform, that would be something like this:
> > 
> > ```
> > define <8 x i16> @shuffle_add(<4 x i32> %v1, <4 x i32> %v2) {
> >   %shuffle1 = shufflevector <4 x i32> %v1, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
> >   %shuffle2 = shufflevector <4 x i32> %v2, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
> >   %bc1 = bitcast <4 x i32> %shuffle1 to <8 x i16>
> >   %bc2 = bitcast <4 x i32> %shuffle2 to <8 x i16>
> >   %add = add <8 x i16> %bc1, %bc2
> >   ret <8 x i16> %add
> > }
> > 
> > ```
> > 
> > With this patch, we go from 5 instructions to 6:
> > 
> > 
> > ```
> > define <8 x i16> @shuffle_add(<4 x i32> %v1, <4 x i32> %v2) {
> >   %1 = bitcast <4 x i32> %v1 to <8 x i16>
> >   %2 = bitcast <4 x i32> %v2 to <8 x i16>
> >   %3 = add <8 x i16> %1, %2
> >   %4 = bitcast <8 x i16> %3 to <4 x i32>
> >   %5 = shufflevector <4 x i32> %4, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
> >   %6 = bitcast <4 x i32> %5 to <8 x i16>
> >   ret <8 x i16> %6
> > }
> > 
> > ```
> > Maybe this makes sense because we traded a shuffle for a couple of bitcasts? 
> > 
> > If we used the narrower element type for the shuffle, then we'd have a reduction in instruction count by eliminating the last 2 bitcasts, but I don't know if that is allowed as a target-independent transform.
> On some targets, vector bitcasts aren't free (IIRC big-endian ARM is like this).
> 
> Changing the type of the shuffle is.... maybe a little sketchy.  I mean, ideally targets should be able to handle either form, but I'm not sure we actually do that reliably.  We don't have good tests for that sort of thing.
Yeah, I was afraid that was the answer :)

So 2 options:
1. Wait to do this in the DAG where we can ask if bitcasts are free.
2. Try to match the larger pattern (shuffle+binop+shuffle) shown in the motivating tests.

https://reviews.llvm.org/D40633