[PATCH] D138874: [InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 3

Wed Dec 7 09:01:58 PST 2022

spatel added a comment.

In D138874#3970326 <https://reviews.llvm.org/D138874#3970326>, @RKSimon wrote:

> Also, which option will make it easier to address the remaining missing GVN handling?

Having it in InstCombine will definitely be easier than VectorCombine with respect to phase ordering/dependencies on other passes.
In the motivating example, we don't get the folding opportunity until late because it requires inlining to see the pattern. With VectorCombine, that means the fold wouldn't happen until near the very end of optimization (and so there would be no subsequent GVN run).

I wasn't aware of the SDAG shuffle problems that @dmgreen noted for Thumb/MVE, so I took a closer look at that. Even without this patch, the earlier folds have already uncovered some awful codegen, like:

  define <8 x i16> @low_index_longer_length_poison_basevec_i64(i64 %x) {
    %t = trunc i64 %x to i16
    %r = insertelement <8 x i16> poison, i16 %t, i64 0
    ret <8 x i16> %r
  }

  $ llc -o - -mtriple=thumbv8.1-m.main -mattr=+mve.fp -float-abi=hard
  	vmov.16	q0[0], r0

  -->

  define <8 x i16> @low_index_longer_length_poison_basevec_i64(i64 %x) {
    %vec.x = bitcast i64 %x to <4 x i16>
    %r = shufflevector <4 x i16> %vec.x, <4 x i16> poison, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
    ret <8 x i16> %r
  }

  	sub	sp, #8
  	strd	r0, r1, [sp]
  	mov	r0, sp
  	vldrh.u32	q0, [r0]
  	vmov	r2, r3, d0
  	vmov	r0, r1, d1
  	vmov.16	q0[0], r2
  	vmov.16	q0[1], r3
  	vmov.16	q0[2], r0
  	vmov.16	q0[3], r1
  	add	sp, #8

The reason for that is what seems like a bug in SelectionDAGBuilder. It creates these nodes for the bitcast + shuffle sequence:

  Creating new node: t5: i64 = build_pair t2, t4
  Creating new node: t6: v4i16 = bitcast t5
  Creating new node: t7: v4i16 = undef
  Creating new node: t8: v8i16 = concat_vectors t6, undef:v4i16

But that's discarding information - the upper 48 bits of the build_pair are zapped to undef by the shuffle in IR, but that fact is lost in the translation to concat_vectors.
I'll try to fix that.
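
To make the information loss concrete, here is a minimal Python sketch (not LLVM code) that models the two forms lane by lane. It assumes a little-endian lane order, uses `None` as a stand-in for an undef/poison lane, and all helper names are hypothetical, chosen only to mirror the IR and DAG node names above.

```python
import struct

UNDEF = None  # modeling assumption: stands in for an undef/poison lane


def bitcast_i64_to_v4i16(x):
    """Reinterpret a 64-bit integer as four 16-bit lanes (little-endian)."""
    return list(struct.unpack("<4H", struct.pack("<Q", x & 0xFFFFFFFFFFFFFFFF)))


def shufflevector(a, b, mask):
    """Model of IR shufflevector: an undef mask element yields an undef lane."""
    both = a + b
    return [UNDEF if m is UNDEF else both[m] for m in mask]


def concat_vectors(a, b):
    """Model of the DAG concat_vectors node: every lane of both inputs is kept."""
    return a + b


x = 0x1122334455667788
vec_x = bitcast_i64_to_v4i16(x)
poison4 = [UNDEF] * 4

# IR form: only lane 0 is demanded, the other seven lanes are undef.
ir_result = shufflevector(vec_x, poison4, [0] + [UNDEF] * 7)

# DAG form built from it: lanes 1..3 of the bitcast are now defined values.
dag_result = concat_vectors(vec_x, poison4)

print("IR :", ir_result)
print("DAG:", dag_result)
```

In this model the IR result has one defined lane (equal to the truncated low 16 bits of `x`), while the concat_vectors result has four, so the backend is asked to materialize three lanes that the IR never required.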

The good news is that potential regressions like the one above have been in main for almost a week now, and I haven't seen any bug reports/complaints yet.
So maybe this kind of IR pattern doesn't happen much in real code where it would be noticed.

https://reviews.llvm.org/D138874