[PATCH] D138874: [InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 3

Thu Dec 1 09:07:36 PST 2022

dmgreen added a comment.

> I tried pushing a couple of tests through AArch64 codegen, and see diffs like this:
>
>   lsr	x8, x0, #48
>   mov	v0.h[3], w8
>   ->
>   fmov	d1, x0
>   mov	v0.h[3], v1.h[3]
>
> Does that seem neutral? If not, we could try harder to fold back to an insertelt in codegen or convert to a target-dependent transform in VectorCombine instead of a generic fold here.

That would come down to the difference between a shift (cheap) and a lane mov (should be cheapish too). I don't think there's a lot in it.
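
For reference, here is a minimal Python sketch of why the two sequences in the quoted diff produce the same lane value. It assumes a little-endian lane layout and a 64-bit scalar inserted into a vector of four 16-bit lanes. The function names are hypothetical and only model the values; they say nothing about instruction cost.

```python
import struct

def insert_via_shift(vec, x, lane):
    """Model of lsr + truncate + insert: shift the wanted 16 bits down, then insert."""
    out = list(vec)
    out[lane] = (x >> (16 * lane)) & 0xFFFF
    return out

def insert_via_bitcast_shuffle(vec, x, lane):
    """Model of bitcast + shuffle: view x as four 16-bit lanes, then pick one lane."""
    # Assumption: little-endian layout, so lane 0 holds the low 16 bits of x.
    lanes = struct.unpack("<4H", struct.pack("<Q", x))
    out = list(vec)
    out[lane] = lanes[lane]
    return out

if __name__ == "__main__":
    v = [1, 2, 3, 4]
    x = 0x1122334455667788
    print(insert_via_shift(v, x, 3))
    print(insert_via_bitcast_shuffle(v, x, 3))
```

On a big-endian layout the lane numbering after the bitcast would differ, so the sketch does not carry over unchanged.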

https://godbolt.org/z/haP87afo9 has some other cases from the tests here. A bitcast can be awkward if it secretly includes an extend, which is more difficult than it should be for MVE, where most vectors are assumed to be 128-bit. We've had problems in the past with instcombine transforming shuffles where it isn't helpful, and I think we still have some. Like I said, I don't want to block anything, but this doesn't seem very general, and might be better in the backend or cost modelled. (I'm not sure we have sensible costs for bitcasts, though. They don't often come up from the vectorizers.)

https://reviews.llvm.org/D138874