[PATCH] D76983: [InstCombine] Transform extractelement-trunc -> bitcast-extractelement

Wed Apr 1 07:07:42 PDT 2020

spatel added a comment.

In D76983#1954751 <https://reviews.llvm.org/D76983#1954751>, @lebedev.ri wrote:

> In D76983#1954722 <https://reviews.llvm.org/D76983#1954722>, @jeroen.dobbelaere wrote:
>
> > This patch triggers a regression on our side:
> >
> > <...>
> >
> > The test expects to see:
> >
> >   define dso_local <4 x i16> @truncate_v_v(<4 x i32> %lhs) local_unnamed_addr #0 {
> >   entry:
> >     %0 = trunc <4 x i32> %lhs to <4 x i16>
> >     ret <4 x i16> %0
> >   }
> >   
> >
> > which, in machine instructions, is mapped onto a vector trunc instruction.
> >
> > But now, we see:
> >
> >   define dso_local <4 x i16> @truncate_v_v(<4 x i32> %lhs) local_unnamed_addr #0 {
> >   entry:
> >     %0 = bitcast <4 x i32> %lhs to <8 x i16>
> >     %vecinit9 = shufflevector <8 x i16> %0, <8 x i16> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
> >     ret <4 x i16> %vecinit9
> >   }
> >   
> >
> > which is expanded into a large sequence of code going through the stack.
>
>
> This looks like a simple missed transform to me, not a miscompile.

I agree. We hit a phase ordering difference: SLP can reduce the chain of insert/extract to a vector trunc, but it doesn't handle the shuffle-of-bitcast. The open question is where to implement that transform. We're on the edge of instcombine vs. vector-combine if we want to do this in IR. I.e., is there consensus that forming a size-changing vector cast from a shuffle is canonical?
Alternatively, we could defer to the backend, but that could still be viewed as a regression in IR, since the function now has two instructions (bitcast + shufflevector) where it previously had one (trunc).
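
To make the equivalence concrete, here is a minimal Python sketch (not LLVM code) that models the two IR sequences from the report on plain integers. It assumes a little-endian target, which is what makes the mask <0, 2, 4, 6> select the low half of each i32. The function names are hypothetical and exist only for illustration.

```python
import struct


def trunc_v4i32_to_v4i16(lanes):
    """Model of `trunc <4 x i32> %lhs to <4 x i16>`: keep the low 16 bits of each lane."""
    return [x & 0xFFFF for x in lanes]


def bitcast_then_shuffle(lanes):
    """Model of `bitcast <4 x i32> to <8 x i16>` followed by a shuffle with mask <0, 2, 4, 6>.

    Assumption: little-endian memory layout, so the low half of each i32
    becomes the even-indexed i16 element after the bitcast.
    """
    raw = struct.pack("<4I", *lanes)          # 16 bytes, little-endian i32 lanes
    halves = struct.unpack("<8H", raw)        # reinterpret the same bytes as 8 x i16
    return [halves[i] for i in (0, 2, 4, 6)]  # the shufflevector mask


sample = [0x12345678, 0xDEADBEEF, 0x00000000, 0xFFFF0001]
print(trunc_v4i32_to_v4i16(sample) == bitcast_then_shuffle(sample))
```

On a big-endian target the low halves would land on the odd indices instead, so an IR-level fold of this shuffle back to a trunc would need to consult the data layout.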

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D76983