[llvm-dev] [PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs

Sanjay Patel via llvm-dev llvm-dev at lists.llvm.org
Mon Dec 9 08:18:26 PST 2019


Hi Jay -

Yes, this change carried some risk of regressions, and we should still be
able to optimize this case. We may need to add or loosen a fold for
splat-of-insert:
define <4 x float> @f(<4 x float> %x) {
  %ins3 = insertelement <4 x float> %x, float 3.000000e+00, i32 3
  %splat0 = shufflevector <4 x float> %ins3, <4 x float> undef, <4 x i32>
zeroinitializer
  %splat3 = shufflevector <4 x float> %ins3, <4 x float> undef, <4 x i32>
<i32 3, i32 3, i32 3, i32 3>
  %r = fadd <4 x float> %splat0, %splat3
  ret <4 x float> %r
}

SimplifyDemandedVectorElts() may not get this because %ins3 has multiple
uses, but I didn't step through that yet to confirm. I can work on this if
you haven't started already. Let me know.

On Mon, Dec 9, 2019 at 10:36 AM Jay Foad <jay.foad at gmail.com> wrote:

> Sanjay,
>
> I'm looking at some missed optimizations caused by D70246. Here's a test
> case:
>
> define <4 x float> @f(i32 %t32, <4 x float>* %t24) {
> .entry:
>   %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2
>   %t44 = bitcast <3 x i32> %t43 to <3 x float>
>   %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
> <i32 0, i32 undef, i32 undef, i32 undef>
>   %t46 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
> <i32 undef, i32 1, i32 undef, i32 undef>
>   %t47 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
> <i32 undef, i32 undef, i32 2, i32 undef>
>   %t48 = insertelement <4 x float> %t45, float 1.000000e+00, i32 3
>   %t49 = shufflevector <4 x float> %t48, <4 x float> %t46, <4 x i32>
> <i32 0, i32 5, i32 undef, i32 3>
>   %t50 = shufflevector <4 x float> %t49, <4 x float> %t47, <4 x i32>
> <i32 0, i32 1, i32 6, i32 3>
>   %t55 = load <4 x float>, <4 x float>* %t24, align 16
>   %t58 = getelementptr <4 x float>, <4 x float>* %t24, i64 1
>   %t59 = load <4 x float>, <4 x float>* %t58, align 16
>   %t62 = getelementptr <4 x float>, <4 x float>* %t24, i64 2
>   %t63 = load <4 x float>, <4 x float>* %t62, align 16
>   %t66 = getelementptr <4 x float>, <4 x float>* %t24, i64 3
>   %t67 = load <4 x float>, <4 x float>* %t66, align 16
>   %t69 = shufflevector <4 x float> %t50, <4 x float> undef, <4 x i32>
> zeroinitializer
>   %t71 = fmul <4 x float> %t55, %t69
>   %t72 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
> <i32 1, i32 1, i32 1, i32 1>
>   %t74 = fmul <4 x float> %t59, %t72
>   %t75 = fadd <4 x float> %t71, %t74
>   %t76 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
> <i32 2, i32 2, i32 2, i32 2>
>   %t78 = fmul <4 x float> %t63, %t76
>   %t79 = fadd <4 x float> %t75, %t78
>   %t80 = shufflevector <4 x float> %t50, <4 x float> undef, <4 x i32>
> <i32 3, i32 3, i32 3, i32 3>
>   %t82 = fmul <4 x float> %t67, %t80
>   %t83 = fadd <4 x float> %t79, %t82
>   ret <4 x float> %t83
> }
>
> Before D70246, opt -instcombine gives this:
>
> define <4 x float> @f(i32 %t32, <4 x float>* %t24) {
> .entry:
>   %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2
>   %t44 = bitcast <3 x i32> %t43 to <3 x float>
>   %t55 = load <4 x float>, <4 x float>* %t24, align 16
>   %t58 = getelementptr <4 x float>, <4 x float>* %t24, i64 1
>   %t59 = load <4 x float>, <4 x float>* %t58, align 16
>   %t62 = getelementptr <4 x float>, <4 x float>* %t24, i64 2
>   %t63 = load <4 x float>, <4 x float>* %t62, align 16
>   %t66 = getelementptr <4 x float>, <4 x float>* %t24, i64 3
>   %t67 = load <4 x float>, <4 x float>* %t66, align 16
>   %t69 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
> zeroinitializer
>   %t71 = fmul <4 x float> %t55, %t69
>   %t72 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
> <i32 1, i32 1, i32 1, i32 1>
>   %t74 = fmul <4 x float> %t59, %t72
>   %t75 = fadd <4 x float> %t71, %t74
>   %t76 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
> <i32 2, i32 2, i32 2, i32 2>
>   %t78 = fmul <4 x float> %t63, %t76
>   %t79 = fadd <4 x float> %t75, %t78
>   %t83 = fadd <4 x float> %t79, %t67
>   ret <4 x float> %t83
> }
>
> After D70246, opt -instcombine gives this:
>
> define <4 x float> @f(i32 %t32, <4 x float>* %t24) {
> .entry:
>   %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2
>   %t44 = bitcast <3 x i32> %t43 to <3 x float>
>   %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
> <i32 0, i32 undef, i32 undef, i32 undef>
>   %t48 = insertelement <4 x float> %t45, float 1.000000e+00, i32 3
>   %t55 = load <4 x float>, <4 x float>* %t24, align 16
>   %t58 = getelementptr <4 x float>, <4 x float>* %t24, i64 1
>   %t59 = load <4 x float>, <4 x float>* %t58, align 16
>   %t62 = getelementptr <4 x float>, <4 x float>* %t24, i64 2
>   %t63 = load <4 x float>, <4 x float>* %t62, align 16
>   %t66 = getelementptr <4 x float>, <4 x float>* %t24, i64 3
>   %t67 = load <4 x float>, <4 x float>* %t66, align 16
>   %t69 = shufflevector <4 x float> %t48, <4 x float> undef, <4 x i32>
> zeroinitializer
>   %t71 = fmul <4 x float> %t55, %t69
>   %t72 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
> <i32 1, i32 1, i32 1, i32 1>
>   %t74 = fmul <4 x float> %t59, %t72
>   %t75 = fadd <4 x float> %t71, %t74
>   %t76 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
> <i32 2, i32 2, i32 2, i32 2>
>   %t78 = fmul <4 x float> %t63, %t76
>   %t79 = fadd <4 x float> %t75, %t78
>   %t80 = shufflevector <4 x float> %t48, <4 x float> undef, <4 x i32>
> <i32 3, i32 3, i32 3, i32 3>
>   %t82 = fmul <4 x float> %t67, %t80
>   %t83 = fadd <4 x float> %t79, %t82
>   ret <4 x float> %t83
> }
>
> Notice that it has failed to simplify %t80, which extracts element 3
> from %t48, which is obviously just the constant 1.0.
>
> Would you expect the change you committed to have this kind of effect?
> Do you agree that simplifying %t80 is still valid, even with the new
> interpretation of the rules about shufflevector and undef and poison?
>
> Thanks,
> Jay.
>
> On Thu, 14 Nov 2019 at 16:25, Sanjay Patel via Phabricator via
> llvm-commits <llvm-commits at lists.llvm.org> wrote:
> >
> > spatel created this revision.
> > spatel added reviewers: efriedma, regehr, aqjune, nlopes, RKSimon,
> lebedev.ri, liuz.
> > Herald added subscribers: hiraditya, mcrosier.
> > Herald added a project: LLVM.
> >
> > Given a shuffle that includes undef elements in an otherwise identity
> mask like:
> >
> >   define <4 x float> @shuffle(<4 x float> %arg) {
> >     %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32>
> <i32 undef, i32 1, i32 2, i32 3>
> >     ret <4 x float> %shuf
> >   }
> >
> > We were simplifying that to the input operand.
> >
> > But as discussed in PR43958:
> > https://bugs.llvm.org/show_bug.cgi?id=43958
> > ...that means that per-vector-element poison that would be stopped by
> the shuffle can now leak to the result.
> >
> > For reference, this transform was added long ago with:
> > rL142671 <https://reviews.llvm.org/rL142671>
> >
> > Also note that we still have (and there are tests for) the same
> transform with no undef elements in the mask (a fully-defined identity
> mask). I don't think there's any controversy about that case - it's a valid
> transform under any interpretation of shufflevector/undef/poison.
> >
> > Looking at a few of the diffs into codegen, I don't see any difference
> in final asm. So depending on your perspective, that's good (no real loss
> of optimization power) or bad (poison exists in the DAG, so we only
> partially fixed the bug).
> >
> >
> > https://reviews.llvm.org/D70246
> >
> > Files:
> >   llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
> >   llvm/test/Transforms/InstCombine/X86/clmulqdq.ll
> >   llvm/test/Transforms/InstCombine/X86/x86-avx2.ll
> >   llvm/test/Transforms/InstCombine/X86/x86-avx512.ll
> >   llvm/test/Transforms/InstCombine/X86/x86-f16c.ll
> >   llvm/test/Transforms/InstCombine/X86/x86-pack.ll
> >   llvm/test/Transforms/InstCombine/X86/x86-pshufb.ll
> >   llvm/test/Transforms/InstCombine/X86/x86-sse.ll
> >   llvm/test/Transforms/InstCombine/X86/x86-sse41.ll
> >   llvm/test/Transforms/InstCombine/X86/x86-sse4a.ll
> >   llvm/test/Transforms/InstCombine/X86/x86-vector-shifts.ll
> >   llvm/test/Transforms/InstCombine/X86/x86-vpermil.ll
> >   llvm/test/Transforms/InstCombine/X86/x86-xop.ll
> >   llvm/test/Transforms/InstCombine/shuffle_select.ll
> >   llvm/test/Transforms/InstCombine/vec_demanded_elts.ll
> >   llvm/test/Transforms/InstCombine/vec_shuffle.ll
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20191209/02f18d40/attachment.html>


More information about the llvm-dev mailing list