[PATCH] D82602: [SelectionDAG] don't split branch on logic-of-vector-compares

Mon Jun 29 09:10:20 PDT 2020

spatel added a comment.

In D82602#2117418 <https://reviews.llvm.org/D82602#2117418>, @spatel wrote:

> In D82602#2117150 <https://reviews.llvm.org/D82602#2117150>, @uweigand wrote:
>
> > In D82602#2116747 <https://reviews.llvm.org/D82602#2116747>, @spatel wrote:
> >
> > > 2. I missed a SystemZ test change in the earlier draft; adding SystemZ reviewers to confirm if that's an improvement.
> >
> >
> > Interesting.   The compiler now completely optimizes out all operations on one of the two vector lanes, apparently because it recognizes their value is constant and computable at compile time.  So that's an improvement (even though not one this test case was expecting to happen ...).
> >
> > On the other hand, in the operations on the remaining lane, it seems the compiler is now no longer able to perform the known-bits optimization that this test is actually written for -- see the comment that says we should optimize out a redundant AND to get a compare instead of a TM (test-under-mask), but the code now actually does contain a TM (tmll).  That seems a regression at first glance.
>
>
> Thanks. I can't tell if there's some generic transform that would help or if something target-specific is missing.
>  We have this after type-legalization:
>
>         t50: v4i32 = BUILD_VECTOR undef:i32, t32, undef:i32, undef:i32
>       t51: v2i64 = bitcast t50
>       t30: v2i64 = BUILD_VECTOR Constant:i64<1>, Constant:i64<1>
>     t36: v2i64 = xor t51, t30
>   t45: v4i32 = bitcast t36
>   
>
> Would converting the xor to v4i32 help? computeKnownBits probably has problems looking through bitcasts.

It would really obsolete this test, but I think SystemZ should be overriding the TLI hook shouldScalarizeBinop() that defaults to 'false'.

X86 does this:

  bool shouldScalarizeBinop(SDValue VecOp) const {
    unsigned Opc = VecOp.getOpcode();

    // Assume target opcodes can't be scalarized.
    // TODO - do we have any exceptions?
    if (Opc >= ISD::BUILTIN_OP_END)
      return false;

    // If the vector op is not supported, try to convert to scalar.
    EVT VecVT = VecOp.getValueType();
    if (!isOperationLegalOrCustomOrPromote(Opc, VecVT))
      return true;

    // If the vector op is supported, but the scalar op is not, the transform may
    // not be worthwhile.
    EVT ScalarVT = VecVT.getScalarType();
    return isOperationLegalOrCustomOrPromote(Opc, ScalarVT);
  }

If I add that on top of this patch, this test collapses to:

   $ llc -o - sysz.ll -mtriple=s390x-linux-gnu -mcpu=z13  -jump-is-expensive=1
  ...
  # %bb.0:
  	clhhsi	0, 0
  	je	.LBB0_2
  # %bb.1:
  .LBB0_2:
  .Lfunc_end0:

If the override seems reasonable, I can post that for review. Here's a draft with current trunk test changes:

  diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.h b/llvm/lib/Target/SystemZ/SystemZISelLowering.h
  index e60deaedbdf..3e1cdba9906 100644
  --- a/llvm/lib/Target/SystemZ/SystemZISelLowering.h
  +++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.h
  @@ -452,6 +452,25 @@ public:
       return VT == MVT::i32 || VT == MVT::i64;
     }

  +  bool shouldScalarizeBinop(SDValue VecOp) const override {
  +    unsigned Opc = VecOp.getOpcode();
  +
  +    // Assume target opcodes can't be scalarized.
  +    // TODO - do we have any exceptions?
  +    if (Opc >= ISD::BUILTIN_OP_END)
  +      return false;
  +
  +    // If the vector op is not supported, try to convert to scalar.
  +    EVT VecVT = VecOp.getValueType();
  +    if (!isOperationLegalOrCustomOrPromote(Opc, VecVT))
  +      return true;
  +
  +    // If the vector op is supported, but the scalar op is not, the transform may
  +    // not be worthwhile.
  +    EVT ScalarVT = VecVT.getScalarType();
  +    return isOperationLegalOrCustomOrPromote(Opc, ScalarVT);
  +  }
  +
     const char *getTargetNodeName(unsigned Opcode) const override;
     std::pair<unsigned, const TargetRegisterClass *>
     getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI,
  diff --git a/llvm/test/CodeGen/SystemZ/knownbits.ll b/llvm/test/CodeGen/SystemZ/knownbits.ll
  index 08694d8e699..021c939dcfa 100644
  --- a/llvm/test/CodeGen/SystemZ/knownbits.ll
  +++ b/llvm/test/CodeGen/SystemZ/knownbits.ll
  @@ -9,9 +9,9 @@ define i32 @f0(<4 x i32> %a0) {
   ; CHECK:       # %bb.0:
   ; CHECK-NEXT:    vgbm %v0, 0
   ; CHECK-NEXT:    vceqf %v0, %v24, %v0
  -; CHECK-NEXT:    vrepif %v1, 1
  -; CHECK-NEXT:    vnc %v0, %v1, %v0
  +; CHECK-NEXT:    vno %v0, %v0, %v0
   ; CHECK-NEXT:    vlgvf %r2, %v0, 3
  +; CHECK-NEXT:    nilf %r2, 1
   ; CHECK-NEXT:    # kill: def $r2l killed $r2l killed $r2d
   ; CHECK-NEXT:    br %r14
     %cmp0 = icmp ne <4 x i32> %a0, zeroinitializer
  diff --git a/llvm/test/CodeGen/SystemZ/vec-trunc-to-i1.ll b/llvm/test/CodeGen/SystemZ/vec-trunc-to-i1.ll
  index 278f0bf2a30..a6bc5763b25 100644
  --- a/llvm/test/CodeGen/SystemZ/vec-trunc-to-i1.ll
  +++ b/llvm/test/CodeGen/SystemZ/vec-trunc-to-i1.ll
  @@ -7,13 +7,10 @@ define void @pr32275(<4 x i8> %B15) {
   ; CHECK-LABEL: pr32275:
   ; CHECK:       # %bb.0: # %BB
   ; CHECK-NEXT:    vlgvb %r0, %v24, 3
  -; CHECK-NEXT:    vlvgp %v0, %r0, %r0
  -; CHECK-NEXT:    vrepif %v1, 1
  -; CHECK-NEXT:    vn %v0, %v0, %v1
  -; CHECK-NEXT:    vlgvf %r0, %v0, 3
   ; CHECK-NEXT:  .LBB0_1: # %CF34
   ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
  -; CHECK-NEXT:    cijlh %r0, 0, .LBB0_1
  +; CHECK-NEXT:    tmll %r0, 1
  +; CHECK-NEXT:    jne .LBB0_1
   ; CHECK-NEXT:  # %bb.2: # %CF36
   ; CHECK-NEXT:    br %r14
   BB:

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82602/new/

https://reviews.llvm.org/D82602