[PATCH] D140347: TargetLowering: Teach DemandedBits about VSCALE

Saleem Abdulrasool via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 20 11:08:19 PST 2022


compnerd added inline comments.


================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:1136
+      return false;
+    APInt VScaleResultUpperbound(64, *MaxVScale);
+    VScaleResultUpperbound *= Op.getConstantOperandAPInt(0).sextOrTrunc(64);
----------------
craig.topper wrote:
> compnerd wrote:
> > craig.topper wrote:
> > > MacDue wrote:
> > > > compnerd wrote:
> > > > > craig.topper wrote:
> > > > > > I'm not sure that hardcoding 64 here is the right fix. I think what we need to check is that the multiply doesn't overflow.
> > > > > @craig.topper I absolutely agree with you.  I call this out in the commit message: I'm just assuming that 64 bits is wide enough, and this can still overflow.  Is there a good way to ensure that we use the proper width?
> > > > My thought was to do: 
> > > > 
> > > > ```
> > > >     APInt Multiplier = Op.getConstantOperandAPInt(0);
> > > >     unsigned MultiplyBits = Log2_32(*MaxVScale) + 1 + Multiplier.getActiveBits();
> > > >     APInt VScaleResultUpperbound = APInt(MultiplyBits, *MaxVScale) * Multiplier.sextOrTrunc(MultiplyBits);
> > > > ```
> > > > Which should always have enough bits to store the full result.
> > > Here's what I was playing with
> > > 
> > > ```
> > >     APInt VScaleResultUpperbound(BitWidth, *MaxVScale);
> > >     // TODO: Vscale max with no leading zeros requires special handling.
> > >     if (VScaleResultUpperbound.isNegative())
> > >       return false;
> > >     bool Overflow;
> > >     VScaleResultUpperbound =
> > >         VScaleResultUpperbound.smul_ov(Op.getConstantOperandAPInt(0), Overflow);
> > >     if (Overflow)
> > >       return false;
> > >     if (VScaleResultUpperbound.isNegative())
> > >       Known.One.setHighBits(VScaleResultUpperbound.countLeadingOnes());
> > >     else
> > >       Known.Zero.setHighBits(VScaleResultUpperbound.countLeadingZeros());
> > > ```
> > I didn't know about the `smul_ov` function; that does seem better.  The use of `BitWidth` is still a problem: `BitWidth` is computed from the width of `DemandedBits`, which is derived from the result type of the operation.  In this particular test (which luckily identifies the issue!) the result type is `i8`, so `BitWidth` is 8, but the `vscale_range` is (2,1024), which exceeds the range of i8; when we compute vscale * multiplier it overflows and we end up with the wrong result.  I think that we should go with the wider bit width here.
> You're right. I would need to at least check that the vscale max fits in the bitwidth.
Can we somehow be assured that the multiplication of the vscale maximum by the scalar will not overflow?  It seems that we need a width wider than the vscale maximum, unless I'm not counting bits correctly.
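
For reference, a minimal standalone sketch of one way to combine the two suggestions above (the helper name and exact bail-out conditions are illustrative, not the committed code): widen the intermediate product so the multiply itself can never wrap, and only use the bound when it fits in the demanded width.

```
#include "llvm/ADT/APInt.h"
#include "llvm/Support/KnownBits.h"
#include "llvm/Support/MathExtras.h"
using namespace llvm;

// Compute the known high bits of vscale * Multiplier at width BitWidth,
// given the maximum vscale value from the vscale_range attribute.
static KnownBits vscaleTimesConstantKnownBits(uint64_t MaxVScale,
                                              const APInt &Multiplier,
                                              unsigned BitWidth) {
  assert(MaxVScale >= 1 && "vscale is always at least 1");
  KnownBits Known(BitWidth);
  // Pick a width large enough that MaxVScale * Multiplier cannot wrap.
  unsigned MulBits = Log2_64(MaxVScale) + 1 + Multiplier.getActiveBits() + 1;
  APInt Bound(MulBits, MaxVScale);
  Bound *= Multiplier.sextOrTrunc(MulBits);
  // Bound is now the extreme value of the product (most positive for a
  // positive multiplier, most negative for a negative one).  If it does
  // not fit in the demanded width, give up rather than report wrong bits.
  if (Bound.getSignificantBits() > BitWidth)
    return Known;
  Bound = Bound.sextOrTrunc(BitWidth);
  if (Bound.isNegative())
    Known.One.setHighBits(Bound.countLeadingOnes());
  else
    Known.Zero.setHighBits(Bound.countLeadingZeros());
  return Known;
}
```

With the case from this test (an i8 result and vscale_range(2,1024)), the bound does not fit in 8 bits, so this sketch reports nothing known instead of folding on a wrapped value.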


================
Comment at: llvm/test/CodeGen/RISCV/vscale-demanded-bits.ll:3
+
+; CHECK: vse8.v v8, (a5), v0.t
+; CHECK: vadd.vx v8, v8, a3
----------------
jrtc27 wrote:
> compnerd wrote:
> > jrtc27 wrote:
> > > Use UTC, really
> > What do you mean by UTC?
> UpdateTestChecks, which in this case is update_llc_test_checks.py
Done, but this really does seem like unnecessarily stringent testing.  What we are looking to verify here is that the index counter is updated; the rest of the code around it really doesn't matter.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140347/new/

https://reviews.llvm.org/D140347


