[PATCH] D140347: TargetLowering: Teach DemandedBits about VSCALE

Tue Dec 20 10:31:52 PST 2022

craig.topper added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:1136
+      return false;
+    APInt VScaleResultUpperbound(64, *MaxVScale);
+    VScaleResultUpperbound *= Op.getConstantOperandAPInt(0).sextOrTrunc(64);
----------------
compnerd wrote:
> craig.topper wrote:
> > MacDue wrote:
> > > compnerd wrote:
> > > > craig.topper wrote:
> > > > > I'm not sure that hardcoding 64 here is the right fix. I think what we need to check is that the multiply doesn't overflow.
> > > > @craig.topper I absolutely agree with you.  I call this out in the commit message, that I'm just assuming that this is wide enough.  This can potentially overflow still.  Is there a good way to ensure that we have the proper width?
> > > My thought was to do: 
> > > 
> > > ```
> > >     APInt Multiplier = Op.getConstantOperandAPInt(0);
> > >     unsigned MultiplyBits = Log2_32(*MaxVScale) + 1 + Multiplier.getActiveBits();
> > >     APInt VScaleResultUpperbound = APInt(MultiplyBits, *MaxVScale) * Multiplier.sextOrTrunc(MultiplyBits);
> > > ```
> > > Which should always have enough bits to store the full result.
> > Here's what I was playing with
> > 
> > ```
> >     APInt VScaleResultUpperbound(BitWidth, *MaxVScale);
> >     // TODO: Vscale max with no leading zeros requires special handling.
> >     if (VScaleResultUpperbound.isNegative())
> >       return false;
> >     bool Overflow;
> >     VScaleResultUpperbound =
> >         VScaleResultUpperbound.smul_ov(Op.getConstantOperandAPInt(0), Overflow);
> >     if (Overflow)
> >       return false;
> >     if (VScaleResultUpperbound.isNegative())
> >       Known.One.setHighBits(VScaleResultUpperbound.countLeadingOnes());
> >     else
> >       Known.Zero.setHighBits(VScaleResultUpperbound.countLeadingZeros());
> > ```
> I didn't know about the `smul_ov` function, that does seem better.  The use of `BitWidth` is still a problem.  The `BitWidth` is computed from the width of the `DemandedBits` which is computed from the type of the result of the operation.  In this particular test (luckily identifies the issue!) the result type is `i8`, which results in the `BitWidth` being 8, but the vrange is 2,1024, which exceeds the range of i8, and thus when we do scale * multiplier and overflow, we end up with the wrong result.  I think that we should either go with the wider bit width here.
You're right. I would need to at least check that the vscale max fits in the bitwidth.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140347/new/

https://reviews.llvm.org/D140347