[PATCH] D100812: [DAGCombiner] Allow operand of step_vector to be negative.

Wed Apr 21 20:02:18 PDT 2021

junparser added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:1658-1665
+  APInt StepVal = cast<ConstantSDNode>(Step)->getAPIntValue();
   SDValue StartOfHi =
-      DAG.getVScale(dl, EltVT,
-                    cast<ConstantSDNode>(Step)->getAPIntValue() *
-                        LoVT.getVectorMinNumElements());
-  StartOfHi = DAG.getZExtOrTrunc(StartOfHi, dl, HiVT.getVectorElementType());
+      DAG.getVScale(dl, EltVT, StepVal * LoVT.getVectorMinNumElements());
+  StartOfHi =
+      StepVal.isNonNegative()
+          ? DAG.getZExtOrTrunc(StartOfHi, dl, HiVT.getVectorElementType())
+          : DAG.getSExtOrTrunc(StartOfHi, dl, HiVT.getVectorElementType());
----------------
frasercrmck wrote:
> paulwalker-arm wrote:
> > junparser wrote:
> > > paulwalker-arm wrote:
> > > > junparser wrote:
> > > > > junparser wrote:
> > > > > > frasercrmck wrote:
> > > > > > > paulwalker-arm wrote:
> > > > > > > > This doesn't look correct to me.  The operand is either signed or unsigned and should be treated accordingly.  Given this patch wants to allow negative value then we're essentially converting the STEP_VECTOR to take a signed operand and should always use SExt.
> > > > > > > > 
> > > > > > > > That said I'm wondering if the issue here is using DAG to do the extension even though the operand is defined to be a constant.  Perhaps the following works for you?
> > > > > > > > ```
> > > > > > > > SDValue StartOfHi =
> > > > > > > >       DAG.getVScale(HiVT.getVectorElementType(), StepVal * LoVT.getVectorMinNumElements());
> > > > > > > > ```
> > > > > > > I agree, I think it should be consistently SExt. I think @junparser tried to mitigate a RISC-V regression but in line with what you're saying, I bet some of the poor RISC-V code comes from the DAG extending VSCALE and not being able to optimize it due to not being able to infer the known bits.
> > > > > > @paulwalker-arm,  The tradeoff here does try to mitigate regression in riscv-32 with element type like i64.  Meanwhile, extension is necessary for vscale(i64) under  riscv-32.  I checked PromoteIntRes_VSCALE which does not do truncation.  @frasercrmck, would it be ok to relax vscale as same as step_vector?
> > > > > >  
> > > > > @paulwalker-arm, The tradeoff here does try to mitigate regression in riscv-32 with element type like i64. Meanwhile, extension is necessary for vscale(i64) under riscv-32. Since there is no ExpandIntegerResult with vscale. @frasercrmck, would it be ok to relax vscale as same as step_vector?
> > > > I believe `VSCALE` and `STEP_VECTOR` have different problems.  `STEP_VECTOR` has a vector result but scalar operand so it seemed reasonable to allow the flexibility with regards to its operand's type.  I don't believe the same flexibility should be given to `VSCALE` because that is a scalar only operation.
> > > > 
> > > > That's to say that if a target doesn't support i64 for the operand type, then it will also not support i64 as its result. I mean if we decided `STEP_VECTOR`'s operand was not worth the effort and removed it then this code will still need to plant an explicit `MUL` that'll have the same problem.
> > > > 
> > > > So whilst I agree there's a problem I'm not sure this is the function to solve it. Perhaps SPLAT_VECTOR_PARTS need to be involved somewhere?
> > > > 
> > > > Perhaps what we're really asking for here is that `SPLAT_VECTOR`'s operand be relaxed to allow an operand that is smaller than its result element type?
> > > yes, relax vscale does not work here. I have did some local test in ExpandIntegerResult: 
> > > 
> > > ```
> > >  case ISD::VSCALE:
> > >     {
> > >   EVT VT = N->getValueType(0);
> > >   EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
> > >  SDLoc dl(N);
> > >   SDValue NewNode = DAG.getNode(ISD::MUL, dl, VT, DAG.getZExtOrTrunc(DAG.getVScale(dl, 
> > >   NVT, APInt(NVT.getSizeInBits(), 1)), dl, VT), N->getOperand(0));
> > >   ReplaceValueWith(SDValue(N, 0), NewNode);
> > >    break;
> > >    }
> > > ```
> > > turns out as @frasercrmck said, we get much worse code here. So I prefer keep SEXT here.
> > > 
> > I'm not happy with this approach.  It doesn't make sense to change the definition of `ISD::STEP_VECTOR` and then have to tip toe around this definition because a target might not be in a position to generate the most optimal code for it.  I see two options:
> > 
> > 1) We change the definition of `ISD::STEP_VECTOR` and work within that definition, or
> > 2) We maintain the current definition.
> > 
> > Personally I prefer option 1 because it's in the spirit of this node's original intent, but option 2 also works until we're in a position to revisit.
> > 
> > That said, I really think the poor code generation problem is more to do with the definition of `ISD::SPLAT_VECTOR` which doesn't work well for target's whose max legal scalar type is smaller than its max legal vector element type.
> I haven't had time to dig in but I was wondering if the issue of poor code generation could be mitigated by some "known bits" analysis for `ISD::VSCALE`. I don't know if there's precedent for target-specific known-bits analysis of a generic node, but I feel like that's what it'd have to be. RISC-V has known limits on `vscale`.
> 
> We do have `ISD::SPLAT_VECTOR_PARTS` to deal with `ISD::SPLAT_VECTOR` for targets whose max scalar type is smaller than the max vector element type. But I feel like the sign extension is still going to be the blocker?
Although it is also ok with me to maintain current definition of step_vector. Since then we may make some promise to keep operand of step_vector still be positive constant after combine in D100088. As for D100816, we also need some extra pattern to handle negative case with sub. I prefer 1 as well. 
@frasercrmck, do you have any suggestion about where to start, I may have time to dig in.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100812/new/

https://reviews.llvm.org/D100812