[PATCH] D100812: [DAGCombiner] Allow operand of step_vector to be negative.

Wed Apr 21 08:48:35 PDT 2021

paulwalker-arm added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:1658-1665
+  APInt StepVal = cast<ConstantSDNode>(Step)->getAPIntValue();
   SDValue StartOfHi =
-      DAG.getVScale(dl, EltVT,
-                    cast<ConstantSDNode>(Step)->getAPIntValue() *
-                        LoVT.getVectorMinNumElements());
-  StartOfHi = DAG.getZExtOrTrunc(StartOfHi, dl, HiVT.getVectorElementType());
+      DAG.getVScale(dl, EltVT, StepVal * LoVT.getVectorMinNumElements());
+  StartOfHi =
+      StepVal.isNonNegative()
+          ? DAG.getZExtOrTrunc(StartOfHi, dl, HiVT.getVectorElementType())
+          : DAG.getSExtOrTrunc(StartOfHi, dl, HiVT.getVectorElementType());
----------------
junparser wrote:
> paulwalker-arm wrote:
> > junparser wrote:
> > > junparser wrote:
> > > > frasercrmck wrote:
> > > > > paulwalker-arm wrote:
> > > > > > This doesn't look correct to me.  The operand is either signed or unsigned and should be treated accordingly.  Given this patch wants to allow negative value then we're essentially converting the STEP_VECTOR to take a signed operand and should always use SExt.
> > > > > > 
> > > > > > That said I'm wondering if the issue here is using DAG to do the extension even though the operand is defined to be a constant.  Perhaps the following works for you?
> > > > > > ```
> > > > > > SDValue StartOfHi =
> > > > > >       DAG.getVScale(HiVT.getVectorElementType(), StepVal * LoVT.getVectorMinNumElements());
> > > > > > ```
> > > > > I agree, I think it should be consistently SExt. I think @junparser tried to mitigate a RISC-V regression but in line with what you're saying, I bet some of the poor RISC-V code comes from the DAG extending VSCALE and not being able to optimize it due to not being able to infer the known bits.
> > > > @paulwalker-arm,  The tradeoff here does try to mitigate regression in riscv-32 with element type like i64.  Meanwhile, extension is necessary for vscale(i64) under  riscv-32.  I checked PromoteIntRes_VSCALE which does not do truncation.  @frasercrmck, would it be ok to relax vscale as same as step_vector?
> > > >  
> > > @paulwalker-arm, The tradeoff here does try to mitigate regression in riscv-32 with element type like i64. Meanwhile, extension is necessary for vscale(i64) under riscv-32. Since there is no ExpandIntegerResult with vscale. @frasercrmck, would it be ok to relax vscale as same as step_vector?
> > I believe `VSCALE` and `STEP_VECTOR` have different problems.  `STEP_VECTOR` has a vector result but scalar operand so it seemed reasonable to allow the flexibility with regards to its operand's type.  I don't believe the same flexibility should be given to `VSCALE` because that is a scalar only operation.
> > 
> > That's to say that if a target doesn't support i64 for the operand type, then it will also not support i64 as its result. I mean if we decided `STEP_VECTOR`'s operand was not worth the effort and removed it then this code will still need to plant an explicit `MUL` that'll have the same problem.
> > 
> > So whilst I agree there's a problem I'm not sure this is the function to solve it. Perhaps SPLAT_VECTOR_PARTS need to be involved somewhere?
> > 
> > Perhaps what we're really asking for here is that `SPLAT_VECTOR`'s operand be relaxed to allow an operand that is smaller than its result element type?
> yes, relax vscale does not work here. I have did some local test in ExpandIntegerResult: 
> 
> ```
>  case ISD::VSCALE:
>     {
>   EVT VT = N->getValueType(0);
>   EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
>  SDLoc dl(N);
>   SDValue NewNode = DAG.getNode(ISD::MUL, dl, VT, DAG.getZExtOrTrunc(DAG.getVScale(dl, 
>   NVT, APInt(NVT.getSizeInBits(), 1)), dl, VT), N->getOperand(0));
>   ReplaceValueWith(SDValue(N, 0), NewNode);
>    break;
>    }
> ```
> turns out as @frasercrmck said, we get much worse code here. So I prefer keep SEXT here.
> 
I'm not happy with this approach.  It doesn't make sense to change the definition of `ISD::STEP_VECTOR` and then have to tip toe around this definition because a target might not be in a position to generate the most optimal code for it.  I see two options:

1) We change the definition of `ISD::STEP_VECTOR` and work within that definition, or
2) We maintain the current definition.

Personally I prefer option 1 because it's in the spirit of this node's original intent, but option 2 also works until we're in a position to revisit.

That said, I really think the poor code generation problem is more to do with the definition of `ISD::SPLAT_VECTOR` which doesn't work well for target's whose max legal scalar type is smaller than its max legal vector element type.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100812/new/

https://reviews.llvm.org/D100812