[PATCH] D117900: [AArch64][SVE] Fold gather/scatter with 32bits when possible

Tue Jan 25 14:08:47 PST 2022

sdesmalen added a comment.

The approach seems right to me, but the code is still a little messy so I mostly left a bunch of suggestions/nits to improve readability.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16104
+                                     SelectionDAG &DAG) {
+  bool ScaledOffset =
+      (IndexType == ISD::SIGNED_SCALED) || (IndexType == ISD::UNSIGNED_SCALED);
----------------
nit: The word 'Offset' is used quite a few times in this function, which confused me a bit. I think what this is saying is that the eventual instruction will scale the offset (i.e. the offset is not an offset, but an index). So maybe change this to `OffsetIsScaledByInstruction`?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16107-16118
+  // Only consider element types that are pointer sized as smaller types can
+  // be easily promoted.
+  if (Index.getValueType().getVectorElementType() != MVT::i64)
+    return false;
+
+  EVT IndexVT = Index.getValueType();
+  assert(IndexVT.getVectorElementType() == MVT::i64 &&
----------------
nit: this is equivalent to:
  // Only consider element types that are pointer sized as smaller types can
  // be easily promoted. Also ignore indices that are already type legal.
  EVT IndexVT = Index.getValueType();
  if (IndexVT.getVectorElementType() != MVT::i64 || IndexVT == MVT::nxv2i64)
    return false;

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16120
+
+  IndexVT = IndexVT.changeVectorElementType(MVT::i32);
+  int64_t StepConst = 0;
----------------
nit: IndexVT isn't used until line 16173, so don't change it here but rather closest to where it's used. Also, I'd suggest storing it a new variable, named e.g. `NewIndexVT`.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16121
+  IndexVT = IndexVT.changeVectorElementType(MVT::i32);
+  int64_t StepConst = 0;
+  SDLoc DL(N);
----------------
Maybe rename this to `Stride` ?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16127
+  if (Index.getOpcode() == ISD::ADD &&
+      Index.getOperand(1).getOpcode() == ISD::STEP_VECTOR) {
+    auto *C = dyn_cast<ConstantSDNode>(Index.getOperand(1).getOperand(0));
----------------
This makes the assumption that STEP_VECTOR is on the right-hand side. There is currently no code that forces this, so you'll want to add a condition to SelectionDAG::getNode() where it reorders the constants to the right, to do something like this:
```
-    if ((IsN1C && !IsN2C) || (IsN1CFP && !IsN2CFP))
+
+    // Favour step_vector on the RHS because it's a kind of
+    // constant.
+    if (((N1.getOpcode() == ISD::STEP_VECTOR || IsN1C) && !IsN2C) ||
+        (IsN1CFP && !IsN2CFP))
```

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16128
+      Index.getOperand(1).getOpcode() == ISD::STEP_VECTOR) {
+    auto *C = dyn_cast<ConstantSDNode>(Index.getOperand(1).getOperand(0));
+    if (!C)
----------------
If `Index.getOperand(1)` is ISD::STEP_VECTOR, then it's operand *must* be a constant. There's no need to test for it, you can call `Index.getOperand(0).getConstantOperandVal(0)` directly.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16134
+      if (ScaledOffset)
+        Offset = DAG.getNode(ISD::MUL, DL, MVT::i64, Offset, N->getScale());
+      BasePtr = DAG.getNode(ISD::ADD, DL, MVT::i64, BasePtr, Offset);
----------------
nit: Does `N->getScale()` always return a value even when ScaledOffset == false?
If so, and it returns a Constant `1` by default, then you can remove the `if (ScaledOffset)` condition above.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16136
+      BasePtr = DAG.getNode(ISD::ADD, DL, MVT::i64, BasePtr, Offset);
+      ChangedIndex = true;
+    }
----------------
If `StepConst` is only set in this block (or in general, when we know for sure that we can fold this into a simpler addressing mode), then ChangedIndex is redundant, and you can just check that StepConst != 0.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16145
+    if (auto Shift = DAG.getSplatValue(Index.getOperand(1)))
+      if (auto Offset = DAG.getSplatValue(Index.getOperand(0).getOperand(0))) {
+        auto *C = dyn_cast<ConstantSDNode>(Shift);
----------------
This isn't very easy to read, I'd recommend creating three variables instead:

  if (Index.getOpcode() == ISD::SHL &&
      Index.getOperand(0).getOpcode() == ISD::ADD &&
      Index.getOperand(0).getOperand(1).getOpcode() == ISD::STEP_VECTOR) {
    SDValue ShiftAmount = Index.getOperand(1);              
    SDValue BaseOffset = Index.getOperand(0).getOperand(0); 
    SDValue Step = Index.getOperand(0).getOperand(1);       
    if (auto Shift = DAG.getSplatValue(ShiftAmount))        
      if (auto *ShiftC = dyn_cast<ConstantSDNode>(Shift))   
        if (auto Offset = DAG.getSplatValue(BaseOffset)) {  
           // ... code to calculate Stride, Offset and BasePtr here ...

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16167-16172
+  MachineFunction &MF = DAG.getMachineFunction();
+  llvm::Optional<unsigned int> MaxVScale = 1;
+  if (MF.getFunction().hasFnAttribute(Attribute::VScaleRange))
+    MaxVScale = MF.getFunction()
+                    .getFnAttribute(Attribute::VScaleRange)
+                    .getVScaleRangeMax();
----------------
You can more easily calculate MaxVScale from:

  const auto &Subtarget =
      static_cast<const AArch64Subtarget &>(DAG.getSubtarget());
  unsigned MaxVScale = Subtarget.getMaxSVEVectorSizeInBits() / AArch64::SVEBitsPerBlock;

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16190
+
+  SDLoc DL(MGS);
+  SDValue Chain = MGS->getChain();
----------------
Maybe just move all these definitions after the `if(!DCI.isBeforeLegalize()) return SDValue();` (that is, if you take my suggestion mentioned below).

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16198
+
+  if (DCI.isBeforeLegalize()) {
+    // Here we catch such cases early and change MGATHER's IndexType to allow
----------------
nit: just bail out early to avoid a level of indentation, e.g.

  if (!DCI.isBeforeLegalize())
    return SDValue();

  if (FindMoreOptimalIndexType(...)) {
    ...
  }

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17678-17681
+  case ISD::MGATHER:
+    return performMaskedGatherScatterCombine(N, DCI, DAG);
+  case ISD::MSCATTER:
+    return performMaskedGatherScatterCombine(N, DCI, DAG);
----------------
nit: 

  case ISD::MGATHER:
  case ISD::MSCATTER:
    return performMaskedGatherScatterCombine(N, DCI, DAG);

?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117900/new/

https://reviews.llvm.org/D117900