[PATCH] D111221: [AArch64][SVE] Improve code generation for VLS i1 masks

Paul Walker via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 14 10:51:35 PST 2021


paulwalker-arm added a comment.

I've not hit the "Request Changes" button just in case I'm wrong but please don't commit the patch until my `AArch64SVEPredPattern::all` query is resolved one way or the other.



================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16534
+      Extract->getValueType(0) != N->getValueType(0) ||
+      Extract.getValueType().getScalarType() != MVT::i1 ||
+      Extract->getConstantOperandVal(1) != 0)
----------------
Is this required?  I'm thinking that because the return type of `AArch64ISD::SETCC_MERGE_ZERO` is always a scalable `i1` vector, the previous `Extract->getValueType(0) != N->getValueType(0)` already has you covered here.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16535
+      Extract.getValueType().getScalarType() != MVT::i1 ||
+      Extract->getConstantOperandVal(1) != 0)
+    return SDValue();
----------------
FYI: I wasn't going to say this and you may well want to ignore this in the interest of time but I cannot convince myself this restriction is necessary.  I say this because regardless of it's value the result will be a sequence of `PUNPK` instructions which should all do what we want.  The downside is that we'll need more test coverage and given this code is a special case it's probably best to leave as is.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16542-16544
+  // setcc_merge_zero pred
+  //   (sign_extend (punpklo (setcc_merge_zero ... pred ...))), 0, ne
+  //   => punpklo (inner setcc_merge_zero)
----------------
This comment doesn't match the code. The `punpklo` part should be `extract_subvector`?

It's also worth copying this as a function comment because without it you need to jump here first before you can understand why the code above exists. Perhaps with a starting statement of "Remove redundant predicate trunc(sext()) sequences."

Actually if you do that then I'm wondering if here words might better describe what is going on.  I'm think something like
```
By this point we've effectively got zero_inactive_lanes_and_trunc_i1(sext_i1(A)). If we can prove A's inactive lanes are already zero then the trunc(sext()) sequence is redundant and we can operate on A directly.
```


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16545
+  //   => punpklo (inner setcc_merge_zero)
+  SDValue OrigPTrue = InnerSetCC.getOperand(0);
+  if (Pred.getOpcode() == AArch64ISD::PTRUE &&
----------------
Up to you but given you have `InnerSetCC` perhaps this should be `InnerPred`?


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16548
+      OrigPTrue.getOpcode() == AArch64ISD::PTRUE &&
+      Pred.getConstantOperandVal(0) == OrigPTrue.getConstantOperandVal(0))
+    return Extract;
----------------
Is this enough? I'm not totally sure but my initial reaction is "If this is `AArch64SVEPredPattern::all` then that means the number of active lanes for `InnerSetCC` will be different to `N`", so I'm thinking this needs to be restricted to the cases where a specific vector length pattern is used?


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15737
+static SDValue performSunpkloCombine(SDNode *N, SelectionDAG &DAG) {
+  // sunpklo (mov z, p/z, -1) => mov z, (punpklo p), -1
+  if (N->getOperand(0).getOpcode() == ISD::SIGN_EXTEND &&
----------------
DavidTruby wrote:
> paulwalker-arm wrote:
> > This comment doesn't match the code below.  At the DAG level this is really `sunpklo(sext(pred)) -> sext(extract_low_half(pred))`.
> > 
> > Also, is this transform universally good?  As in, if the later transformation does not occur, is punpklo always better than sunpklo?
> I'm not sure that the transform is universally good, however it does significantly simplify the following transform. This transform has the effect of moving the setcc that converts from a i1 register to another register after the sign_extend, which becomes just a single sign extend rather than a chain of unpacks of unknown length. This is a much easier pattern to recognize later.
Sounds reasonable to me.  Given it's not clear cut perhaps it's worth adding a line after `// sunpklo(...` to highlight this combine works in partnership with `performSetCCPunpkCombine`?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111221/new/

https://reviews.llvm.org/D111221



More information about the llvm-commits mailing list