[PATCH] D111221: [AArch64][SVE] Improve code generation for VLS i1 masks
Paul Walker via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 14 10:51:35 PST 2021
paulwalker-arm added a comment.
I've not hit the "Request Changes" button just in case I'm wrong but please don't commit the patch until my `AArch64SVEPredPattern::all` query is resolved one way or the other.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16534
+ Extract->getValueType(0) != N->getValueType(0) ||
+ Extract.getValueType().getScalarType() != MVT::i1 ||
+ Extract->getConstantOperandVal(1) != 0)
----------------
Is this required? I'm thinking that because the return type of `AArch64ISD::SETCC_MERGE_ZERO` is always a scalable `i1` vector, the previous `Extract->getValueType(0) != N->getValueType(0)` already has you covered here.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16535
+ Extract.getValueType().getScalarType() != MVT::i1 ||
+ Extract->getConstantOperandVal(1) != 0)
+ return SDValue();
----------------
FYI: I wasn't going to say this and you may well want to ignore this in the interest of time but I cannot convince myself this restriction is necessary. I say this because regardless of it's value the result will be a sequence of `PUNPK` instructions which should all do what we want. The downside is that we'll need more test coverage and given this code is a special case it's probably best to leave as is.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16542-16544
+ // setcc_merge_zero pred
+ // (sign_extend (punpklo (setcc_merge_zero ... pred ...))), 0, ne
+ // => punpklo (inner setcc_merge_zero)
----------------
This comment doesn't match the code. The `punpklo` part should be `extract_subvector`?
It's also worth copying this as a function comment because without it you need to jump here first before you can understand why the code above exists. Perhaps with a starting statement of "Remove redundant predicate trunc(sext()) sequences."
Actually if you do that then I'm wondering if here words might better describe what is going on. I'm think something like
```
By this point we've effectively got zero_inactive_lanes_and_trunc_i1(sext_i1(A)). If we can prove A's inactive lanes are already zero then the trunc(sext()) sequence is redundant and we can operate on A directly.
```
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16545
+ // => punpklo (inner setcc_merge_zero)
+ SDValue OrigPTrue = InnerSetCC.getOperand(0);
+ if (Pred.getOpcode() == AArch64ISD::PTRUE &&
----------------
Up to you but given you have `InnerSetCC` perhaps this should be `InnerPred`?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16548
+ OrigPTrue.getOpcode() == AArch64ISD::PTRUE &&
+ Pred.getConstantOperandVal(0) == OrigPTrue.getConstantOperandVal(0))
+ return Extract;
----------------
Is this enough? I'm not totally sure but my initial reaction is "If this is `AArch64SVEPredPattern::all` then that means the number of active lanes for `InnerSetCC` will be different to `N`", so I'm thinking this needs to be restricted to the cases where a specific vector length pattern is used?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15737
+static SDValue performSunpkloCombine(SDNode *N, SelectionDAG &DAG) {
+ // sunpklo (mov z, p/z, -1) => mov z, (punpklo p), -1
+ if (N->getOperand(0).getOpcode() == ISD::SIGN_EXTEND &&
----------------
DavidTruby wrote:
> paulwalker-arm wrote:
> > This comment doesn't match the code below. At the DAG level this is really `sunpklo(sext(pred)) -> sext(extract_low_half(pred))`.
> >
> > Also, is this transform universally good? As in, if the later transformation does not occur, is punpklo always better than sunpklo?
> I'm not sure that the transform is universally good, however it does significantly simplify the following transform. This transform has the effect of moving the setcc that converts from a i1 register to another register after the sign_extend, which becomes just a single sign extend rather than a chain of unpacks of unknown length. This is a much easier pattern to recognize later.
Sounds reasonable to me. Given it's not clear cut perhaps it's worth adding a line after `// sunpklo(...` to highlight this combine works in partnership with `performSetCCPunpkCombine`?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D111221/new/
https://reviews.llvm.org/D111221
More information about the llvm-commits
mailing list