[PATCH] D128665: [AArch64] Make nxv1i1 types a legal type for SVE.

Wed Jun 29 16:24:10 PDT 2022

paulwalker-arm added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:4271-4275
+  // Emulate a PTRUE.Q with a PTRUE.D using an PUNPKLO.
+  if (VT == MVT::nxv1i1)
+    return getPUNPKLO(DAG, DL,
+                     DAG.getNode(AArch64ISD::PTRUE, DL, MVT::nxv2i1,
+                                 DAG.getTargetConstant(Pattern, DL, MVT::i32)));
----------------
Strictly speaking this is not correct for all predicate patterns.  Do you need a truly generic implementation of PTRUE.Q or do you only care about specific cases?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:4622-4630
+  case Intrinsic::aarch64_sve_whilelo:
+    if (Op.getValueType() == MVT::nxv1i1) {
+      SDValue ID =
+          DAG.getTargetConstant(Intrinsic::aarch64_sve_whilelo, dl, MVT::i64);
+      return getPUNPKLO(DAG, dl,
+                        DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, MVT::nxv2i1,
+                                    ID, Op.getOperand(1), Op.getOperand(2)));
----------------
Do we really need this or can we fix the cause, which I'm presuming is `LowerSPLAT_VECTOR`. I'd rather not give the impression we're extending the SVE intrinsics for types they're not intended to support.

================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-pred-creation.ll:48
+; CHECK: ptrue p0.d
+; CHECK-NEXT: punpklo p0.h, p0.b
+; CHECK-NEXT: ret
----------------
efriedma wrote:
> sdesmalen wrote:
> > efriedma wrote:
> > > sdesmalen wrote:
> > > > efriedma wrote:
> > > > > I don't think, in general, we guarantee that padding bits of SVE predicates are zeroed.  So the punpklo should be unnecessary.
> > > > I wasn't sure about this to be honest. If we can assume that the other lanes are always zeroed, then we can avoid having to always unpack the predicate when using this value (rather than when creating it). For example, the patterns for `AArch64ptest` don't first mask the predicate with a `ptrue <required element type>` and the instruction always tests each bit. That made me question if it was safe to assume the other lanes are always known to be zeroed.
> > > As far as I know, nothing has changed since we last discussed this in https://reviews.llvm.org/D74471 .  Unless I'm forgetting something...
> > You're right, in general the other lanes in the vector are indeed undef.
> > 
> > For the ptrue/splat(1) case I think we want to make an exception and make sure we generate a predicate where the other lanes are zeroed, such that we can use this value to AND with other predicates for explicitly zeroing the other lanes.
> I agree we need some way to generate a predicate with the other lanes zeroed.
> 
> I'm not sure llvm.aarch64.sve.ptrue.nxv1i1 should map to that; if we convert.to.svbool or whatever, that should zero out the other bits anyway.  But maybe it's less confusing this way if you have optimizations that specifically check for "ptrue".
I think the confusion here is that the test gives the impression we're extending the intrinsic whereas it's really just a route to exercise some code generation logic.  However, for ISD::PTRUE we do rely on the implicit zeroing and so to me it makes sense for the emulated PTRUE.Q to honour the same commitment.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128665/new/

https://reviews.llvm.org/D128665