[PATCH] D100435: [ARM] Transforming memset to Tail predicated Loop

Mon Apr 19 02:06:15 PDT 2021

dmgreen added inline comments.

================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:11214
+/// loops.
+static void genTPLoopBodyMemset(MachineBasicBlock *TpLoopBody,
+                          MachineBasicBlock *TpEntry, MachineBasicBlock *TpExit,
----------------
This and genTPLoopBodyMemcpy are very similar. Is it possible to combine them more?

================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:11215
+static void genTPLoopBodyMemset(MachineBasicBlock *TpLoopBody,
+                          MachineBasicBlock *TpEntry, MachineBasicBlock *TpExit,
+                          const TargetInstrInfo *TII, DebugLoc Dl,
----------------
Formatting.

================
Comment at: llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp:24
+                                "Tail predicated loops (WLSTP)")),
+    EnableMemsetTPLoop("arm-memset-tploop", cl::Hidden,
+                       cl::desc("Enable/disable conversion of llvm.memset to "
----------------
One option for both is probably fine.

================
Comment at: llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp:294
+    if ((EnableMemsetTPLoop == cl::BOU_TRUE) || (EnableMemsetTPLoop == cl::BOU_UNSET &&
+        !DAG.getMachineFunction().getFunction().hasOptNone())) {
+      Src = DAG.getZExtOrTrunc(Src, dl, MVT::i32);
----------------
OptSize?

================
Comment at: llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp:296
+      Src = DAG.getZExtOrTrunc(Src, dl, MVT::i32);
+      Src = DAG.getNode(ARMISD::VDUP, dl, DAG.getVTList(MVT::v16i8), Src);
+      return DAG.getNode(ARMISD::MEMSETLOOP, dl, MVT::Other, Chain, Dst, Src.getValue(0),
----------------
It's best to create a shuffle vector or build vector, not a ARMISD::VDUP directly. That may optimize better in places.

Is the input always an i8?

================
Comment at: llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp:297
+      Src = DAG.getNode(ARMISD::VDUP, dl, DAG.getVTList(MVT::v16i8), Src);
+      return DAG.getNode(ARMISD::MEMSETLOOP, dl, MVT::Other, Chain, Dst, Src.getValue(0),
+                         DAG.getZExtOrTrunc(Size, dl, MVT::i32));
----------------
Does it need Src.getValue(0)?

================
Comment at: llvm/test/CodeGen/Thumb2/mve-phireg.ll:206
+; CHECK-NEXT:  .LBB1_1: @ =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    vldrw.u32 q0, [sp] @ 16-byte Reload
+; CHECK-NEXT:    vstrb.8 q0, [r2], #16
----------------
malharJ wrote:
> @dmgreen , 
> 
> This seems to be happening because of a 16-byte spill ( line 169 ).
> 
> And it's being generated after I updated the code to create the VDUP in
>  EmitTargetCodeForMemset() instead of during the MIR level transform.
> 
> I'm not sure why the spill is happening (not really familiar with RA), but do you
>  think it's worth investigating ? It doesnt seem to happen in the simpler tests. 
>  
Yeah, this test might be difficult like that, it's designed to spill a lot. It's probably fine so long as we don't see it happening in other places.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100435/new/

https://reviews.llvm.org/D100435