[PATCH] D100435: [ARM] Transforming memset to Tail predicated Loop
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 19 02:06:15 PDT 2021
dmgreen added inline comments.
================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:11214
+/// loops.
+static void genTPLoopBodyMemset(MachineBasicBlock *TpLoopBody,
+ MachineBasicBlock *TpEntry, MachineBasicBlock *TpExit,
----------------
This and genTPLoopBodyMemcpy are very similar. Is it possible to combine them more?
================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:11215
+static void genTPLoopBodyMemset(MachineBasicBlock *TpLoopBody,
+ MachineBasicBlock *TpEntry, MachineBasicBlock *TpExit,
+ const TargetInstrInfo *TII, DebugLoc Dl,
----------------
Formatting.
================
Comment at: llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp:24
+ "Tail predicated loops (WLSTP)")),
+ EnableMemsetTPLoop("arm-memset-tploop", cl::Hidden,
+ cl::desc("Enable/disable conversion of llvm.memset to "
----------------
One option for both is probably fine.
================
Comment at: llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp:294
+ if ((EnableMemsetTPLoop == cl::BOU_TRUE) || (EnableMemsetTPLoop == cl::BOU_UNSET &&
+ !DAG.getMachineFunction().getFunction().hasOptNone())) {
+ Src = DAG.getZExtOrTrunc(Src, dl, MVT::i32);
----------------
OptSize?
================
Comment at: llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp:296
+ Src = DAG.getZExtOrTrunc(Src, dl, MVT::i32);
+ Src = DAG.getNode(ARMISD::VDUP, dl, DAG.getVTList(MVT::v16i8), Src);
+ return DAG.getNode(ARMISD::MEMSETLOOP, dl, MVT::Other, Chain, Dst, Src.getValue(0),
----------------
It's best to create a shuffle vector or build vector, not a ARMISD::VDUP directly. That may optimize better in places.
Is the input always an i8?
================
Comment at: llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp:297
+ Src = DAG.getNode(ARMISD::VDUP, dl, DAG.getVTList(MVT::v16i8), Src);
+ return DAG.getNode(ARMISD::MEMSETLOOP, dl, MVT::Other, Chain, Dst, Src.getValue(0),
+ DAG.getZExtOrTrunc(Size, dl, MVT::i32));
----------------
Does it need Src.getValue(0)?
================
Comment at: llvm/test/CodeGen/Thumb2/mve-phireg.ll:206
+; CHECK-NEXT: .LBB1_1: @ =>This Inner Loop Header: Depth=1
+; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload
+; CHECK-NEXT: vstrb.8 q0, [r2], #16
----------------
malharJ wrote:
> @dmgreen ,
>
> This seems to be happening because of a 16-byte spill ( line 169 ).
>
> And it's being generated after I updated the code to create the VDUP in
> EmitTargetCodeForMemset() instead of during the MIR level transform.
>
> I'm not sure why the spill is happening (not really familiar with RA), but do you
> think it's worth investigating ? It doesnt seem to happen in the simpler tests.
>
Yeah, this test might be difficult like that, it's designed to spill a lot. It's probably fine so long as we don't see it happening in other places.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100435/new/
https://reviews.llvm.org/D100435
More information about the llvm-commits
mailing list