[PATCH] D100435: [ARM] Transforming memset to Tail predicated Loop
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Apr 25 00:19:08 PDT 2021
dmgreen added inline comments.
Herald added a subscriber: tmatheson.
================
Comment at: llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp:132
+static bool genInlineTPLoop(const ARMSubtarget &Subtarget, const SelectionDAG &DAG,
+ ConstantSDNode *ConstantSize, Align Alignment, bool IsMemcpy){
----------------
Maybe shouldGenerateInlineTPLoop would be a more descriptive name?
================
Comment at: llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp:144
+ // If cli option is unset, for memset always generate inline TP
+ // for memcpy, check some conditions
+ if (!IsMemcpy)
----------------
for -> For, probably with a full stop on the previous line.
================
Comment at: llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp:149
+ return true;
+ if (ConstantSize &&
+ ConstantSize->getZExtValue() > Subtarget.getMaxInlineSizeThreshold() &&
----------------
Make sure you clang-format the patch.
================
Comment at: llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp:296
+ Src = DAG.getZExtOrTrunc(Src, dl, MVT::i32);
+ Src = DAG.getNode(ARMISD::VDUP, dl, DAG.getVTList(MVT::v16i8), Src);
+ return DAG.getNode(ARMISD::MEMSETLOOP, dl, MVT::Other, Chain, Dst, Src.getValue(0),
----------------
malharJ wrote:
> dmgreen wrote:
> > It's best to create a shuffle vector or build vector, not an ARMISD::VDUP directly. That may optimize better in places.
> >
> > Is the input always an i8?
> I've made this update with a build vector. Have 2 minor queries:
>
> - Why would shuffle vector be appropriate here (given that all we want to create is a vector of constants) ?
>
> - Even though I'm not utilising vdup directly, just to understand better, why would the input need to be i8 to generate a v16i8 vector ... can the vector not be generated by a i32 source register ?
Do you mean "Why use a shuffle as opposed to an ARMISD::VDUP"? Normal codegen will usually start off by producing a shuffle, which will then be optimized and eventually turned into a VDUP. Because it works that way around, there are not a lot of optimizations on VDUP directly, as can be seen with the constant becoming a VMOVimm. We could theoretically add them, but it's added complexity, and it's easier to just use the optimizations that are already present by starting from a shuffle.
The input should be able to be an i32, but the types would need to be correct.
Do we know the type of the value is always an i8?
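To make the type question concrete, here is a minimal standalone sketch (plain C++, not LLVM code) of what splatting a memset value into a v16i8 amounts to when the value arrives in a 32-bit register: each lane takes only the low 8 bits of the source. The name splatToV16i8 is hypothetical and exists only for illustration. In the patch itself this would presumably be a build vector or shuffle over a correctly typed value, not a hand-written loop.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of a v16i8 splat: every lane holds the low 8 bits of
// the 32-bit source value, matching memset's use of only the low byte.
std::array<std::uint8_t, 16> splatToV16i8(std::uint32_t Src) {
  std::array<std::uint8_t, 16> Lanes{};
  // Explicit truncation to i8 before filling all 16 lanes.
  Lanes.fill(static_cast<std::uint8_t>(Src & 0xFFu));
  return Lanes;
}
```

If the source is not known to be an i8 already, the truncation step above is where the types have to be made to agree.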
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100435/new/
https://reviews.llvm.org/D100435