[PATCH] D100435: [ARM] Transforming memset to Tail predicated Loop

Tue Apr 27 10:32:19 PDT 2021

malharJ added inline comments.

================
Comment at: llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp:296
+      Src = DAG.getZExtOrTrunc(Src, dl, MVT::i32);
+      Src = DAG.getNode(ARMISD::VDUP, dl, DAG.getVTList(MVT::v16i8), Src);
+      return DAG.getNode(ARMISD::MEMSETLOOP, dl, MVT::Other, Chain, Dst, Src.getValue(0),
----------------
dmgreen wrote:
> malharJ wrote:
> > dmgreen wrote:
> > > malharJ wrote:
> > > > dmgreen wrote:
> > > > > It's best to create a shuffle vector or build vector, not an ARMISD::VDUP directly. That may optimize better in places.
> > > > > 
> > > > > Is the input always an i8?
> > > > I've made this update with a build vector. I have 2 minor queries:
> > > >
> > > > - Why would a shuffle vector be appropriate here, given that all we want to create is a vector of constants?
> > > >
> > > > - Even though I'm not using VDUP directly, just to understand better: why would the input need to be an i8 to generate a v16i8 vector? Can the vector not be generated from an i32 source register?
> > > Do you mean "Why use a shuffle as opposed to an ARMISD::VDUP"? Normal codegen will usually start off by producing a shuffle, which will then be optimized and eventually turned into a VDUP. Because it works that way around, there are not a lot of optimizations on VDUP directly, as can be seen with the constant becoming a VMOVimm. We could theoretically add them, but it's added complexity, and it's easier to just use the optimizations that are already present by starting from a shuffle.
> > > 
> > > The input should be able to be an i32, but the types would need to be correct.
> > > 
> > > Do we know the type of the value is always an i8?
> > Ok. Thanks for clarifying about the shuffle vector.
> > 
> > > Do we know the type of the value is always an i8?
> > 
> > I have a truncate operation on line 298 which ensures the input is i8.
> Yep, but if the value isn't an i8 it will discard some bits it should not. Something like a `@llvm.memset.p0i8.i32` or `@llvm.memset.p0i32.i32`, if they are valid.
> 
> Is it possible to add an assert at least?
So, having looked at the language ref for [[ https://llvm.org/docs/LangRef.html#llvm-memset-intrinsics | llvm.memset ]], it suggests that the Src value is always an i8. (It may get zero-extended before reaching here, but that's not a problem.)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100435/new/

https://reviews.llvm.org/D100435