[PATCH] D123638: [SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 21 06:47:15 PDT 2022


dmgreen added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2616
+      if (IsLoad) {
+        assert(isLegalBroadcastLoad(Tp->getElementType(),
+                                    LT.second.getVectorElementCount()) &&
----------------
vporpo wrote:
> dmgreen wrote:
> > vporpo wrote:
> > > dmgreen wrote:
> > > > This assert looks like it will fire a lot of the time. Should we use `IsLoad && isLegalBroadcastLoad(...)`?
> > > > 
> > > > That might make this testable from the cost model too, Even if it's slightly unorthodox to use a vector load for such cases.
> > > It is already guarded by `if (IsLoad)` so I guess `IsLoad && ` is not needed.
> > > 
> > > > That might make this testable from the cost model too, Even if it's slightly unorthodox to use a vector load for such cases.
> > > I am not sure I understand. Is this about the assertion?
> > I meant remove the assert and turn it into a condition. This function might be checking Types which we do not consider to be legal. https://godbolt.org/z/dv56WMaP7 has an example (apparently :) ).
> > 
> > We could presumably have a test in llvm/test/Analysis/CostModel/AArch64 that tests `load <vector>, splat-shuffle`, and it will trigger this and be treated as free?
> Oops, yes this is totally broken, sorry about that. I had just copied this part of the code from X86 TTI where we have a cost table that checks the types but I skipped the table. This assertion is meant to check that the table and `isLegalBroadcastLoad` are in sync. 
> To fix this I can either add a cost table like this:
> ```
>   static const CostTblEntry NeonBroadcastLoadTbl[] = { 
>      {TTI::SK_Broadcast, MVT::v8i8, 0 },
>      {TTI::SK_Broadcast, MVT::v16i8, 0 },
>      {TTI::SK_Broadcast, MVT::v4i16, 0 },
>      {TTI::SK_Broadcast, MVT::v8i16, 0 },
>      {TTI::SK_Broadcast, MVT::v2i32, 0 },
>      {TTI::SK_Broadcast, MVT::v4i32, 0 },
>      {TTI::SK_Broadcast, MVT::v2f64, 0},
>      {TTI::SK_Broadcast, MVT::v4f32, 0 },
>   }; 
> ```
> using a similar logic as in X86TargetTransformInfo.cpp:1558, or rely on an `if (isLegalBroadcastLoad())`. 
> I think adding the table makes it a bit more explicit, and I would prefer it. What do you think? 
> 
> Btw would you happen to know if a `v2f32` broadcast is handled by `ld1r` in neon? I think this is the only 64-bit entry missing from the table.
v2f32 should work OK - there is nothing very different between a v2f32 and a v2i32 load. There are also fp16 types and possibly bf16 types (but they might not work at the moment).

My slight preference would probably be towards re-using isLegalBroadcastLoad because they are then insync by design and we don't need to repeat the logic. Up to you though.


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2703
+    if (Kind == TTI::SK_Broadcast) {
+      bool IsLoad = !Args.empty() && llvm::all_of(Args, [](const Value *V) {
+        return isa<LoadInst>(V);
----------------
vporpo wrote:
> dmgreen wrote:
> > dmgreen wrote:
> > > vporpo wrote:
> > > > dmgreen wrote:
> > > > > I would expect it to have one Arg for a splat.
> > > > > 
> > > > > I guess this would not match the canonical representation for a splat load too, except from where it is being created from scalar code. Not sure what to do about that, but it seems unfortunate that the cost will go up after it has been vectorized.
> > > > > ```
> > > > >   %l = load i8, i8 *%p
> > > > >   %i = insertelement <16 x i8> poison, i8 %l, i32 0
> > > > >   %s = shufflevector <16 x i8> %i, <16 x i8> poison, <16 x i32> zeroinitializer
> > > > > ```
> > > > Yes, I think this won't be matched.  `Args` is meant to be used for scalar code, but that's a great point, we could improve this in a future patch.
> > > Three instructions patterns are tough to cost-model nicely is llvm at the moment. It needs too much code to check down through uses and up through operands.
> > > 
> > > The first part of my comment was meaning that Args should only have 1 element for a splat.
> > > ```
> > >  IsLoad = Args.size() == 1 && isa<LoadInstr>(Args[0]);
> > > ```
> > > Do we expect it get called when there are multiple loads too?
> > Do we need to check more than 1 arg?
> Passing the whole vector seems a bit more natural from the SLP point of view. But yes, I agree, a single argument when `Kind == TTI::SK_Broadcast` would probably make more sense from the TTI point of view. This needs changes in the x86 side too, so I will update it in a separate patch.
I was expecting the operands to be the two inputs of the shuffle (or the scalar equivalents that would become those operands). But if it doesn't work that way, then it sounds OK to check them all.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123638/new/

https://reviews.llvm.org/D123638



More information about the llvm-commits mailing list