[PATCH] D123638: [SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64

Thu Apr 21 08:24:23 PDT 2022

vporpo added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2703
+    if (Kind == TTI::SK_Broadcast) {
+      bool IsLoad = !Args.empty() && llvm::all_of(Args, [](const Value *V) {
+        return isa<LoadInst>(V);
----------------
dmgreen wrote:
> vporpo wrote:
> > dmgreen wrote:
> > > dmgreen wrote:
> > > > vporpo wrote:
> > > > > dmgreen wrote:
> > > > > > I would expect it to have one Arg for a splat.
> > > > > > 
> > > > > > I guess this would not match the canonical representation for a splat load too, except from where it is being created from scalar code. Not sure what to do about that, but it seems unfortunate that the cost will go up after it has been vectorized.
> > > > > > ```
> > > > > >   %l = load i8, i8 *%p
> > > > > >   %i = insertelement <16 x i8> poison, i8 %l, i32 0
> > > > > >   %s = shufflevector <16 x i8> %i, <16 x i8> poison, <16 x i32> zeroinitializer
> > > > > > ```
> > > > > Yes, I think this won't be matched.  `Args` is meant to be used for scalar code, but that's a great point, we could improve this in a future patch.
> > > > Three instructions patterns are tough to cost-model nicely is llvm at the moment. It needs too much code to check down through uses and up through operands.
> > > > 
> > > > The first part of my comment was meaning that Args should only have 1 element for a splat.
> > > > ```
> > > >  IsLoad = Args.size() == 1 && isa<LoadInstr>(Args[0]);
> > > > ```
> > > > Do we expect it get called when there are multiple loads too?
> > > Do we need to check more than 1 arg?
> > Passing the whole vector seems a bit more natural from the SLP point of view. But yes, I agree, a single argument when `Kind == TTI::SK_Broadcast` would probably make more sense from the TTI point of view. This needs changes in the x86 side too, so I will update it in a separate patch.
> I was expecting the operands to be the two inputs of the shuffle (or the scalar equivalents that would become those operands). But if it doesn't work that way, then it sounds OK to check them all.
Let me rephrase what I mean because I think I misunderstood your comments earlier. I think you raised several issues (please correct me if I am wrong):
(i) The number of `Args` passed to `getShuffleCost()` and whether we need a second one. Having one `Args` is enough for a splat, but yes I agree, in the general case `getShuffleCost()` should model any type of shuffle, which would require us to have two `Args` like `getShuffleCost(..., Args1, Args2)`, one for each operand. For the splat case we could leave one of them empty.
(ii) Populating `Args` with only one element in case of a splat. I think this makes sense too.
(iii) Whether `Args` should be a vector of scalar operands or a single vector operand. Given that we are currently using this only from within SLP, I think we can stick to a vector of scalar operands for now. But I guess it could be useful to support both.

Since these are TTI design issues I think they should be part of a separate patch. I will work on a patch for (i) and (ii).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123638/new/

https://reviews.llvm.org/D123638