[PATCH] D123638: [SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 21 08:35:35 PDT 2022
dmgreen added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2703
+ if (Kind == TTI::SK_Broadcast) {
+ bool IsLoad = !Args.empty() && llvm::all_of(Args, [](const Value *V) {
+ return isa<LoadInst>(V);
----------------
vporpo wrote:
> dmgreen wrote:
> > vporpo wrote:
> > > dmgreen wrote:
> > > > dmgreen wrote:
> > > > > vporpo wrote:
> > > > > > dmgreen wrote:
> > > > > > > I would expect it to have one Arg for a splat.
> > > > > > >
> > > > > > > I guess this would not match the canonical representation for a splat load too, except from where it is being created from scalar code. Not sure what to do about that, but it seems unfortunate that the cost will go up after it has been vectorized.
> > > > > > > ```
> > > > > > > %l = load i8, i8 *%p
> > > > > > > %i = insertelement <16 x i8> poison, i8 %l, i32 0
> > > > > > > %s = shufflevector <16 x i8> %i, <16 x i8> poison, <16 x i32> zeroinitializer
> > > > > > > ```
> > > > > > Yes, I think this won't be matched. `Args` is meant to be used for scalar code, but that's a great point, we could improve this in a future patch.
> > > > > Three instructions patterns are tough to cost-model nicely is llvm at the moment. It needs too much code to check down through uses and up through operands.
> > > > >
> > > > > The first part of my comment was meaning that Args should only have 1 element for a splat.
> > > > > ```
> > > > > IsLoad = Args.size() == 1 && isa<LoadInstr>(Args[0]);
> > > > > ```
> > > > > Do we expect it get called when there are multiple loads too?
> > > > Do we need to check more than 1 arg?
> > > Passing the whole vector seems a bit more natural from the SLP point of view. But yes, I agree, a single argument when `Kind == TTI::SK_Broadcast` would probably make more sense from the TTI point of view. This needs changes in the x86 side too, so I will update it in a separate patch.
> > I was expecting the operands to be the two inputs of the shuffle (or the scalar equivalents that would become those operands). But if it doesn't work that way, then it sounds OK to check them all.
> Let me rephrase what I mean because I think I misunderstood your comments earlier. I think you raised several issues (please correct me if I am wrong):
> (i) The number of `Args` passed to `getShuffleCost()` and whether we need a second one. Having one `Args` is enough for a splat, but yes I agree, in the general case `getShuffleCost()` should model any type of shuffle, which would require us to have two `Args` like `getShuffleCost(..., Args1, Args2)`, one for each operand. For the splat case we could leave one of them empty.
> (ii) Populating `Args` with only one element in case of a splat. I think this makes sense too.
> (iii) Whether `Args` should be a vector of scalar operands or a single vector operand. Given that we are currently using this only from within SLP, I think we can stick to a vector of scalar operands for now. But I guess it could be useful to support both.
>
> Since these are TTI design issues I think they should be part of a separate patch. I will work on a patch for (i) and (ii).
Yeah it can be separately to this patch. I was expecting it to work like getArithmeticInstrCost which takes an array of the Args representing the operands for the instruction (which from the loop vectorizer will be the scalar instructions that will be converted to vector operands, for example). The getIntrinsicCost does the same with the Args array.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D123638/new/
https://reviews.llvm.org/D123638
More information about the llvm-commits
mailing list