[PATCH] D128302: [AArch64][CostModel] Detects that ExtractElement at index is not free in AArch64 when result is used as integer.

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 24 05:48:05 PDT 2022


RKSimon added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1993
+    //
+    // The cost of extracting a scalar element from a vector register depends
+    // on how scalar will be used:
----------------
mingmingl wrote:
> fhahn wrote:
> > I might be missing something here, but the comment here doesn't seem to match the code, which just checks if the result is used. I'd expect that in all interesting cases the result will be used. And if it is not used the cost is not really that interesting, as it is effectively a no-op that should be removed.
> It makes complete sense that instructions without users are not interesting, and [1] is sufficient for this patch (will hold on updating diff for better context). Let me elaborate why this patch and it'd be great to hear your thoughts!  //(This is a lot of words despite efforts to simplify them.. hopefully structure / bold fonts makes it easier. Thanks for reviewing this patch and reading the replies!) //
> 
> About the motivating use case
> 
> - It is shown in `LICM/AArch64/extract-element.ll` -> an extract-element instruction shouldn't be considered free (and thereby [[ https://github.com/llvm/llvm-project/blob/a4230319f7af786f4d947f39a870f339b32a6fe7/llvm/lib/Transforms/Scalar/LICM.cpp#L575-L580 | sinked to loop exit ]]) by LICM when cost kind is `TCK_SizeAndLatency` -> extract-element has size and latency cost, unless it could be "combined" into its user instructions (e.g., user instruction can access lane directly). 
> - Say the loop-exit is frequently executed in a hot path, sink `extract-element` (along with a series of its def instructions) is inefficient. 
> - Redundancy elimination might be an alternative option but I haven't got a chance to look into the alternative option yet, with the thought that fixing cost model is overall a good change as well :)*
> 
> About the motivation of this patch
> * When the instruction exists in IR (indicated by function parameter `const Instruction& I`) and extract result is integer (a stronger signal of a `fmov` being emitted), this patch use the same cost all lanes (not considering lane 0 free). 
> 
> So why this patch wants to tell a real vector instruction from a virtual one (by asking for `const Instruction&I`, not just `Opcode`)?
> * Vectorizer passes have `Opcode` but not real instructions in its provisioning stage [2]. Not telling real from virtual may make vectorizer result better or worse in a fragile way. 
> * Plus, a new interface of `getVectorInstCost` won't mess up the codebase (w.r.t API/code style) when improvements for vectorizer cost models are added in the future.
> 
> But why vectorizer gives better or worse result in a fragile way?
> -  Take `BaseTTIImpl::getScalarizationOverhead` as an example, it calls `getVectorInstrCost` [[ https://github.com/llvm/llvm-project/blob/515ece1a9005b29747e5e19bccb17f2ae3d78fb9/llvm/include/llvm/CodeGen/BasicTTIImpl.h#L722-L727 | element-by-element ]] for virtual instructions. **//With various SIMD instructions, element-by-element estimation is not accurate for AArch64//**.
> - AArch64TTI doesn't override this method so `BaseTTIImpl` is used. Just as a reference, x86 backend overrides this method for some specialized cases.
> - If this patch is //NOT// no-op for method `getScalarizationOverhead`, it could exacerbates the inaccurate cost model (SLP, etc) and misleads vectorizers.
> - **In practice, when non-zero cost for lane 0 is applied for "provisioned" vector instructions, some IR from `llvm/test/Transforms/<vectorizer-subdir>/AArch64/` gets better generated code and some gets worse generated code**. I didn't inspect them all, but presumably there are going to be better or worse cases, since without this change vectorizers sometimes gives worse output (e.g., https://godbolt.org/z/c1bjhGdT5 is for llvm/test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll)
> 
> A related question, I wonder if it's with a reason (or just unintended status quo) that `getVectorInstrCost` interface doesn't ask for a `CostKind` parameter? 
> 
> [1]
> 
> ```
> +InstructionCost AArch64TTIImpl::getVectorInstrCost(const Instruction &I,
> +                                                   Type *Val, unsigned Index) {
> +  InstructionCost cost = getVectorInstrCost(I.getOpcode(), Val, Index);
> +
> +  // According to NEON programmer guide, other than multiply instructions,
> +  // instructions that access scalars can access any element in the vector register.
> +  //
> +  // The cost of an extract depends on whether the user instruction.
> +  // 1. If users could use scalars in vector registers directly (e.g., store), the
> +  // extract-element operation is essentially free.
> +  // 2. If the user instruction requires core register as operand (i.e.,
> +  // cannot use scalars in vector register), an explicit move operation will
> +  // be codegen'd.
> +  //
> +  // 'cost' might be an optimistic 0 when lane is 0.
> +  // Returns the base cost (as the same with other lanes) if the scalar is an integer.
> +  //
> +  // FIXME:
> +  // The cost of extract-element from vector is better estimated with context (e.g. users).
> +  if (!isa_and_nonnull<ExtractElementInst>(&I) || !Val->getScalarType()->isIntegerTy()) 
> +    return ST->getVectorInsertExtractBaseCost();
> +  return cost;
> +}
> +
> ```
> [2]  Basically, vectorizer pass sees one form (scalar or vector) of "physical" instructions (that do exists in the IR), and tries to "provision" the cost of the other form; this means instructions of the other form doesn't exist yet, and only Opcode is known and used to provision the cost.
> A related question, I wonder if it's with a reason (or just unintended status quo) that getVectorInstrCost interface doesn't ask for a CostKind parameter?

This is on my TODO list that was waiting for the getInstructionCost refactor (D79483) - but I want to get some other CostKind changes (notably D132216) out of the way first.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128302/new/

https://reviews.llvm.org/D128302



More information about the llvm-commits mailing list