[PATCH] D99719: [SLP] Better estimate cost of no-op extracts on target vectors.

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 1 08:13:38 PDT 2021


ABataev added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3495-3498
+        unsigned MaxVecRegSize = getMaxVecRegSize();
+        unsigned EltSize = getVectorElementSize(VL[0]);
+        unsigned EltsPerVector = MaxVecRegSize / EltSize;
+        unsigned Idx = 0;
----------------
fhahn wrote:
> ABataev wrote:
> > fhahn wrote:
> > > ABataev wrote:
> > > > I think it is better to use `TLI->getTypeLegalizationCost(DL, cast<ExtractElementInst>(V)->getVectorOperandType());` to get the real machine vector type and the number of splits.
> > > I think using `TargetLoweringInfo` would indeed be better, but unfortunately I don't think we can access it here, as it is defined in CodeGen? I looked for other such uses in `llvm/lib/Transforms` but couldn't find any. Perhaps there's a way to use it that I am missing?
> > You can use `TTI->getNumberOfParts()` to get the number of registers and then calculate EltsPerVector.
> > Also, what if there are extracts from 2 different vectors with different numbers of elements?
> That's convenient, thanks! I gave it a try but ran into a problem. For example, on AArch64, `<2 x i32>` fits and can be used as the lower half of a vector register, so `EltsPerVector` would be 2 (and rightly so). The unfortunate effect is that in some cases we would vectorize some operations early with `<2 x i32>`, rather than vectorizing a larger expression with `<4 x i32>`. Basing the computation on the maximum vector register size makes sure we only do so when using the largest VF.
> 
> Arguably `getNumberOfParts` is the right thing to use here, but I really want to avoid introducing any regressions, and I don't think there's a way at the moment to skip vectorizing eagerly when that would prevent optimizing with a wider VF later on. WDYT?
> 
> > Also, what if there are extracts from 2 different vectors with the different numbers of elements?
> 
> At the moment all extracts in a block need to have the same vector register, so the types should also be the same. The `extracts_first_2_lanes_different_vectors` test should check for that case.
1. Could you give an example, please?
2. Then maybe guard these extra checks with something like:
```
if (*ShuffleKind == TargetTransformInfo::SK_PermuteSingleSrc) {
 ...
}
```
?
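
To make the two ways of computing `EltsPerVector` discussed above concrete, here is a minimal standalone sketch. It does not use the LLVM API: the helper names `eltsPerVectorFromRegSize` and `eltsPerVectorFromParts` are hypothetical, the `NumParts` argument only stands in for whatever `TTI->getNumberOfParts()` would return for the vector type, and the part counts in the checks are assumptions for a target with 128-bit vector registers such as AArch64.

```cpp
// Hypothetical helper (not LLVM API): elements per vector derived from the
// widest vector register, i.e. MaxVecRegSize / EltSize as in the patch.
// Both sizes are in bits.
constexpr unsigned eltsPerVectorFromRegSize(unsigned MaxVecRegSizeBits,
                                            unsigned EltSizeBits) {
  return EltSizeBits == 0 ? 0 : MaxVecRegSizeBits / EltSizeBits;
}

// Hypothetical helper (not LLVM API): elements per vector derived from the
// number of registers ("parts") the source vector type is split into.
// NumParts stands in for the value TTI->getNumberOfParts() would report.
constexpr unsigned eltsPerVectorFromParts(unsigned NumElts,
                                          unsigned NumParts) {
  return NumParts == 0 ? 0 : NumElts / NumParts;
}

// <2 x i32> with 128-bit registers: the register-size approach always
// yields 4 lanes per vector, regardless of the source vector type.
static_assert(eltsPerVectorFromRegSize(128, 32) == 4, "128 / 32");

// Assuming <2 x i32> is reported as a single part, the parts-based
// approach yields 2 lanes per vector for the same extracts.
static_assert(eltsPerVectorFromParts(2, 1) == 2, "2 elts / 1 part");

// Assuming <8 x i32> is split into two 128-bit parts, both approaches
// agree on 4 lanes per vector.
static_assert(eltsPerVectorFromParts(8, 2) == 4, "8 elts / 2 parts");
```

The difference in the `<2 x i32>` case is what the quoted comment is about: with the parts-based value, extracts of the first two lanes already look like a full-register no-op.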


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99719/new/

https://reviews.llvm.org/D99719


