[PATCH] D52653: [CodeGen, AArch64] Combine Interleaved Loads which are not covered by the Vectorizer

John Brawn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 12 07:04:32 PDT 2018


john.brawn added inline comments.


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:112
+  /// Dominator Tree Analysis
+  DominatorTree *DT = nullptr;
+
----------------
I don't think any of the dominator checks in this file are needed, so this isn't needed.


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:225-226
+
+  /// Value
+  Value *V;
+
----------------
I don't think this is needed. hasB() can check !B.empty(), and in isCompatibleTo() we can assume that the user of this class will be using polynomials of the same variable.


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:679
+  /// Information of a Vector Element
+  struct ElementInfo {
+    /// Offset Polynomial.
----------------
LI is used only for the InsertionPoint in combine, and you can find that by picking the least element of VectorInfo::LIs e.g.

```
  LoadInst *getFirstLoad() const { return *std::min_element(LIs.begin(), LIs.end()); }
```
so LI isn't needed here, so we don't need ElementInfo at all and can just have an array of Polynomial.


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:787
+    Result.Is.insert(Old.Is.begin(), Old.Is.end());
+    Result.Is.insert(BCI);
+    Result.SVI = nullptr;
----------------
Duplicates line 782.


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:1137
+LoadInst *
+InterleavedLoadCombine::findDominator(const std::set<LoadInst *> &LIs) {
+  for (auto &Cand : LIs) {
----------------
All loads are required to be in the same block, so the dominator will just be the first load.


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:1229
+  for (auto &VI : InterleavedLoad) {
+    if (!DT->dominates(InsertionPoint, VI->SVI))
+      return false;
----------------
This isn't needed, as if this weren't the case the IR wouldn't be valid in the first place (I'm pretty sure).


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:1292-1294
+        if (isa<ShuffleVectorInst>(&I)) {
+
+          auto SVI = cast<ShuffleVectorInst>(&I);
----------------
You can do this all at once with

```
if (auto SVI = dyn_cast<ShuffleVectorInst>(&I))
```


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:1295-1296
+          auto SVI = cast<ShuffleVectorInst>(&I);
+          if (DL.getTypeAllocSize(SVI->getType()) % VectorWidth)
+            continue;
+
----------------
Having a predetermined width multiple and using that as the basis for when to try this doesn't seem right. It would make more sense to use TTI->getInterleavedMemoryOpCost and check if we expect the cost to be lower than not doing the transform. I tried

```
          VectorType *Ty = SVI->getType();
          SmallVector<uint32_t, 4> Mask;
          for (unsigned i = 0; i < Factor; i++)
             Mask.push_back(i);

          VectorType *InterleaveTy = VectorType::get(Ty->getScalarType(), Ty->getNumElements() * Factor);
          int InterleavedCost = TTI->getInterleavedMemoryOpCost(Instruction::Load, InterleaveTy,
                                                                Factor, Mask, 16, 0); // TODO: correct alignment, address space
          int MemoryOpCost = TTI->getMemoryOpCost(Instruction::Load, Ty, 16, 0);
          int ShuffleCost = TTI->getInstructionCost(SVI, TargetTransformInfo::TCK_Latency);
          if (InterleavedCost > Factor * (MemoryOpCost + ShuffleCost)) {
            continue;
          }
```
and it seemed like it worked, though maybe here is the wrong place to do this check and maybe it should be in InterleavedLoadCombine::combine.


================
Comment at: lib/CodeGen/InterleavedLoadCombinePass.cpp:1303-1307
+          if (!VI->SVI)
+            continue;
+
+          if (!VI->PV)
+            continue;
----------------
SVI and PV should never be null if compute returned true, so these should be asserts that SVI and PV are non-null.


================
Comment at: tools/opt/opt.cpp:466
   initializeIndirectBrExpandPassPass(Registry);
+  initializeInterleavedLoadCombinePass(Registry);
   initializeInterleavedAccessPass(Registry);
----------------
Do we also need a similar call in initializeCodeGen in lib/CodeGen/CodeGen.cpp?


Repository:
  rL LLVM

https://reviews.llvm.org/D52653





More information about the llvm-commits mailing list