[PATCH] D144445: [AggressiveInstCombine] folding load for constant global patterened arrays and structs

Sun Mar 19 07:54:18 PDT 2023

nikic requested changes to this revision.
nikic added a comment.
This revision now requires changes to proceed.

Extracting the stride from GEPs turned out to be trickier than I expected...

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:833
+  auto *LI = dyn_cast<LoadInst>(&I);
+  if (!LI || LI->isVolatile())
+    return false;
----------------
Should add a negative test using volatile.

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:844
+  // if Global Variable's alignment is less than load's alignment
+  // then we can't compute valid offsets.
+  Type *LoadTy = LI->getType();
----------------
This comment looks out of place, should be further down?

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:848
+
+  // Bail for large initializers in excess of 64K to avoid allocating
+  // too much memory.
----------------
64K -> 4K

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:850
+  // too much memory.
+  unsigned GVSize = DL.getTypeAllocSize(C->getType());
+  if (!GVSize || 4096 < GVSize)
----------------
`uint64_t`

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:856
+  uint64_t Stride = 1;
+  APInt ConstOffset(GVSize * 8, 0);
+  if (auto GEP = dyn_cast<GEPOperator>(PtrOp)) {
----------------
This is again creating a very, very wide integer :) The correct size to use here is `DL.getIndexTypeSizeInBits(PtrOp->getType())` as you already do in the loop. In that case you also don't need the separate `COff` variable, you can let collectOffset() directly add onto this one.

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:858
+  if (auto GEP = dyn_cast<GEPOperator>(PtrOp)) {
+    Stride = DL.getTypeAllocSize(GEP->getSourceElementType());
+    Value *PtrOpV = GEP;
----------------
This looks somewhat dubious -- I assume the intention here is that this is a "no-op" value for the first GreatestCommonDivisor call?

I think it would be clearer to make `Stride` an `std::optional` and then set it the first time, and use GCD for later iterations. (After the loop, if `Stride` is not set, it's okay to just bail out: The constant-offset case is already handled by other code, so we needn't bother with it.)

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:881
+
+    // In consideration of signed GEP indeces, non-negligible offset become
+    // remainder of divission by minimum GEP stride.
----------------
indeces -> indices

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:883
+    // remainder of divission by minimum GEP stride.
+    ConstOffset = ConstOffset.urem(APInt(ConstOffset.getBitWidth(), Stride));
+  }
----------------
Hm, what happens if ConstOffset itself is negative? Would urem produce correct results in that case? Let's say `ConstOffset = -3` and `Stride = 3`. Let's assume 8-bit address space to keep things simple, then `ConstOffset = 253` in unsigned interpretation and `253 % 3 == 1`.

Also I just realized another problem here: Just using the GCD isn't right if the GEP is not `inbounds` (i.e. multiplication can overflow) and the stride is not a power of two.

I believe the correct way to handle these cases is implemented in BasicAA, in particular https://github.com/llvm/llvm-project/blob/47aa1fe376a477939a1991ffe37504124af25f52/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L1135-L1138 to only keep a power of two factor for non-inbounds and https://github.com/llvm/llvm-project/blob/47aa1fe376a477939a1991ffe37504124af25f52/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L1168-L1170 for the final modulus.

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:884
+    ConstOffset = ConstOffset.urem(APInt(ConstOffset.getBitWidth(), Stride));
+  }
+
----------------
A correctness check missing here is that after looking through all the GEPs, you must arrive back at `GV`. Otherwise there might be some kind of other operation sitting in between which preserves the underlying object, but where we can't determine the offset. As a test case, one could use a call to `llvm.ptrmask` for example.

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:893
+  // one is sufficient to say results' equality.
+  if (LI->getAlign().value() <= GV->getAlign().valueOrOne().value())
+    Stride = std::max(Stride, LI->getAlign().value());
----------------
This probably also needs to check that ConstOffset is zero? I don't think using load alignment as stride is right with a non-zero start offset.

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:897
+  unsigned E = GVSize - DL.getTypeAllocSize(LoadTy);
+  for (uint64_t ByteOffset = ConstOffset.getZExtValue() + Stride;
+       ByteOffset <= E; ByteOffset += Stride)
----------------
It would be a bit simpler to keep using (and adding to) ConstOffset here, as we need an APInt anyway.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144445/new/

https://reviews.llvm.org/D144445