[PATCH] D144445: [AggressiveInstCombine] folding load for constant global patterened arrays and structs
Nikita Popov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Mar 19 07:54:18 PDT 2023
nikic requested changes to this revision.
nikic added a comment.
This revision now requires changes to proceed.
Extracting the stride from GEPs turned out to be trickier than I expected...
================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:833
+ auto *LI = dyn_cast<LoadInst>(&I);
+ if (!LI || LI->isVolatile())
+ return false;
----------------
Should add a negative test using volatile.
================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:844
+ // if Global Variable's alignment is less than load's alignment
+ // then we can't compute valid offsets.
+ Type *LoadTy = LI->getType();
----------------
This comment looks out of place, should be further down?
================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:848
+
+ // Bail for large initializers in excess of 64K to avoid allocating
+ // too much memory.
----------------
64K -> 4K
================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:850
+ // too much memory.
+ unsigned GVSize = DL.getTypeAllocSize(C->getType());
+ if (!GVSize || 4096 < GVSize)
----------------
`uint64_t`
================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:856
+ uint64_t Stride = 1;
+ APInt ConstOffset(GVSize * 8, 0);
+ if (auto GEP = dyn_cast<GEPOperator>(PtrOp)) {
----------------
This is again creating a very, very wide integer :) The correct size to use here is `DL.getIndexTypeSizeInBits(PtrOp->getType())` as you already do in the loop. In that case you also don't need the separate `COff` variable, you can let collectOffset() directly add onto this one.
================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:858
+ if (auto GEP = dyn_cast<GEPOperator>(PtrOp)) {
+ Stride = DL.getTypeAllocSize(GEP->getSourceElementType());
+ Value *PtrOpV = GEP;
----------------
This looks somewhat dubious -- I assume the intention here is that this is a "no-op" value for the first GreatestCommonDivisor call?
I think it would be clearer to make `Stride` an `std::optional` and then set it the first time, and use GCD for later iterations. (After the loop, if `Stride` is not set, it's okay to just bail out: The constant-offset case is already handled by other code, so we needn't bother with it.)
================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:881
+
+ // In consideration of signed GEP indeces, non-negligible offset become
+ // remainder of divission by minimum GEP stride.
----------------
indeces -> indices
================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:883
+ // remainder of divission by minimum GEP stride.
+ ConstOffset = ConstOffset.urem(APInt(ConstOffset.getBitWidth(), Stride));
+ }
----------------
Hm, what happens if ConstOffset itself is negative? Would urem produce correct results in that case? Let's say `ConstOffset = -3` and `Stride = 3`. Let's assume 8-bit address space to keep things simple, then `ConstOffset = 253` in unsigned interpretation and `253 % 3 == 1`.
Also I just realized another problem here: Just using the GCD isn't right if the GEP is not `inbounds` (i.e. multiplication can overflow) and the stride is not a power of two.
I believe the correct way to handle these cases is implemented in BasicAA, in particular https://github.com/llvm/llvm-project/blob/47aa1fe376a477939a1991ffe37504124af25f52/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L1135-L1138 to only keep a power of two factor for non-inbounds and https://github.com/llvm/llvm-project/blob/47aa1fe376a477939a1991ffe37504124af25f52/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L1168-L1170 for the final modulus.
================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:884
+ ConstOffset = ConstOffset.urem(APInt(ConstOffset.getBitWidth(), Stride));
+ }
+
----------------
A correctness check missing here is that after looking through all the GEPs, you must arrive back at `GV`. Otherwise there might be some kind of other operation sitting in between which preserves the underlying object, but where we can't determine the offset. As a test case, one could use a call to `llvm.ptrmask` for example.
================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:893
+ // one is sufficient to say results' equality.
+ if (LI->getAlign().value() <= GV->getAlign().valueOrOne().value())
+ Stride = std::max(Stride, LI->getAlign().value());
----------------
This probably also needs to check that ConstOffset is zero? I don't think using load alignment as stride is right with a non-zero start offset.
================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:897
+ unsigned E = GVSize - DL.getTypeAllocSize(LoadTy);
+ for (uint64_t ByteOffset = ConstOffset.getZExtValue() + Stride;
+ ByteOffset <= E; ByteOffset += Stride)
----------------
It would be a bit simpler to keep using (and adding to) ConstOffset here, as we need an APInt anyway.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D144445/new/
https://reviews.llvm.org/D144445
More information about the llvm-commits
mailing list