[PATCH] D144445: [AggressiveInstCombine] folding load for constant global patterened arrays and structs

Mon Mar 20 00:09:08 PDT 2023

khei4 planned changes to this revision.
khei4 added a comment.

Thank you for a lot of good catches!

In D144445#4204789 <https://reviews.llvm.org/D144445#4204789>, @nikic wrote:

> In D144445#4204759 <https://reviews.llvm.org/D144445#4204759>, @nikic wrote:
>
>> Extracting the stride from GEPs turned out to be trickier than I expected...
>
> If you like, we can go back to just the alignment handling for this patch, and then add the GEP handling in a separate one on top. Either way works for me.

Sounds good to me! 
Then I want to separate this into following

1. alignment-based (This revision will be)
2. (all) inbounds GEP index-based
3. non-inbounds included GEP index-based

TBH, I feel non-inbounds GEP handling is a little bit handful, but I'll read AA's implementation and try it first!

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:856
+  uint64_t Stride = 1;
+  APInt ConstOffset(GVSize * 8, 0);
+  if (auto GEP = dyn_cast<GEPOperator>(PtrOp)) {
----------------
nikic wrote:
> This is again creating a very, very wide integer :) The correct size to use here is `DL.getIndexTypeSizeInBits(PtrOp->getType())` as you already do in the loop. In that case you also don't need the separate `COff` variable, you can let collectOffset() directly add onto this one.
Oh... I was confused with offset and the GV byte sizes! You're right!

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:858
+  if (auto GEP = dyn_cast<GEPOperator>(PtrOp)) {
+    Stride = DL.getTypeAllocSize(GEP->getSourceElementType());
+    Value *PtrOpV = GEP;
----------------
nikic wrote:
> This looks somewhat dubious -- I assume the intention here is that this is a "no-op" value for the first GreatestCommonDivisor call?
> 
> I think it would be clearer to make `Stride` an `std::optional` and then set it the first time, and use GCD for later iterations. (After the loop, if `Stride` is not set, it's okay to just bail out: The constant-offset case is already handled by other code, so we needn't bother with it.)
> This looks somewhat dubious -- I assume the intention here is that this is a "no-op" value for the first GreatestCommonDivisor call?

Yes! 

> I think it would be clearer to make `Stride` a `std::optional` and then set it the first time, 

Sounds reasonable, I'll use that!

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:883
+    // remainder of divission by minimum GEP stride.
+    ConstOffset = ConstOffset.urem(APInt(ConstOffset.getBitWidth(), Stride));
+  }
----------------
nikic wrote:
> Hm, what happens if ConstOffset itself is negative? Would urem produce correct results in that case? Let's say `ConstOffset = -3` and `Stride = 3`. Let's assume 8-bit address space to keep things simple, then `ConstOffset = 253` in unsigned interpretation and `253 % 3 == 1`.
> 
> Also I just realized another problem here: Just using the GCD isn't right if the GEP is not `inbounds` (i.e. multiplication can overflow) and the stride is not a power of two.
> 
> I believe the correct way to handle these cases is implemented in BasicAA, in particular https://github.com/llvm/llvm-project/blob/47aa1fe376a477939a1991ffe37504124af25f52/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L1135-L1138 to only keep a power of two factor for non-inbounds and https://github.com/llvm/llvm-project/blob/47aa1fe376a477939a1991ffe37504124af25f52/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L1168-L1170 for the final modulus.
Ah, I mistakenly assumed `inbounds`! Without `inbounds`, ConstantOffset could be negative and could overflow! 

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:884
+    ConstOffset = ConstOffset.urem(APInt(ConstOffset.getBitWidth(), Stride));
+  }
+
----------------
nikic wrote:
> A correctness check missing here is that after looking through all the GEPs, you must arrive back at `GV`. Otherwise there might be some kind of other operation sitting in between which preserves the underlying object, but where we can't determine the offset. As a test case, one could use a call to `llvm.ptrmask` for example.
Good catch! I haven't known underlying-object preserved operations! I'll add tests!

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:893
+  // one is sufficient to say results' equality.
+  if (LI->getAlign().value() <= GV->getAlign().valueOrOne().value())
+    Stride = std::max(Stride, LI->getAlign().value());
----------------
nikic wrote:
> This probably also needs to check that ConstOffset is zero? I don't think using load alignment as stride is right with a non-zero start offset.
Good catch! Yeah, we should have considered GEP-stride-based and alignment-based separately also for ConstOffset.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144445/new/

https://reviews.llvm.org/D144445