[llvm] [AMDGPU] Enable i8 GEP promotion for vector allocas (PR #166132)

Wed Dec 10 05:14:40 PST 2025

================
@@ -457,10 +457,25 @@ static Value *GEPToVectorIndex(GetElementPtrInst *GEP, AllocaInst *Alloca,
   const auto &VarOffset = VarOffsets.front();
   APInt OffsetQuot;
   APInt::sdivrem(VarOffset.second, VecElemSize, OffsetQuot, Rem);
-  if (Rem != 0 || OffsetQuot.isZero())
-    return nullptr;
-
   Value *Offset = VarOffset.first;
+  if (Rem != 0) {
+    unsigned ElemSizeShift = Log2_64(VecElemSize);
+    SimplifyQuery SQ(DL);
+    SQ.CxtI = GEP;
+    KnownBits KB = computeKnownBits(VarOffset.first, SQ);
+    // Bail out if the index may point into the middle of an element.
+    if (KB.countMinTrailingZeros() < ElemSizeShift)
+      return nullptr;
+
+    Value *Scaled = Builder.CreateLShr(VarOffset.first, ElemSizeShift);
+    if (Instruction *NewInst = dyn_cast<Instruction>(Scaled))
+      NewInsts.push_back(NewInst);
+
+    Offset = Scaled;
+    OffsetQuot = APInt(BW, 1);
----------------
ruiling wrote:

Sorry for looking into this so late. This is wrong. The case we want to optimize is when `(VarOffset.first * VarOffset.second) % VecElemSize == 0` but `VarOffset.second % VecElemSize != 0`. To calculate the vector index, you need `(VarOffset.first * VarOffset.second) / VecElemSize`. Here you reset `OffsetQuot` to one, so you are actually dropping `VarOffset.second`. I can also see that this change will conflict with #170512, since most of the code below was moved away and the `NewInsts` argument was removed. I would like us to do some further refactoring on top of #170512: rename the function `GEPToVectorIndex` to `isPtrOffsetAlignedToElementSize()`, have it just return the three components if the offset is properly aligned, and do the vector index calculation later.
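
To make the arithmetic concrete, here is a minimal sketch using plain integers in place of `APInt` and IR values. The helper names `patchedIndex` and `expectedIndex` are hypothetical and exist only for illustration; `var`, `scale`, and `elemSize` stand in for `VarOffset.first`, `VarOffset.second`, and `VecElemSize`, and `elemSize` is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the index the patch produces on the Rem != 0 path.
// It shifts the variable index right by log2(elemSize) and then multiplies
// by OffsetQuot == 1, so the scale never enters the result.
uint64_t patchedIndex(uint64_t var, uint64_t /*scale*/, uint64_t elemSize) {
  unsigned shift = 0;
  while ((elemSize >> (shift + 1)) != 0) // floor(log2(elemSize))
    ++shift;
  return var >> shift;
}

// Hypothetical helper: the index the byte offset actually corresponds to,
// valid only when the whole byte offset is a multiple of the element size.
uint64_t expectedIndex(uint64_t var, uint64_t scale, uint64_t elemSize) {
  assert((var * scale) % elemSize == 0 && "offset must be element-aligned");
  return (var * scale) / elemSize;
}
```

With a scale of 2 and 4-byte elements, `var = 4` gives a byte offset of 8 and therefore element 2, but the shifted form yields `4 >> 2 = 1`. The two only agree when the scale is 1, which is the plain i8 GEP case.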

https://github.com/llvm/llvm-project/pull/166132