[PATCH] D127727: [SeparateConstOffsetFromGEPPass] Added optional modification strategy

Sun Aug 7 21:14:11 PDT 2022

mkazantsev added a comment.

Some nits from me. If I may, some advice if you want to make progress here.

First, it seems that some pieces of this patch can be split out as seprate NFC refactorings. If so, please do. It should greatly reduce the code to look at, and smaller patches are generally easier to comprehend and review.

Second, the motivation of this patch is obscure. The structures and algorithms that you are using are not obvious. This either needs a detailed explanation in comments, or maybe incremental approach, when each step is understandable.

Third, the benefit isn't obvious either. In your tests, new IR is bigger than the old IR. This is not necessarily bad, but it requires explanation. If you have some particular scenario where the resulting assembly is better with this change, then it makes sense to provide an LLC test which shows it.

Fourth, this patch claims to improve register pressure. At the same time, it is done in a middle-end pass, which means its impact on register pressure on different platforms might be different. Was it actually tested on other platforms than RICSV? Or you think that the algorithm should be profitable regardless of the platform? Can I see any benefit from this patch in X86 for example?

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:203

+#define DEBUG_TYPE "separate-const-offset-from-gep"
+
----------------
Can this go as a separate NFC?

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:262
   ///
-  /// \p V            The given expression
-  /// \p SignExtended Whether V will be sign-extended in the computation of the
-  ///                 GEP index
-  /// \p ZeroExtended Whether V will be zero-extended in the computation of the
-  ///                 GEP index
-  /// \p NonNegative  Whether V is guaranteed to be non-negative. For example,
-  ///                 an index of an inbounds GEP is guaranteed to be
-  ///                 non-negative. Levaraging this, we can better split
-  ///                 inbounds GEPs.
-  APInt find(Value *V, bool SignExtended, bool ZeroExtended, bool NonNegative);
+  /// \p V                    The given expression
+  /// \p SignExtended         Whether V will be sign-extended in the computation
----------------
Please commit this reformatting separately.

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:271
+  ///                         split inbounds GEPs.
+  /// \p NonConstantBaseValue The second non-constant operand if V is binary
+  ///                         operator.
----------------
\p CheckProfitability  ?..

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:357
+  const Value *GEPPointer;
+  /// Indexes that precede index that can be optimized.
+  SmallVector<const Value *> PreviousIndices;
----------------
This requires more explanation. I could not figure what are indices, which of them is being optimized, and what is precedence in this context. Maybe write a detailed comment on what's going on here and what does this structure represent?

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:434
+
+  /// Canonize GEP if needed and collect information to decide if GEP
+  /// modification is useful
----------------
Canonize -> Canonicalize

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:435
+  /// Canonize GEP if needed and collect information to decide if GEP
+  /// modification is useful
+  bool preprocessGEP(GetElementPtrInst *GEP);
----------------
I guess it should be "Returns true if a change was made, false otherwise".

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:531
+  /// GEP instructions chosen for transformation
+  DenseMap<GEPBaseInfo, SmallVector<GEPInfo>> InstructionsToTransform;
 };
----------------
Maybe rename `InstructionsToTransform` -> `GEPsToTransform`?

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:1065

-void
-SeparateConstOffsetFromGEP::lowerToArithmetics(GetElementPtrInst *Variadic,
-                                               int64_t AccumulativeByteOffset) {
+void SeparateConstOffsetFromGEP::lowerToArithmetics(
+    GetElementPtrInst *Variadic, int64_t AccumulativeByteOffset) {
----------------
Pls commit separately if it is needed.

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:1110

-bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
+bool SeparateConstOffsetFromGEP::preprocessGEP(GetElementPtrInst *GEP) {
   // Skip vector GEPs.
----------------
Rename as separate NFC?

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:1356
+          for (const auto &GEPInfo : DetailedInfoList) {
+            LLVM_DEBUG(dbgs() << "Try to split GEP " << *GEPInfo.GEPInstruction
+                              << "\n");
----------------
To me, this code structure looks counter-intuitive. Why do we print "Try to split GEP "... only when we check profitability, and do it silently when we don't? If possible, please restructure it like

```
if (CheckProfitability) {
  // Do all required profitability checks
}
// Do common transform logic uniformly
```

I'm not sure if it's possible here because of this post-processing. If not, then the transform part should be unified somehow else.

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:1363
+            // other bases.
+            for (unsigned J = I + 1;
+                 J < SortedInstructionsList.size() && CurrentChanged; J++) {
----------------
More natural way would be
```
if (!CurrentChanged)
  continue;
for ...
```

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:1364
+            for (unsigned J = I + 1;
+                 J < SortedInstructionsList.size() && CurrentChanged; J++) {
+              auto RemoveIt = remove_if(SortedInstructionsList[J].first,
----------------
The complexity of this is `SortedInstructionsList.size() * SortedInstructionsList.size() * sum(SortedInstructionsList[J])` if I'm reading this correctly. Looks very expensive. Is there a cheaper way of doing this? Imagine you have 10k instructions on your list. It will just be stuck forever.

================
Comment at: llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll:86
+; CHECK-NEXT:    [[GEP8:%.*]] = getelementptr inbounds i32, i32* [[ARRAY]], i64 [[SEXT7]]
+; CHECK-NEXT:    store i32 [[ADD6]], i32* [[GEP8]], align 4
 ; CHECK-NEXT:    ret i32 undef
----------------
Why is it a better code tha the old one?

================
Comment at: llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll:286
+; CHECK-NEXT:    [[GEP8:%.*]] = getelementptr inbounds [50 x i32], [50 x i32]* [[ARRAY]], i64 [[SEXT4]], i64 [[SEXT7]]
+; CHECK-NEXT:    store i32 [[I]], i32* [[GEP8]], align 4
 ; CHECK-NEXT:    ret i32 undef
----------------
This code is bigger than it used to be. Can you explain why is it better?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127727/new/

https://reviews.llvm.org/D127727