[PATCH] D127727: [SeparateConstOffsetFromGEPPass] Added optional modification strategy

Wed Aug 10 08:02:32 PDT 2022

eklepilkina marked 8 inline comments as done.
eklepilkina added a comment.

> Can I see any benefit from this patch in X86 for example?

This pass was written for targets with limited addressing mode, so it isn't added to X86 pipeline. It's used under the flag on Aarch64 and on RISCV we also suggest to turn off it by default, but this patch helps to make these optimization be useful more often, and remove some regressions that was found if turn these pass on on all test-suite. I'll provide test-suite results on Aarch64 platform with turned this pass with and without this patch. But yes, the main measurements were made for RISCV.

And if you mean that these changes should be done later in pipeline, there is the problem with the current instruction selection that can work only with one BB, so CodeGenPrepare pass need to sunk such GEPs with const to generate adddressing by offset, so I believe this pass was created as middle-end part.

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:1110

-bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
+bool SeparateConstOffsetFromGEP::preprocessGEP(GetElementPtrInst *GEP) {
   // Skip vector GEPs.
----------------
mkazantsev wrote:
> Rename as separate NFC?
I don't really like the idea to rename in separate NFC patch, because renaming is connected with changes that were made and the old name wasn't suitable any more

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:1356
+          for (const auto &GEPInfo : DetailedInfoList) {
+            LLVM_DEBUG(dbgs() << "Try to split GEP " << *GEPInfo.GEPInstruction
+                              << "\n");
----------------
mkazantsev wrote:
> To me, this code structure looks counter-intuitive. Why do we print "Try to split GEP "... only when we check profitability, and do it silently when we don't? If possible, please restructure it like
> 
> ```
> if (CheckProfitability) {
>   // Do all required profitability checks
> }
> // Do common transform logic uniformly
> ```
> 
> I'm not sure if it's possible here because of this post-processing. If not, then the transform part should be unified somehow else.
I understand your concerns, but I don't see a good solution here, because I don't want to make the unneeded actions for original version without checking profitability.

================
Comment at: llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp:1364
+            for (unsigned J = I + 1;
+                 J < SortedInstructionsList.size() && CurrentChanged; J++) {
+              auto RemoveIt = remove_if(SortedInstructionsList[J].first,
----------------
mkazantsev wrote:
> The complexity of this is `SortedInstructionsList.size() * SortedInstructionsList.size() * sum(SortedInstructionsList[J])` if I'm reading this correctly. Looks very expensive. Is there a cheaper way of doing this? Imagine you have 10k instructions on your list. It will just be stuck forever.
> Imagine you have 10k instructions on your list

I amn't sure we should optimize this case, because it's mostly impossible, because this list is always quite small. I'll think some more, but I amn't sure that the optimization here is more important than readability.

================
Comment at: llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll:86
+; CHECK-NEXT:    [[GEP8:%.*]] = getelementptr inbounds i32, i32* [[ARRAY]], i64 [[SEXT7]]
+; CHECK-NEXT:    store i32 [[ADD6]], i32* [[GEP8]], align 4
 ; CHECK-NEXT:    ret i32 undef
----------------
mkazantsev wrote:
> Why is it a better code tha the old one?
In assembly we use one more register to save the result of new generated GEP instruction, bt we have no profit because registers that are used by adds are also needed as far as these values are used in other instructions.

================
Comment at: llvm/test/Transforms/SeparateConstOffsetFromGEP/RISCV/split-gep.ll:286
+; CHECK-NEXT:    [[GEP8:%.*]] = getelementptr inbounds [50 x i32], [50 x i32]* [[ARRAY]], i64 [[SEXT4]], i64 [[SEXT7]]
+; CHECK-NEXT:    store i32 [[I]], i32* [[GEP8]], align 4
 ; CHECK-NEXT:    ret i32 undef
----------------
mkazantsev wrote:
> This code is bigger than it used to be. Can you explain why is it better?
This code is bigger on IR, and it's so becuase of repeating sext opertaions, but `sext` isn't so critical in assembly, at the same time pass generates 2 new GEP instructions that are used as base and we need registers for them

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127727/new/

https://reviews.llvm.org/D127727