[PATCH] D106450: [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)

Thu Jul 29 12:27:48 PDT 2021

lebedev.ri added a comment.

In D106450#2914386 <https://reviews.llvm.org/D106450#2914386>, @Whitney wrote:

> This change is causing performance degradation on one of our important benchmarks. Here is an example to show the potential reason:
>
>   @block = global i32 zeroinitializer, align 4
>   
>   define void @foo(i32* %j, i32 %k) {
>   entry:
>     br label %for.body
>   
>   for.body:
>     %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
>     %invariant0 = getelementptr i32, i32* @block, i32 %k
>     %x = getelementptr i32, i32* %invariant0, i32 %i.01
>     store i32 2, i32* %x, align 4
>     %invariant1 = getelementptr i32, i32* @block, i32 %k
>     %0 = load i32, i32* %j
>     %y = getelementptr i32, i32* %invariant1, i32 %0
>     store i32 1, i32* %y, align 4
>     %inc = add nsw i32 %i.01, 1
>     %cmp = icmp slt i32 %inc, 100
>     br i1 %cmp, label %for.body, label %for.end
>   
>   for.end:
>     ret void
>   }
>
> Without this change, %invariant0 and %invariant1 are CSE. And %invariant0 is LICM out of the loop.
>
>   bash-5.0$ opt t.ll -S -early-cse -licm
>   @block = global i32 0, align 4
>   
>   define void @foo(i32* %j, i32 %k) {
>   entry:
>     %invariant0 = getelementptr i32, i32* @block, i32 %k
>     br label %for.body
>   
>   for.body:                                         ; preds = %for.body, %entry
>     %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
>     %x = getelementptr i32, i32* %invariant0, i32 %i.01
>     store i32 2, i32* %x, align 4
>     %0 = load i32, i32* %j, align 4
>     %y = getelementptr i32, i32* %invariant0, i32 %0
>     store i32 1, i32* %y, align 4
>     %inc = add nsw i32 %i.01, 1
>     %cmp = icmp slt i32 %inc, 100
>     br i1 %cmp, label %for.body, label %for.end
>   
>   for.end:                                          ; preds = %for.body
>     ret void
>   }
>
> After this change, %invariant0 and %invariant1 are inst-combined and turn into add instructions. There is no longer invariance instruction to LICM or common subexpressions to eliminate.
>
>   bash-5.0$ opt t.ll -S -instcombine -early-cse -licm
>   @block = global i32 0, align 4
>   
>   define void @foo(i32* %j, i32 %k) {
>   entry:
>     %0 = sext i32 %k to i64
>     br label %for.body
>   
>   for.body:                                         ; preds = %for.body, %entry
>     %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
>     %1 = zext i32 %i.01 to i64
>     %x.idx = add nsw i64 %1, %0
>     %x = getelementptr i32, i32* @block, i64 %x.idx
>     store i32 2, i32* %x, align 4
>     %2 = load i32, i32* %j, align 4
>     %3 = sext i32 %2 to i64
>     %y.idx = add nsw i64 %3, %0
>     %y = getelementptr i32, i32* @block, i64 %y.idx
>     store i32 1, i32* %y, align 4
>     %inc = add nuw nsw i32 %i.01, 1
>     %cmp = icmp ult i32 %i.01, 99
>     br i1 %cmp, label %for.body, label %for.end
>   
>   for.end:                                          ; preds = %for.body
>     ret void
>   }
>
> The problem is more significant when the instructions are in a deeply nested loop, or the loop has a lot of iterations.
>
> Can we either revert this change or guard it under an option and disable by default, until there is a solution?

Generally, i do wonder if we are missing something like `SeparateConstOffsetFromGEPPass` from the pipelin.
But here, can you please provide an end-to-end test?
As far as i can see we always run EarlyCSE before InstCombine.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106450/new/

https://reviews.llvm.org/D106450