[PATCH] D106450: [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 29 12:27:48 PDT 2021
lebedev.ri added a comment.
In D106450#2914386 <https://reviews.llvm.org/D106450#2914386>, @Whitney wrote:
> This change is causing performance degradation on one of our important benchmarks. Here is an example to show the potential reason:
>
> @block = global i32 zeroinitializer, align 4
>
> define void @foo(i32* %j, i32 %k) {
> entry:
> br label %for.body
>
> for.body:
> %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
> %invariant0 = getelementptr i32, i32* @block, i32 %k
> %x = getelementptr i32, i32* %invariant0, i32 %i.01
> store i32 2, i32* %x, align 4
> %invariant1 = getelementptr i32, i32* @block, i32 %k
> %0 = load i32, i32* %j
> %y = getelementptr i32, i32* %invariant1, i32 %0
> store i32 1, i32* %y, align 4
> %inc = add nsw i32 %i.01, 1
> %cmp = icmp slt i32 %inc, 100
> br i1 %cmp, label %for.body, label %for.end
>
> for.end:
> ret void
> }
>
> Without this change, %invariant0 and %invariant1 are CSE. And %invariant0 is LICM out of the loop.
>
> bash-5.0$ opt t.ll -S -early-cse -licm
> @block = global i32 0, align 4
>
> define void @foo(i32* %j, i32 %k) {
> entry:
> %invariant0 = getelementptr i32, i32* @block, i32 %k
> br label %for.body
>
> for.body: ; preds = %for.body, %entry
> %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
> %x = getelementptr i32, i32* %invariant0, i32 %i.01
> store i32 2, i32* %x, align 4
> %0 = load i32, i32* %j, align 4
> %y = getelementptr i32, i32* %invariant0, i32 %0
> store i32 1, i32* %y, align 4
> %inc = add nsw i32 %i.01, 1
> %cmp = icmp slt i32 %inc, 100
> br i1 %cmp, label %for.body, label %for.end
>
> for.end: ; preds = %for.body
> ret void
> }
>
> After this change, %invariant0 and %invariant1 are inst-combined and turn into add instructions. There is no longer invariance instruction to LICM or common subexpressions to eliminate.
>
> bash-5.0$ opt t.ll -S -instcombine -early-cse -licm
> @block = global i32 0, align 4
>
> define void @foo(i32* %j, i32 %k) {
> entry:
> %0 = sext i32 %k to i64
> br label %for.body
>
> for.body: ; preds = %for.body, %entry
> %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
> %1 = zext i32 %i.01 to i64
> %x.idx = add nsw i64 %1, %0
> %x = getelementptr i32, i32* @block, i64 %x.idx
> store i32 2, i32* %x, align 4
> %2 = load i32, i32* %j, align 4
> %3 = sext i32 %2 to i64
> %y.idx = add nsw i64 %3, %0
> %y = getelementptr i32, i32* @block, i64 %y.idx
> store i32 1, i32* %y, align 4
> %inc = add nuw nsw i32 %i.01, 1
> %cmp = icmp ult i32 %i.01, 99
> br i1 %cmp, label %for.body, label %for.end
>
> for.end: ; preds = %for.body
> ret void
> }
>
> The problem is more significant when the instructions are in a deeply nested loop, or the loop has a lot of iterations.
>
> Can we either revert this change or guard it under an option and disable by default, until there is a solution?
Generally, i do wonder if we are missing something like `SeparateConstOffsetFromGEPPass` from the pipelin.
But here, can you please provide an end-to-end test?
As far as i can see we always run EarlyCSE before InstCombine.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D106450/new/
https://reviews.llvm.org/D106450
More information about the llvm-commits
mailing list