[PATCH] D125845: [InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back
Bjorn Pettersson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 28 17:18:45 PST 2022
bjope added a comment.
Got a benchmark downstream were we no longer get SLP vectorization after the canonicalizatoin. But the test is rather large and complicated so I do not have anything that show the full pipeline including SLP (at least not yet).
But before instcombine parts of the IR kind of looks like this:
%struct.S = type { i32, i32, i16, i16, [4 x i16] }
define i32 @foo(ptr %p1, ptr %p2) {
entry:
br label %for.cond
for.cond:
%i.0 = phi i16 [ 0, %entry ], [ %inc, %for.body ]
%cmp = icmp ult i16 %i.0, 4
br i1 %cmp, label %for.body, label %for.end
for.body:
%q1 = getelementptr inbounds %struct.S, ptr %p2, i32 0, i32 4
%q2 = getelementptr inbounds [4 x i16], ptr %q1, i32 0, i16 %i.0
%t1 = load i16, ptr %q2, align 1
store i16 %t1, ptr %p1
%inc = add i16 %i.0, 1
br label %for.cond
for.end:
ret i32 7
}
If just running `opt -S -passes=instcombine` on that we used to get
define i32 @foo(ptr %p1, ptr %p2) {
entry:
br label %for.cond
for.cond: ; preds = %for.body, %entry
%i.0 = phi i16 [ 0, %entry ], [ %inc, %for.body ]
%cmp = icmp ult i16 %i.0, 4
br i1 %cmp, label %for.body, label %for.end
for.body: ; preds = %for.cond
%0 = sext i16 %i.0 to i64
%q2 = getelementptr inbounds %struct.S, ptr %p2, i64 0, i32 4, i64 %0
%t1 = load i16, ptr %q2, align 1
store i16 %t1, ptr %p1, align 2
%inc = add i16 %i.0, 1
br label %for.cond
for.end: ; preds = %for.cond
ret i32 7
}
but after this patch we get two non-inbound GEP:s instead of one single GEP:
define i32 @foo(ptr %p1, ptr %p2) {
entry:
br label %for.cond
for.cond: ; preds = %for.body, %entry
%i.0 = phi i16 [ 0, %entry ], [ %inc, %for.body ]
%cmp = icmp ult i16 %i.0, 4
br i1 %cmp, label %for.body, label %for.end
for.body: ; preds = %for.cond
%0 = sext i16 %i.0 to i64
%q11 = getelementptr [4 x i16], ptr %p2, i64 0, i64 %0
%q2 = getelementptr %struct.S, ptr %q11, i64 0, i32 4
%t1 = load i16, ptr %q2, align 1
store i16 %t1, ptr %p1, align 2
%inc = add i16 %i.0, 1
br label %for.cond
for.end: ; preds = %for.cond
ret i32 7
}
So I suspect that the swapping (and replacing inbound GEP:s with non-inbound GEP:s) has prevented merging of the GEP:s?
Maybe the merging of the GEP:s isn't important here. But I think the result looks a bit weird anyhow. In the input IR the first GEP only got loop-invariant operands (so %q1 is loop-invariant in that sense even if LoopInfo::isInvariant would say that it is modified inside the loop) , but after the "swapping" the first GEP depend on %0 (or rather %i.0), so now it isn't loop-invariant.
Isn't the idea that it should try to put the loop invariant part first, to hopefully let LICM hoist that outside the loop. Although when I read the code comments it seems like we canonicalize in opposite order from what would suite LICM, so there is some kind of heuristic to avoid canonicalize in certain situations. So I'm not quite sure if this pattern really is supposed to be "swapped" or not. Or maybe even merged. Both merging and not swapping would allow keeping the inbounds. Not swapping could allow LICM to hoist one of the GEP:s out from the loop. But the current solution both prevent hoisting and it removes the inbound attributes. So I do not see how the swapping is beneficial in this case (unless we are canonicalizing for some specific target with an addressing mode that perhaps would allow folding of the constant offset in the load).
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D125845/new/
https://reviews.llvm.org/D125845
More information about the llvm-commits
mailing list