[PATCH] D125845: [InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back

Mon Nov 28 17:18:45 PST 2022

bjope added a comment.

Got a benchmark downstream were we no longer get SLP vectorization after the canonicalizatoin. But the test is rather large and complicated so I do not have anything that show the full pipeline including SLP (at least not yet).

But before instcombine parts of the IR kind of looks like this:

  %struct.S = type { i32, i32, i16, i16, [4 x i16] }

  define i32 @foo(ptr %p1, ptr %p2) {
  entry:
    br label %for.cond

  for.cond:
    %i.0 = phi i16 [ 0, %entry ], [ %inc, %for.body ]
    %cmp = icmp ult i16 %i.0, 4
    br i1 %cmp, label %for.body, label %for.end

  for.body:
    %q1 = getelementptr inbounds %struct.S, ptr %p2, i32 0, i32 4
    %q2 = getelementptr inbounds [4 x i16], ptr %q1, i32 0, i16 %i.0
    %t1 = load i16, ptr %q2, align 1
    store i16 %t1, ptr %p1
    %inc = add i16 %i.0, 1
    br label %for.cond

  for.end:
    ret i32 7
  }

If just running `opt -S -passes=instcombine` on that we used to get

  define i32 @foo(ptr %p1, ptr %p2) {
  entry:
    br label %for.cond

  for.cond:                                         ; preds = %for.body, %entry
    %i.0 = phi i16 [ 0, %entry ], [ %inc, %for.body ]
    %cmp = icmp ult i16 %i.0, 4
    br i1 %cmp, label %for.body, label %for.end

  for.body:                                         ; preds = %for.cond
    %0 = sext i16 %i.0 to i64
    %q2 = getelementptr inbounds %struct.S, ptr %p2, i64 0, i32 4, i64 %0
    %t1 = load i16, ptr %q2, align 1
    store i16 %t1, ptr %p1, align 2
    %inc = add i16 %i.0, 1
    br label %for.cond

  for.end:                                          ; preds = %for.cond
    ret i32 7
  }

but after this patch we get two non-inbound GEP:s instead of one single GEP:

  define i32 @foo(ptr %p1, ptr %p2) {
  entry:
    br label %for.cond

  for.cond:                                         ; preds = %for.body, %entry
    %i.0 = phi i16 [ 0, %entry ], [ %inc, %for.body ]
    %cmp = icmp ult i16 %i.0, 4
    br i1 %cmp, label %for.body, label %for.end

  for.body:                                         ; preds = %for.cond
    %0 = sext i16 %i.0 to i64
    %q11 = getelementptr [4 x i16], ptr %p2, i64 0, i64 %0
    %q2 = getelementptr %struct.S, ptr %q11, i64 0, i32 4
    %t1 = load i16, ptr %q2, align 1
    store i16 %t1, ptr %p1, align 2
    %inc = add i16 %i.0, 1
    br label %for.cond

  for.end:                                          ; preds = %for.cond
    ret i32 7
  }

So I suspect that the swapping (and replacing inbound GEP:s with non-inbound GEP:s) has prevented merging of the GEP:s?

Maybe the merging of the GEP:s isn't important here. But I think the result looks a bit weird anyhow. In the input IR the first GEP only got loop-invariant operands (so %q1 is loop-invariant in that sense even if LoopInfo::isInvariant would say that it is modified inside the loop) , but after the "swapping" the first GEP depend on %0 (or rather %i.0), so now it isn't loop-invariant.
Isn't the idea that it should try to put the loop invariant part first, to hopefully let LICM hoist that outside the loop.  Although when I read the code comments it seems like we canonicalize in opposite order from what would suite LICM, so there is some kind of heuristic to avoid canonicalize in certain situations. So I'm not quite sure if this pattern really is supposed to be "swapped" or not. Or maybe even merged. Both merging and not swapping would allow keeping the inbounds. Not swapping could allow LICM to hoist one of the GEP:s out from the loop. But the current solution both prevent hoisting and it removes the inbound attributes. So I do not see how the swapping is beneficial in this case (unless we are canonicalizing for some specific target with an addressing mode that perhaps would allow folding of the constant offset in the load).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125845/new/

https://reviews.llvm.org/D125845