[PATCH] D41361: [SimplifyCFG] Avoid quadratic on a predecessors number behavior in instruction sinking.

Wed Dec 20 14:35:37 PST 2017

mzolotukhin added a comment.

> Yes, I'm not really worried about that aspect; I'm more concerned that the transform somehow isn't triggering in cases where it should.
> 
> I just tried digging a bit; it looks like the generated code is worse, particularly for avoid-cpsr-rmw.ll.  I'm not sure that's really the fault of this patch, exactly... it looks like the sinking does in fact happen, but the end result is different because of related transforms.  Looks like it's exposing a bad heuristic elsewhere?

Yes, it does behave differently after the patch. However, from code sinking point of view the new behavior seems better to me.

Here is how the behavior changed for `arm64-jumptable.ll`. Previously SimplifyCFG produced

  define void @sum(i32 %a, i32* %to, i32 %c) {
  entry:
    switch i32 %a, label %exit [
      i32 1, label %bb1
      i32 2, label %exit.sink.split
      i32 3, label %bb3
      i32 4, label %bb4
    ]
  bb1:
    %b = add i32 %c, 1
    br label %exit.sink.split
  bb3:
    br label %exit.sink.split
  bb4:
    br label %exit.sink.split
  exit.sink.split:
    %.sink = phi i32 [ 5, %bb4 ], [ %b, %bb1 ], [ 3, %bb3 ], [ %a, %entry ]
    store i32 %.sink, i32* %to
    br label %exit
  exit:
    ret void
  }

Now we're producing:

  define void @sum(i32 %a, i32* %to, i32 %c) {
  entry:
    switch i32 %a, label %exit [
      i32 1, label %bb1
      i32 2, label %exit.sink.split
      i32 3, label %exit.sink.split
      i32 4, label %bb4
    ]
  bb1:                                              ; preds = %entry
    %b = add i32 %c, 1
    br label %exit.sink.split
  bb4:                                              ; preds = %entry
    br label %exit.sink.split
  exit.sink.split:                                  ; preds = %entry, %entry, %bb1, %bb4
    %.sink = phi i32 [ 5, %bb4 ], [ %b, %bb1 ], [ %a, %entry ], [ %a, %entry ]
    store i32 %.sink, i32* %to
    br label %exit
  exit:                                             ; preds = %exit.sink.split, %entry
    ret void
  }

So, we merged edges `entry`->`bb3`->`exit.sink.split`. Except for helping backend, I don't see why we would want to keep these edges separate. Helping backend might be a sufficient reason, but I think it's generally better not to rely on this kind of stuff in the backend.

In `avoid-cpsr-rmw.ll` we previously generated:

  define void @t4(i32* nocapture %p, double* nocapture %q) #3 {
  entry:
    %0 = load double, double* %q, align 4
    %cmp = fcmp olt double %0, 1.000000e+01
    %incdec.ptr1 = getelementptr inbounds i32, i32* %p, i32 1
    br i1 %cmp, label %if.then, label %if.end
  if.then:                                          ; preds = %entry
    br label %if.end
  if.end:                                           ; preds = %entry, %if.then
    %.sink3 = phi i32 [ 7, %if.then ], [ 3, %entry ]
    %.sink2 = phi i32 [ 2, %if.then ], [ 3, %entry ]
    %.sink1 = phi i32 [ 8, %if.then ], [ 5, %entry ]
    %.sink = phi i32 [ 9, %if.then ], [ 6, %entry ]
    store i32 %.sink3, i32* %p, align 4
    %incdec.ptr5 = getelementptr inbounds i32, i32* %p, i32 %.sink2
    store i32 %.sink1, i32* %incdec.ptr1, align 4
    store i32 %.sink, i32* %incdec.ptr5, align 4
    ret void
  }

And now we generate:

  define void @t4(i32* nocapture %p, double* nocapture %q) #3 {
  entry:
    %0 = load double, double* %q, align 4
    %cmp = fcmp olt double %0, 1.000000e+01
    %incdec.ptr1 = getelementptr inbounds i32, i32* %p, i32 1
    %. = select i1 %cmp, i32 7, i32 3
    %.4 = select i1 %cmp, i32 2, i32 3
    %.5 = select i1 %cmp, i32 8, i32 5
    %.6 = select i1 %cmp, i32 9, i32 6
    store i32 %., i32* %p, align 4
    %incdec.ptr5 = getelementptr inbounds i32, i32* %p, i32 %.4
    store i32 %.5, i32* %incdec.ptr1, align 4
    store i32 %.6, i32* %incdec.ptr5, align 4
    ret void
  }

Since we're not changing logic for generating selects and we're ending up with them now, I assume that it's also a desirable form of IR in this case.

If that's not what we want to do in SimplifyCFG, I can investigate it further and try to fix.

Thanks,
Michael

https://reviews.llvm.org/D41361