[PATCH] D41361: [SimplifyCFG] Avoid quadratic on a predecessors number behavior in instruction sinking.
Michael Zolotukhin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 20 14:35:37 PST 2017
mzolotukhin added a comment.
> Yes, I'm not really worried about that aspect; I'm more concerned that the transform somehow isn't triggering in cases where it should.
>
> I just tried digging a bit; it looks like the generated code is worse, particularly for avoid-cpsr-rmw.ll. I'm not sure that's really the fault of this patch, exactly... it looks like the sinking does in fact happen, but the end result is different because of related transforms. Looks like it's exposing a bad heuristic elsewhere?
Yes, it does behave differently after the patch. However, from code sinking point of view the new behavior seems better to me.
Here is how the behavior changed for `arm64-jumptable.ll`. Previously SimplifyCFG produced
define void @sum(i32 %a, i32* %to, i32 %c) {
entry:
switch i32 %a, label %exit [
i32 1, label %bb1
i32 2, label %exit.sink.split
i32 3, label %bb3
i32 4, label %bb4
]
bb1:
%b = add i32 %c, 1
br label %exit.sink.split
bb3:
br label %exit.sink.split
bb4:
br label %exit.sink.split
exit.sink.split:
%.sink = phi i32 [ 5, %bb4 ], [ %b, %bb1 ], [ 3, %bb3 ], [ %a, %entry ]
store i32 %.sink, i32* %to
br label %exit
exit:
ret void
}
Now we're producing:
define void @sum(i32 %a, i32* %to, i32 %c) {
entry:
switch i32 %a, label %exit [
i32 1, label %bb1
i32 2, label %exit.sink.split
i32 3, label %exit.sink.split
i32 4, label %bb4
]
bb1: ; preds = %entry
%b = add i32 %c, 1
br label %exit.sink.split
bb4: ; preds = %entry
br label %exit.sink.split
exit.sink.split: ; preds = %entry, %entry, %bb1, %bb4
%.sink = phi i32 [ 5, %bb4 ], [ %b, %bb1 ], [ %a, %entry ], [ %a, %entry ]
store i32 %.sink, i32* %to
br label %exit
exit: ; preds = %exit.sink.split, %entry
ret void
}
So, we merged edges `entry`->`bb3`->`exit.sink.split`. Except for helping backend, I don't see why we would want to keep these edges separate. Helping backend might be a sufficient reason, but I think it's generally better not to rely on this kind of stuff in the backend.
In `avoid-cpsr-rmw.ll` we previously generated:
define void @t4(i32* nocapture %p, double* nocapture %q) #3 {
entry:
%0 = load double, double* %q, align 4
%cmp = fcmp olt double %0, 1.000000e+01
%incdec.ptr1 = getelementptr inbounds i32, i32* %p, i32 1
br i1 %cmp, label %if.then, label %if.end
if.then: ; preds = %entry
br label %if.end
if.end: ; preds = %entry, %if.then
%.sink3 = phi i32 [ 7, %if.then ], [ 3, %entry ]
%.sink2 = phi i32 [ 2, %if.then ], [ 3, %entry ]
%.sink1 = phi i32 [ 8, %if.then ], [ 5, %entry ]
%.sink = phi i32 [ 9, %if.then ], [ 6, %entry ]
store i32 %.sink3, i32* %p, align 4
%incdec.ptr5 = getelementptr inbounds i32, i32* %p, i32 %.sink2
store i32 %.sink1, i32* %incdec.ptr1, align 4
store i32 %.sink, i32* %incdec.ptr5, align 4
ret void
}
And now we generate:
define void @t4(i32* nocapture %p, double* nocapture %q) #3 {
entry:
%0 = load double, double* %q, align 4
%cmp = fcmp olt double %0, 1.000000e+01
%incdec.ptr1 = getelementptr inbounds i32, i32* %p, i32 1
%. = select i1 %cmp, i32 7, i32 3
%.4 = select i1 %cmp, i32 2, i32 3
%.5 = select i1 %cmp, i32 8, i32 5
%.6 = select i1 %cmp, i32 9, i32 6
store i32 %., i32* %p, align 4
%incdec.ptr5 = getelementptr inbounds i32, i32* %p, i32 %.4
store i32 %.5, i32* %incdec.ptr1, align 4
store i32 %.6, i32* %incdec.ptr5, align 4
ret void
}
Since we're not changing logic for generating selects and we're ending up with them now, I assume that it's also a desirable form of IR in this case.
If that's not what we want to do in SimplifyCFG, I can investigate it further and try to fix.
Thanks,
Michael
https://reviews.llvm.org/D41361
More information about the llvm-commits
mailing list