[llvm-dev] Branch is not optimized because of right shift

Florian Hahn via llvm-dev llvm-dev at lists.llvm.org
Sun Apr 5 12:06:03 PDT 2020


Hi,

> On Apr 5, 2020, at 14:08, Stefanos Baziotis via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> To be more specific for everyone:
> - First of all, I hope it's clear that in the original (C) code, the region - 0x8 > 1000 branch should
> be eliminated. That is because it is inside a block that has ensured that 8 < region < 12. But for some reason,
> it is not eliminated. I'm trying to find that reason. :)
> - I understand that in the -O2 .ll output, we take %0 and right-shift it. That means that we don't have
> any range info about %0 or %0 >>= 1, so we can't eliminate the branch. What I don't understand
> is why we chose to reuse %0 and right-shift it instead of having a phi there.
> - Finally, I saw that putting nuw on the addition with -8 eliminates the branch. This makes sense, since
> we probably know from the code above that %0 >>= 1 is less than 12. However, I don't understand
> why the right shift was translated to an add with -16.
> 
> I hope this made some sense. You may ignore the last 2 and focus on the first, i.e. what optimization
> should have been done but it's not and what we can do about it.
> 
> A clearer version of the .ll: https://godbolt.org/z/2t4RU5

I think the IR in both of your examples makes things harder for the compiler than expected from the original C source.

With the IR in your original example (https://godbolt.org/z/BL-4jL), I think the problem is that the branch condition is `%0 - 16 < 12`, which makes it easy to limit the range of `%0 - 16`, but inside the branch only %0 is used, so that range does not apply to it directly. Sinking the lshr too early made the analysis harder.
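
To make the range argument concrete, here is a small C sketch. It is not the original source: the guard is taken from the folded IR condition, while the inner comparison (`(x >> 1) - 8 > 1000`) is only my reading of the description in the quoted message, and all function names are hypothetical. It checks that the folded guard is the same as 16 <= x < 28 on x itself, and that the inner comparison can never be true under that guard.

```c
#include <stdbool.h>
#include <stdint.h>

/* Guard as it appears after folding: (x - 16) <u 12, with wrapping subtraction. */
bool guard_folded(uint32_t x) { return (uint32_t)(x - 16u) < 12u; }

/* The same guard expressed as a range on x itself. */
bool guard_on_x(uint32_t x) { return x >= 16u && x < 28u; }

/* Assumed inner comparison, reconstructed from the quoted description only. */
bool inner_cmp(uint32_t x) { return ((x >> 1) - 8u) > 1000u; }

/* Counts values in [lo, hi] where the two guards disagree, or where the
   inner comparison is true although the guard holds. */
unsigned count_violations(uint32_t lo, uint32_t hi) {
    unsigned bad = 0;
    for (uint32_t x = lo;; ++x) {
        if (guard_folded(x) != guard_on_x(x)) ++bad;
        if (guard_folded(x) && inner_cmp(x)) ++bad;
        if (x == hi) break; /* exit before x can wrap past hi */
    }
    return bad;
}
```

Under the guard, x >> 1 lies in [8, 14), so (x >> 1) - 8 is at most 5. That is the fact the compiler would need to derive for %0 from a range that is only known for `%0 - 16`.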

In the second example (https://godbolt.org/z/2t4RU5), things are hard to simplify because we need to use the information from the condition of one select to simplify the earlier select. 

The version in https://godbolt.org/z/_ipKhb is probably the easiest for analysis (basically the original C source code built with `clang -O0 -S -emit-llvm`, followed by running `opt -mem2reg`). There's a patch under review that adds support for conditional range propagation (https://reviews.llvm.org/D76611), and with the patch it can be simplified to the code below by running `opt -ipsccp -simplifycfg -instcombine`:

define i32 @test(i32 %0) {
  %.off = add i32 %0, -16
  %2 = icmp ult i32 %.off, 12
  %spec.select = select i1 %2, i32 66, i32 45
  ret i32 %spec.select
}
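
For readers less used to the add/icmp ult idiom, a rough C transliteration of the simplified IR above might look like this. The function name is made up for illustration, and this is a reading of the IR, not the original C source.

```c
#include <stdint.h>

/* Hypothetical C rendering of the simplified @test function. */
int32_t test_simplified(uint32_t x) {
    uint32_t off = x - 16u;     /* %.off = add i32 %0, -16 (wraps for x < 16) */
    return off < 12u ? 66 : 45; /* icmp ult + select: 66 iff 16 <= x < 28 */
}
```

The single unsigned comparison covers both bounds: values below 16 wrap around to very large numbers and fail the `< 12` test, so only one range check remains and the inner branch is gone.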


The reason it does not yet work in the default pipeline is that the branch condition is combined to use an AND before conditional propagation runs, and D76611 does not support that yet. Once it lands, that should be an easy extension.

The problems with your two versions could of course also be addressed relatively easily for that special case, I think, but the challenge is to fix them in a general and efficient (compile-time-wise) way. I hope conditional propagation will cover such cases soon, though.

Cheers,
Florian

