[llvm] [clang] [SimplifyCFG] Not folding branch in loop header with constant iterations (PR #74268)

Mon Dec 4 22:28:16 PST 2023

xiangzh1 wrote:

> So what is the difference that lets X86 partially unroll while AMDGPU cannot unroll at all?
> https://godbolt.org/z/cMeE61bhf Loop unroll with -unroll-runtime can partially unroll the case. @nikic It looks like, if we don't avoid the transform, it becomes a runtime unroll. The case before simplifycfg is https://godbolt.org/z/5MoYM8rGn. @xiangzh1's solution looks fine to me if we do not involve LoopInfo in simplifycfg. And we still need a minimal IR test for it.

1 In fact, I don't care much about the unrolling differences between targets. The loop unroll pass takes TTI into account, so it makes sense to me that targets differ in whether they partially unroll, or in the unroll count they choose.
What I care about more is that the known loop trip count becomes unknown. That is a big change for unrolling (even when unrolling still succeeds). For example, a loop with a small known trip count can usually be fully unrolled, which usually greatly simplifies the address (offset) calculations from the old iterations (and then enables many other optimizations, e.g. SROA, on those simplified calculations). None of this works for an unknown trip count.

2 I am creating the minimal IR test. (I'll replace the current .cu test with it, since the current test uses -O2.)
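
To illustrate point 1, here is a minimal C++ sketch. The function names and the 4-element buffer are made up for illustration and are not taken from the PR's test. Whether a given target actually fully unrolls the first version depends on its TTI cost model and the pass pipeline, so treat this as a picture of the opportunity rather than of guaranteed output.

```cpp
// Known trip count (4): the loops are candidates for full unrolling.
// After full unrolling every index into tmp is a constant, so a pass
// such as SROA may be able to split tmp into scalars.
int sum_known(const int *src) {
    int tmp[4];
    for (int i = 0; i < 4; ++i)
        tmp[i] = src[i] * 2;
    int total = 0;
    for (int i = 0; i < 4; ++i)
        total += tmp[i];
    return total;
}

// Unknown trip count (n is only known at run time; this sketch assumes
// 0 <= n <= 4). Even if the loops are partially or runtime unrolled,
// the indices into tmp stay variable, so the array accesses cannot be
// simplified in the same way.
int sum_unknown(const int *src, int n) {
    int tmp[4] = {0, 0, 0, 0};
    for (int i = 0; i < n; ++i)
        tmp[i] = src[i] * 2;
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += tmp[i];
    return total;
}
```

Both functions compute the same result when n is 4; the difference is only in what the optimizer can prove about the trip count, which is the information I want to keep.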

Thanks again!

https://github.com/llvm/llvm-project/pull/74268