[llvm] [VPlan] Introduce multi-branch recipe, use for multi-exit loops (WIP). (PR #109193)

Mon Oct 14 12:59:27 PDT 2024

fhahn wrote:

> > It indeed might be simpler to start with this and later fold into the vector loop region, if needed (I think I did some small experiments a while ago, and the fused version in the loop latch was marginally faster for the config I tested)
> 
> I took a look at the other version, thanks for posting! I haven't tested this yet on a typical std::find loop, in particular the xalancbmk benchmark example, but I expect that for loops that take a long time to exit this is fine as the extra work of the compare and branch will be minimal. However, there are examples of loops that look for a mismatch (such as in xz or 7z):
> 
> ```
>   while (i++ != end)
>     if (a[i] != b[i])
>       break;
> ```
> 
> where the max number of iterations is small and the split middle block approach will be costly. We currently won't vectorise the mismatch loop in xz anyway because the LoopIdiomVectorize pass pre-empts LoopVectorize and should generate an efficient predicated vector loop. However, it's worth bearing in mind that early exit loops with low trip counts are also fairly common.
> 

Agreed, there may be cases where folding the dispatch to the exit block into the loop is beneficial, but for the initial version that is less crucial than correctness and making sure the modeling fits together. One option would be to start with a simpler version that dispatches outside the loop and then incrementally improve the codegen (e.g. folding the dispatch into the loop via `BranchOnMultipleConds`) once the main functionality has landed.
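
To make the "dispatch outside the loop" shape concrete, here is a rough scalar C++ model of a simplified mismatch search. It is only a sketch, not what the patch generates: `VF` and `first_mismatch` are hypothetical names, the inner lane loops stand in for vector operations, and the loop is simplified relative to the `while (i++ != end)` example quoted above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical vectorization factor, chosen only for illustration.
constexpr std::size_t VF = 4;

// Returns the index of the first mismatch between a and b, or n if none.
std::size_t first_mismatch(const std::uint8_t *a, const std::uint8_t *b,
                           std::size_t n) {
  std::size_t i = 0;
  const std::size_t vec_end = n - n % VF; // iterations covered by full chunks
  bool any_mismatch = false;

  // "Vector loop": the loop itself only knows that *some* exit is taken,
  // either the early exit (any lane mismatches) or the trip count is reached.
  for (; i < vec_end; i += VF) {
    any_mismatch = false;
    for (std::size_t lane = 0; lane < VF; ++lane) // stands in for a vector compare
      any_mismatch |= a[i + lane] != b[i + lane];
    if (any_mismatch)
      break;
  }

  // Dispatch outside the loop: decide which exit was actually taken.
  if (any_mismatch) {
    // Early exit: locate the first mismatching lane in the current chunk.
    for (std::size_t lane = 0; lane < VF; ++lane)
      if (a[i + lane] != b[i + lane])
        return i + lane;
  }

  // Latch exit: finish the remaining iterations in a scalar loop.
  for (; i < n; ++i)
    if (a[i] != b[i])
      return i;
  return n;
}
```

The extra check after the loop is the cost discussed above: it is negligible when the loop runs for many iterations, but it is paid on every call, which matters for loops with low trip counts.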

> For what it's worth @fhahn I have a downstream version of #88385 working with the version posted here.

Sounds great!

https://github.com/llvm/llvm-project/pull/109193