[llvm] [VPlan] Don't apply predication discount to non-originally-predicated blocks (PR #160449)

Wed Sep 24 00:46:48 PDT 2025

david-arm wrote:

I'm not sure if this statement is quite right:

`This is likely inaccurate because we can expect a tail folded instruction to be executed on every iteration bar the last.`

In one of the tests changed by this patch, the loop has a low trip count of 7. Suppose we chose a VF of 16: doesn't that mean only 7 of the 16 lanes actually execute? That takes us back to `getPredBlockCostDivisor` returning a value of around 2, I think.
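To make the arithmetic concrete, here is a minimal sketch, assuming the "right" divisor for a tail-folded loop is simply the ratio of lanes issued to lanes that execute a real iteration. `tail_folded_divisor` is a hypothetical helper for illustration, not the actual `getPredBlockCostDivisor` implementation.

```python
def tail_folded_divisor(trip_count: int, vf: int) -> float:
    """Hypothetical: lanes issued / lanes that execute a real scalar iteration."""
    if trip_count <= 0 or vf <= 0:
        raise ValueError("trip_count and vf must be positive")
    vector_iters = (trip_count + vf - 1) // vf  # ceil(TC / VF), including the masked tail
    lanes_issued = vector_iters * vf            # every vector iteration issues VF lanes
    return lanes_issued / trip_count            # 1.0 only when VF divides TC exactly


for tc, vf in [(7, 16), (4, 16), (7, 2), (32, 16)]:
    print(f"TC={tc:>2} VF={vf:>2} divisor={tail_folded_divisor(tc, vf):.2f}")
```

Under this assumption, TC=7/VF=16 gives 16/7 (about 2.29), close to the divisor of 2; TC=4/VF=16 gives 4; and only a trip count that is a multiple of the VF gives exactly 1.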

It feels like there are three different problems here:

1. Known low trip counts. Unless the trip count is an exact multiple of the VF, we know for sure that `getPredBlockCostDivisor` should return a number greater than 1, and in some cases even greater than 2 (e.g. TC=4, VF=16). I understand the intent of the patch is to avoid vectorising by making all the costs higher, but it feels counter-intuitive, as the function isn't actually returning the value it's supposed to.
2. Known large trip counts. Here we absolutely know that `getPredBlockCostDivisor` should return 1.
3. Unknown trip counts. I think this is the "best guess" approach taken by your patch, where you are assuming the trip count is large. However, I know that in some benchmarks such as exchange2 there is a problem with low trip counts that are only known during the LTO optimisation phase. Pre-LTO, we vectorise lots of loops in a hot code path without knowing the trip count; then, during the LTO phase, we discover the trip count is 3! That said, I do appreciate that the real solution here is to delay vectorisation until the LTO stage.

I guess what I'm trying to say is that if the intent of the patch is to calculate the cost divisor more accurately, based on the probability of entering the block, then paradoxically the cost should really be reduced even further in some cases. That makes me wonder whether this is the right approach. Or perhaps the change to `getPredBlockCostDivisor` should be restricted to cases where we know the trip count is much greater than the VF? For example, if we choose VF=2, then it's probably a good bet the cost divisor is going to be close to 1, since even for a trip count of 7 we know the ratio is 7/8.
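A minimal sketch of the restriction suggested above, covering the three cases in the list. It assumes a hypothetical `min_utilisation` threshold (0.85 here, chosen only for illustration) and keeps the divisor of 2 whenever the trip count is unknown or not known to be large relative to the VF. The function name, threshold and `Optional` trip count are illustrative assumptions, not LLVM API.

```python
from typing import Optional

LEGACY_DIVISOR = 2.0  # the divisor of 2 referred to in the discussion above


def pred_block_cost_divisor(trip_count: Optional[int], vf: int,
                            min_utilisation: float = 0.85) -> float:
    """Hypothetical policy: drop the predication discount to 1 only when the
    trip count is known and tail folding keeps most lanes busy."""
    if trip_count is None:
        # Case 3: unknown trip count (it may only become known at LTO time); stay conservative.
        return LEGACY_DIVISOR
    vector_iters = (trip_count + vf - 1) // vf      # ceil(TC / VF)
    utilisation = trip_count / (vector_iters * vf)  # fraction of issued lanes doing real work
    if utilisation >= min_utilisation:
        # Case 2 (or TC a multiple of VF): the block executes on almost every lane.
        return 1.0
    # Case 1: known low trip count relative to VF; keep the existing discount.
    return LEGACY_DIVISOR
```

With these assumptions, TC=7/VF=2 (utilisation 7/8) gets a divisor of 1, while TC=7/VF=16 and an unknown trip count keep the divisor of 2.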

https://github.com/llvm/llvm-project/pull/160449