[llvm] [LV] Pure runtime check for minimum profitable trip count. (PR #115833)

Mel Chen via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 18 05:49:49 PST 2024


Mel-Chen wrote:

> Just for information I tried this patch out with the x264 benchmark on neoverse-v1 and it causes a ~2% performance regression. It looks like in the hot loop in `mc_chroma` we are not entering the tail-folded vector loop as often, and falling back on the scalar tail. I guess that means the min profitable trip count isn't quite right.

I got a minimum profitable trip count (min prof TC) of 5 in function `mc_chroma`. Did you get the same value?
So far, we think something may need fixing in how the min prof TC is calculated.
Maybe this part:
```
    // Second, compute a minimum iteration count so that the cost of the
    // runtime checks is only a fraction of the total scalar loop cost. This
    // adds a loop-dependent bound on the overhead incurred if the runtime
    // checks fail. In case the runtime checks fail, the cost is RtC + ScalarC
    // * TC. To bound the runtime check to be a fraction 1/X of the scalar
    // cost, compute
    //   RtC < ScalarC * TC * (1 / X)  ==>  RtC * X / ScalarC < TC
    uint64_t MinTC2 = divideCeil(RtC * 10, ScalarC);
```
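To make the arithmetic in that comment concrete, here is a minimal standalone sketch of the `MinTC2` formula with X = 10. The local `divideCeil` is only a stand-in for the LLVM helper so the snippet builds without LLVM headers, `computeMinTC2` is a hypothetical wrapper name, and the cost values are made up for illustration; they are not the costs measured for `mc_chroma`.

```cpp
#include <cstdint>

// Stand-in for LLVM's divideCeil (assumption: plain integer ceiling division).
constexpr uint64_t divideCeil(uint64_t Numerator, uint64_t Denominator) {
  return (Numerator + Denominator - 1) / Denominator;
}

// Hypothetical wrapper around the quoted formula:
//   RtC < ScalarC * TC * (1 / X)  ==>  RtC * X / ScalarC < TC, with X = 10.
constexpr uint64_t computeMinTC2(uint64_t RtC, uint64_t ScalarC) {
  return divideCeil(RtC * 10, ScalarC);
}

// Illustrative costs only (not taken from any real loop).
static_assert(computeMinTC2(4, 8) == 5, "40 / 8 = 5 exactly");
static_assert(computeMinTC2(4, 9) == 5, "40 / 9 rounds up to 5");
static_assert(computeMinTC2(20, 8) == 25, "200 / 8 = 25 exactly");
```

With these made-up numbers, a small runtime-check cost relative to the scalar iteration cost gives a low bound such as 5, and the bound grows linearly as `RtC` grows.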
Anyway, we will run more experiments and update the min prof TC calculation if needed.

https://github.com/llvm/llvm-project/pull/115833

