[llvm] [LV] Pure runtime check for minimum profitable trip count. (PR #115833)
Mel Chen via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 18 05:49:49 PST 2024
Mel-Chen wrote:
> Just for information I tried this patch out with the x264 benchmark on neoverse-v1 and it causes a ~2% performance regression. It looks like in the hot loop in `mc_chroma` we are not entering the tail-folded vector loop as often, and falling back on the scalar tail. I guess that means the min profitable trip count isn't quite right.
I get a min prof TC (minimum profitable trip count) of 5 in function `mc_chroma`. Do you see the same value?
So far, we think something may need to be fixed in how the min prof TC is calculated.
Maybe this part:
```
// Second, compute a minimum iteration count so that the cost of the
// runtime checks is only a fraction of the total scalar loop cost. This
// adds a loop-dependent bound on the overhead incurred if the runtime
// checks fail. In case the runtime checks fail, the cost is RtC + ScalarC
// * TC. To bound the runtime check to be a fraction 1/X of the scalar
// cost, compute
// RtC < ScalarC * TC * (1 / X) ==> RtC * X / ScalarC < TC
uint64_t MinTC2 = divideCeil(RtC * 10, ScalarC);
```
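To make the arithmetic in that comment concrete, here is a minimal standalone sketch of the same bound. It assumes plain `uint64_t` costs rather than LLVM's cost types, `divideCeilSketch` is a hypothetical stand-in for `llvm::divideCeil`, and `minTripCountForChecks` is an illustrative name that does not exist in the vectorizer. The fraction `X` is the hard-coded 10 from the snippet above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::divideCeil: integer division rounded up.
constexpr uint64_t divideCeilSketch(uint64_t Numerator, uint64_t Denominator) {
  return (Numerator + Denominator - 1) / Denominator;
}

// Illustrative helper (not part of LLVM): smallest trip count TC for which
// the runtime-check cost RtC is at most 1/X of the scalar loop cost
// ScalarC * TC, i.e. RtC * X / ScalarC <= TC.
uint64_t minTripCountForChecks(uint64_t RtC, uint64_t ScalarC,
                               uint64_t X = 10) {
  assert(ScalarC != 0 && "scalar cost per iteration must be non-zero");
  return divideCeilSketch(RtC * X, ScalarC);
}
```

With made-up costs, `minTripCountForChecks(9, 20)` evaluates `divideCeil(90, 20)` and yields 5, while `minTripCountForChecks(8, 20)` yields 4. These numbers are only an illustration and are not the costs measured for `mc_chroma`.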
Anyway, we will run more experiments and update the min prof TC calculation if needed.
https://github.com/llvm/llvm-project/pull/115833