mgudim wrote: @wangpc-pp Thanks for the test case, I'll look into that. I have only tested on the x264 train workload, and I got a 1% improvement in cycle count. Can you share your results? https://github.com/llvm/llvm-project/pull/92959