[llvm] handle (X + A) % Op1 for small X, A (PR #140369)
Yingwei Zheng via llvm-commits
llvm-commits at lists.llvm.org
Sat May 17 05:38:42 PDT 2025
dtcxzyw wrote:
> `make check-llvm` does not show any regressions for me, do you know why it might be failing @nikic? Thanks for your time
The CI failure looks unrelated (flang build failure).
I also think this patch should not be implemented in InstCombine. Generally, InstCombine canonicalizes patterns into smaller forms for better analysis results. In addition, this fold is not always profitable, for two reasons:
1. Div/Rem are faster than you might expect. On some microarchitectures, the latency scales linearly with `Log2(Dividend) - Log2(Divisor)`. If the divisor is known at compile time, the compiler may expand div/rem into mul+add sequences.
2. On backends without CMOV instructions, the conditional in the folded form may be lowered to a branch, and the branch misprediction penalty may have a large impact on performance.
Therefore, I think this transformation would fit better in a target-dependent pass (e.g., CodeGenPrepare).
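To make the trade-off concrete, here is a minimal C++ sketch of the two forms being compared. The preconditions (`x < n`, `a < n`, and no wrap in `x + a`) are assumed here from the "small X, A" wording in the subject line, and the function names are hypothetical; the patch itself operates on LLVM IR, not on C++ source.

```cpp
#include <cassert>
#include <cstdint>

// Original form: one add followed by an unsigned remainder.
uint32_t rem_form(uint32_t x, uint32_t a, uint32_t n) {
    return (x + a) % n;
}

// Folded form. Assumed preconditions: x < n, a < n, and x + a does not wrap.
// Under those, x + a < 2 * n, so a single conditional subtraction suffices.
uint32_t select_form(uint32_t x, uint32_t a, uint32_t n) {
    assert(x < n && a < n);
    uint32_t sum = x + a;
    assert(sum >= x);  // the addition did not wrap
    return sum >= n ? sum - n : sum;
}

// Exhaustively compare both forms for every x, a below a given modulus n.
bool forms_agree(uint32_t n) {
    for (uint32_t x = 0; x < n; ++x)
        for (uint32_t a = 0; a < n; ++a)
            if (rem_form(x, a, n) != select_form(x, a, n))
                return false;
    return true;
}
```

Whether the second form is faster depends on how the target lowers the conditional (CMOV versus a branch) and on how expensive the division is, which is why a target-aware pass is the more natural home for it.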
Additionally, for your motivating case, the branch misprediction penalty is negligible because `m` is an induction variable of a loop. We have implemented this optimization in https://github.com/llvm/llvm-project/pull/104724. You can generalize that PR if it does not cover your use case :)
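As a source-level illustration of the induction-variable case (not the IR transformation in that PR, whose exact scope is not restated here), the sketch below contrasts a counter recomputed with `%` against one that wraps with a compare-and-reset. The function names and the step of `1` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counter advanced with a remainder on every iteration.
std::vector<unsigned> with_rem(unsigned start, unsigned n, std::size_t steps) {
    assert(n > 0);
    std::vector<unsigned> out;
    unsigned m = start % n;
    for (std::size_t i = 0; i < steps; ++i) {
        out.push_back(m);
        m = (m + 1) % n;
    }
    return out;
}

// Same sequence without a remainder inside the loop: increment, then reset
// to zero when the counter reaches n.
std::vector<unsigned> with_wrap(unsigned start, unsigned n, std::size_t steps) {
    assert(n > 0);
    std::vector<unsigned> out;
    unsigned m = start % n;
    for (std::size_t i = 0; i < steps; ++i) {
        out.push_back(m);
        ++m;
        if (m == n)
            m = 0;
    }
    return out;
}
```

The reset is taken only once every `n` iterations, so even when it is lowered to a branch it tends to be well predicted for larger `n`.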
https://github.com/llvm/llvm-project/pull/140369