[llvm] handle (X + A) % Op1 for small X, A (PR #140369)

Sat May 17 05:38:42 PDT 2025

dtcxzyw wrote:

> `make check-llvm` does not show any regressions for me, do you know why it might be failing @nikic? Thanks for your time

The CI failure looks unrelated (flang build failure).

BTW, I think this patch should not be implemented in InstCombine. Generally, InstCombine canonicalizes patterns into smaller ones for better analysis results. Also, this fold is not always profitable, for the following two reasons:
1. Div/Rem are often faster than you might expect. On some microarchitectures, the latency scales linearly with `Log2(Dividend) - Log2(Divisor)`. If the divisor is known at compile time, the compiler may expand div/rem into mul+add sequences.
2. On backends without CMOV instructions, the branch misprediction penalty may have a large impact on performance.
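
For reference, the trade-off above can be sketched in plain C++. This is only an illustration of the kind of rewrite being discussed, not the code from the patch: the helper name `rem_via_select` is hypothetical, and it assumes unsigned operands with `x < n` and `a < n`, so the sum is below `2 * n` and a single conditional subtraction is enough.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): computes (x + a) % n without a
// division, assuming x < n and a < n.
uint32_t rem_via_select(uint32_t x, uint32_t a, uint32_t n) {
    assert(n != 0 && x < n && a < n);
    // Widen so that x + a cannot wrap around for large n.
    uint64_t sum = static_cast<uint64_t>(x) + a;
    // sum < 2 * n, so one conditional subtraction yields the remainder.
    // This select becomes a CMOV where available, otherwise a branch.
    return static_cast<uint32_t>(sum >= n ? sum - n : sum);
}
```

Whether this beats a plain `urem` depends on the target: the compare-and-select form replaces the division, but it adds a data-dependent select or branch.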

Therefore, I think you can implement this transformation in a target-dependent pass (e.g., CodeGenPrepare).
Additionally, for your motivating case, the branch misprediction penalty is negligible because `m` is an induction variable of a loop. We have implemented this optimization in https://github.com/llvm/llvm-project/pull/104724. You can generalize that PR if it does not cover your use case :)
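
As a rough source-level picture of the loop case (a sketch only, not taken from the linked PR; the function names and the start value of 0 are assumptions), the remainder of an induction variable can be carried in a second counter that is reset when it reaches the divisor:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference version: one urem per iteration.
std::vector<uint32_t> rems_with_urem(uint32_t count, uint32_t n) {
    assert(n != 0);
    std::vector<uint32_t> out;
    for (uint32_t i = 0; i < count; ++i)
        out.push_back(i % n);
    return out;
}

// Rewritten version: the remainder is a counter that wraps back to 0.
std::vector<uint32_t> rems_with_counter(uint32_t count, uint32_t n) {
    assert(n != 0);
    std::vector<uint32_t> out;
    uint32_t rem = 0;  // invariant: rem == i % n at the top of each iteration
    for (uint32_t i = 0; i < count; ++i) {
        out.push_back(rem);
        ++rem;
        if (rem == n)  // taken once every n iterations, so easy to predict
            rem = 0;
    }
    return out;
}
```

The reset fires on a fixed period, which is why the misprediction cost is small in this pattern.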

https://github.com/llvm/llvm-project/pull/140369