[llvm] [ARM] Prefer MUL to MULS on some implementations (PR #112540)

Wed Oct 23 06:03:57 PDT 2024

davemgreen wrote:

Hello. I believe it is only the Cortex-M33 where muls is slower than mul. The Cortex-M52, for example, lists them as having the same latency, and the T1 variant can potentially dual-issue (I believe) if it is in the right slot. If you do have other CPUs where this is helpful, the feature could be added there too.

I'm not sure I fully understand the first sentence. I believe that for mul in IT blocks we do not attempt the shrinking from T2 to T1, so we don't have to prevent it on CPUs where muls is slower than mul.
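To make that reasoning concrete, here is a rough toy model of the narrowing decision. It is only a sketch and not the actual LLVM code: `MulInfo`, `Subtarget`, `prefersMul` and `mayShrinkMul` are hypothetical names, and the "never shrink inside an IT block" rule is the assumption stated above rather than something verified against the pass.

```cpp
#include <cassert>

// Toy model only; none of these names are real LLVM APIs.
struct MulInfo {
  bool inITBlock;     // instruction is predicated inside an IT block
  bool lowRegsOnly;   // all operands are in r0-r7
  bool dstMatchesSrc; // destination equals one source (two-address form)
  bool flagsDead;     // CPSR is not live after the instruction
};

struct Subtarget {
  bool prefersMul; // hypothetical stand-in for the feature discussed here
};

// Returns true if the 32-bit T2 mul may be narrowed to the 16-bit T1 muls.
bool mayShrinkMul(const MulInfo &MI, const Subtarget &ST) {
  // Assumption from the comment above: mul in an IT block is never shrunk,
  // so the CPU feature does not need to be consulted on this path.
  if (MI.inITBlock)
    return false;
  // On CPUs where muls is slower than mul, keep the 32-bit form.
  if (ST.prefersMul)
    return false;
  // Simplified encoding constraints for the 16-bit flag-setting form.
  return MI.lowRegsOnly && MI.dstMatchesSrc && MI.flagsDead;
}
```

In this model the feature check only matters outside IT blocks, which is the point being made: if the IT-block path already refuses to shrink, nothing extra has to be blocked there.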

https://github.com/llvm/llvm-project/pull/112540