[llvm] [AArch64][Codegen]Transform saturating smull to sqdmulh (PR #143671)

Sun Jun 22 14:51:57 PDT 2025

davemgreen wrote:

Hi - sorry for the delay, I was trying to re-remember how this instruction worked. This has three functions: https://godbolt.org/z/cxb7osTP1, the first of which I think is the most basic for of sqdmulh (notice the >2x bitwidth extend to allow the mul and the x2 to not wrap, and the min+max to saturate). That is equivalent to the @updated which is what llvm will optimize it to (the mul x2 is folded into the shift, and the only value that can actually saturate is -0x8000*-0x8000). It is equivalent the the third I believe because we only need 2x the bitwidth in this form.

That feels like the most basic form of sqdmulh. Any reason not to add that one first? I didn't look into this pattern a huge amount, but do you know if the bottom or top bits require the shifts? Or both?

https://github.com/llvm/llvm-project/pull/143671