[llvm] [AArch64][Codegen]Transform saturating smull to sqdmulh (PR #143671)

Thu Jun 26 03:49:40 PDT 2025

nasherm wrote:

> Hi - sorry for the delay, I was trying to re-remember how this instruction worked. This has three functions: https://godbolt.org/z/cxb7osTP1, the first of which I think is the most basic form of sqdmulh (notice the >2x bitwidth extend to allow the mul and the x2 to not wrap, and the min+max to saturate). That is equivalent to the @Updated which is what llvm will optimize it to (the mul x2 is folded into the shift, and the only value that can actually saturate is -0x8000*-0x8000). It is equivalent to the third, I believe, because we only need 2x the bitwidth in this form.
> 
> That feels like the most basic form of sqdmulh. Any reason not to add that one first? I didn't look into this pattern a huge amount, but do you know if the bottom or top bits require the shifts? Or both?

The bottom and top bits don't require shifts. This was a mistake on my part with my test.c program. My most recent patch uses your godbolt example to find the pattern. It makes sense to me and feels generic enough that SVE support would be feasible with some tinkering. Do let me know what you think.
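
For reference, here is a rough scalar C sketch of the three equivalent forms described in the quoted comment, for 16-bit elements. It is not the code from the godbolt link or from the patch: the function names are hypothetical, and it assumes `>>` on a negative signed value is an arithmetic shift (implementation-defined in C, but what mainstream compilers do).

```c
#include <stdint.h>

/* Form 1: extend past 2x the bitwidth so neither the mul nor the x2 can
 * wrap, take the high half, then clamp on both sides (min + max). */
static int16_t sqdmulh_wide(int16_t a, int16_t b) {
    int64_t doubled = (int64_t)a * (int64_t)b * 2;
    int64_t r = doubled >> 16;          /* assumes arithmetic shift */
    if (r > INT16_MAX) r = INT16_MAX;
    if (r < INT16_MIN) r = INT16_MIN;
    return (int16_t)r;
}

/* Form 2: the x2 is folded into the shift (>> 15 instead of >> 16).
 * Only -0x8000 * -0x8000 can overflow, so only the upper clamp remains. */
static int16_t sqdmulh_folded(int16_t a, int16_t b) {
    int64_t r = ((int64_t)a * (int64_t)b) >> 15;
    if (r > INT16_MAX) r = INT16_MAX;
    return (int16_t)r;
}

/* Form 3: same as form 2, but with only 2x the bitwidth, since the
 * undoubled product of two 16-bit values always fits in 32 bits. */
static int16_t sqdmulh_narrow(int16_t a, int16_t b) {
    int32_t r = ((int32_t)a * (int32_t)b) >> 15;
    if (r > INT16_MAX) r = INT16_MAX;
    return (int16_t)r;
}
```

The only input pair where the clamp changes the result is `a == b == INT16_MIN`, which is why forms 2 and 3 can drop the lower bound.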

https://github.com/llvm/llvm-project/pull/143671