[llvm] [AMDGPU] Form V_MAD_U64_U32 from mul24/mulhi24 (PR #72393)

Wed Nov 15 06:49:59 PST 2023

jayfoad wrote:

> CGP can transform a fine mul+add into a (mul24/mulhi24)+add, so add a pattern for that.

Typo "fine"? Not sure what you meant.

This would depend on the relative rate of mul_u24 vs mad_u64. On older ASICs, mad_u64 is "quarter rate" so two fast mul_u24 instructions should be faster. I see that gfx90a uses SIDPFullSpeedModel so mad_u64 is as fast as mul_u24.

In any case, would it be better to teach CGP not to do the harmful transformation in the first place, rather than work around it in isel?
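
For reference, here is a minimal C++ sketch of the arithmetic equivalence such a pattern would rely on. It assumes the usual semantics: mul_u24 and mulhi_u24 return the low and high 32 bits of the product of the low 24 bits of each operand, and mad_u64_u32 computes a full 32x32->64 multiply plus a 64-bit addend. The function names below are hypothetical software models for illustration, not LLVM or hardware APIs, and the sketch says nothing about instruction rates.

```cpp
#include <cstdint>

// Hypothetical software models of the instructions under discussion.
constexpr uint32_t kMask24 = 0x00FFFFFFu;

// Low 32 bits of the product of the low 24 bits of each operand.
uint32_t mul_u24(uint32_t a, uint32_t b) {
  uint64_t p = uint64_t(a & kMask24) * uint64_t(b & kMask24);
  return uint32_t(p);
}

// High 32 bits of the same (at most 48-bit) product.
uint32_t mulhi_u24(uint32_t a, uint32_t b) {
  uint64_t p = uint64_t(a & kMask24) * uint64_t(b & kMask24);
  return uint32_t(p >> 32);
}

// Full 32x32 -> 64 multiply plus a 64-bit addend (wraps modulo 2^64).
uint64_t mad_u64_u32(uint32_t a, uint32_t b, uint64_t c) {
  return uint64_t(a) * uint64_t(b) + c;
}

// The (mul24/mulhi24)+add form: rebuild the 64-bit product from its
// two halves, then add.
uint64_t mul24_pair_plus_add(uint32_t a, uint32_t b, uint64_t c) {
  uint64_t product = (uint64_t(mulhi_u24(a, b)) << 32) | mul_u24(a, b);
  return product + c;
}
```

The two forms agree only when both multiplicands fit in 24 bits. For wider operands the mul24 pair ignores the upper bits, so any combine would need that known-bits guarantee to hold.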

https://github.com/llvm/llvm-project/pull/72393