[llvm] [AMDGPU] Form V_MAD_U64_U32 from mul24 (PR #72393)

Tue Nov 21 13:15:33 PST 2023

jayfoad wrote:

> However I still haven't done the SpeedModel part because:
> 
> * (Apologies, but) I'm still confused. Do we need it on those new patterns (we only form V_MAD_I64_I32 on FullSpeed), or in [[AMDGPU] Don't create mulhi_24 in CGP #72983](https://github.com/llvm/llvm-project/pull/72983) to emit an i64 mul instead of i24 muls on non-FullSpeed models?

The new patterns should probably have a FullRate64Ops predicate. _If_ you want to use V_MAD_I64_I32 for a plain 32 * 32 -> 64-bit multiply (not a multiply-add), as suggested inline, then the old pattern could also have a NotFullRate64Ops predicate. It all depends on exactly which pattern you want to be used for each speed model, while making sure there is always at least one pattern that will match each DAG node, so you don't get selection failures.

Also, as an alternative to putting predicates on patterns, you could try fiddling with AddedComplexity (since that's the closest thing we have to a cost model for pattern-based selection). But it probably doesn't make much difference in practice.
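
To make the two options concrete, here is a rough toy model in C++. It is not LLVM's actual selection machinery, only a sketch under assumptions: `Subtarget`, `Pattern`, `selectPattern`, and the pattern names `new_mad_pattern` and `old_pattern` are all hypothetical, and the "highest complexity wins among patterns whose predicate holds" rule is a simplification of how pattern ordering behaves.

```cpp
#include <string>
#include <vector>

// Toy stand-in for the subtarget; only models the speed-model bit.
struct Subtarget {
    bool fullRate64Ops;
};

// Toy stand-in for a selection pattern (hypothetical, not an LLVM type).
struct Pattern {
    std::string name;
    bool (*predicate)(const Subtarget&);  // must hold for the pattern to apply
    int addedComplexity;                  // higher value is tried first
};

bool isFullRate(const Subtarget& st) { return st.fullRate64Ops; }
bool isNotFullRate(const Subtarget& st) { return !st.fullRate64Ops; }
bool always(const Subtarget&) { return true; }

// Returns the name of the chosen pattern, or an empty string to model a
// selection failure (no pattern matched the node).
std::string selectPattern(const std::vector<Pattern>& patterns,
                          const Subtarget& st) {
    const Pattern* best = nullptr;
    for (const Pattern& p : patterns) {
        if (!p.predicate(st))
            continue;
        if (!best || p.addedComplexity > best->addedComplexity)
            best = &p;
    }
    return best ? best->name : std::string();
}

// Option 1: complementary predicates on the new and the old pattern.
std::vector<Pattern> predicatedPatterns() {
    return {{"new_mad_pattern", isFullRate, 0},
            {"old_pattern", isNotFullRate, 0}};
}

// Option 2: predicate only the new pattern and raise its complexity;
// the unpredicated old pattern remains as a fallback.
std::vector<Pattern> complexityPatterns() {
    return {{"old_pattern", always, 0},
            {"new_mad_pattern", isFullRate, 10}};
}

// Broken setup: the only pattern is predicated, so one speed model
// has nothing that matches.
std::vector<Pattern> incompletePatterns() {
    return {{"new_mad_pattern", isFullRate, 0}};
}
```

In this model both options pick the same pattern for each speed model, which fits the point that the choice probably makes little practical difference. The incomplete setup returns an empty string when `fullRate64Ops` is false, which is the selection failure to avoid.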

> * We don't have patterns like that currently so I'm wondering if there is a good reason behind it? (cc @arsenm)

I guess there aren't many cases where we want to select different instructions to do the same thing for different speed models.

https://github.com/llvm/llvm-project/pull/72393