[llvm] [AMDGPU] Form V_MAD_U64_U32 from mul24/mulhi24 (PR #72393)

Wed Nov 15 23:07:04 PST 2023

Pierre-vh wrote:

> > CGP can transform a fine mul+add into a (mul24/mulhi24)+add, so add a pattern for that.
> 
> Typo "fine"? Not sure what you meant.
> 
> This would depend on the relative rate of mul_u24 vs mad_u64. On older ASICs, mad_u64 is "quarter rate" so two fast mul_u24 instructions should be faster. I see that gfx90a uses SIDPFullSpeedModel so mad_u64 is as fast as mul_u24.
> 
> In any case, would it be better to teach CGP not to do the harmful transformation in the first place, rather than work around it in isel?

By "fine" I meant "fine to transform into v_mad", but I didn't know about FullSpeed/QuarterSpeed, so I edited it.
Should I add a predicate so the transform is only done on FullSpeedModels, then?

About doing this in CGP: I asked @arsenm earlier and he suggested fixing it in ISel rather than teaching CGP. I tend to agree. CGP doesn't always have the full picture of what the DAG can do, and the fix may be non-obvious there. Having a new pattern is simpler and more stable, IMO.
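
For reference, here is a minimal sketch of why the fold preserves the value. It assumes the usual semantics: mul24/mulhi24 return the low and high 32 bits of the product of the operands' low 24 bits, and mad_u64_u32 computes a 32x32 multiply plus a 64-bit addend. The helper names below are hypothetical and this is plain C++ emulation, not the pattern from the patch. The 32x32 form only matches when both operands are known to fit in 24 bits, which is the condition under which CGP would have formed the mul24 pair.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical emulation helpers; the names are illustrative only.
constexpr uint32_t kMask24 = 0x00FFFFFFu;

// Low 32 bits of the product of the operands' low 24 bits.
uint32_t mul_u24(uint32_t a, uint32_t b) {
  uint64_t p = uint64_t(a & kMask24) * uint64_t(b & kMask24);
  return uint32_t(p);
}

// High 32 bits of the same (at most 48-bit) product.
uint32_t mulhi_u24(uint32_t a, uint32_t b) {
  uint64_t p = uint64_t(a & kMask24) * uint64_t(b & kMask24);
  return uint32_t(p >> 32);
}

// Assumed mad_u64_u32 semantics: 32x32 -> 64 multiply plus a 64-bit addend
// (wraps modulo 2^64; the carry-out is ignored in this sketch).
uint64_t mad_u64_u32(uint32_t a, uint32_t b, uint64_t c) {
  return uint64_t(a) * uint64_t(b) + c;
}

// The shape being matched: a 64-bit add of the pair (mulhi24:mul24) and c.
uint64_t via_mul24_pair(uint32_t a, uint32_t b, uint64_t c) {
  uint64_t lo = mul_u24(a, b);
  uint64_t hi = mulhi_u24(a, b);
  return ((hi << 32) | lo) + c;
}

// Checks the equivalence for operands that fit in 24 bits.
void check_fold(uint32_t a, uint32_t b, uint64_t c) {
  assert((a & ~kMask24) == 0 && (b & ~kMask24) == 0);
  assert(via_mul24_pair(a, b, c) == mad_u64_u32(a, b, c));
}
```

Whether the single mad is actually faster than the two mul24 instructions plus the add still depends on the subtarget's rate for mad_u64, as discussed above.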

https://github.com/llvm/llvm-project/pull/72393