[llvm] [AMDGPU] Form V_MAD_U64_U32 from mul24/mulhi24 (PR #72393)

Mon Nov 20 05:53:44 PST 2023

Pierre-vh wrote:

> For background: my understanding is that the only reason CGP needs to help select 24-bit multiplies is that you can't do it during instruction selection because there is no way for a target to simplify a generic opcode like ISD::MUL based on demanded bits information. See https://discourse.llvm.org/t/selectiondag-target-specific-simplification-of-generic-nodes-using-demanded-bits/56747/1
> 
> With the current patch I still don't like the fact that CGP will split a 64-bit multiply and expect isel to combine it back into a single instruction later. How about this for a cleaner design:
> 
> * Change llvm.amdgcn.mul.i24 and llvm.amdgcn.mul.u24 to work for any scalar integer type, not just i32. So they are like a regular multiply but only look at the low 24 bits of their inputs. (Or the result is undefined if the inputs are not signed/unsigned 24-bit values - I'm not sure if that is a more useful definition?)
> * Then CGP can convert mul to llvm.amdgcn.mul.i24/u24 where possible with no need to split up 64-bit multiplies.
> * Isel patterns with suitable predicates based on the speed model can choose to implement a 64-bit i24/u24 multiply as either a pair of 24-bit multiplies, or a single 64-bit mad.

Should the mulhi_24 intrinsics be removed or kept in that case?

If I understand correctly, we'd only end up removing the second pattern; the first one would stay, right?

Also, I still don't understand the role of the speed model. Should it really affect ISel patterns?
Or do we just do this:
 - The V_MAD pattern can select a mul24.
 - Any mul24 not selected by another pattern is lowered to a mul24/mul24hi pair.
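
For reference, here is a minimal C++ sketch of the arithmetic behind the two lowerings, for the unsigned case only. The helper names (`mul_u24`, `mulhi_u24`, `mad_u64_u32`, `mul64_via_pair`, `mul64_via_mad`) are hypothetical models written for illustration, not LLVM APIs, and the sketch assumes both inputs are already known to fit in 24 bits, as the proposed intrinsic semantics would require.

```cpp
#include <cstdint>

// Hypothetical models for illustration only; these are not LLVM functions.
constexpr uint64_t kMask24 = 0xFFFFFF;

// Low 32 bits of the 48-bit product of two unsigned 24-bit values
// (models an unsigned mul24).
constexpr uint32_t mul_u24(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((a & kMask24) * (b & kMask24));
}

// Bits [47:32] of the same 48-bit product (models an unsigned mulhi24).
constexpr uint32_t mulhi_u24(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>(((a & kMask24) * (b & kMask24)) >> 32);
}

// 32 x 32 -> 64 multiply with a 64-bit addend (models a 64-bit mad).
constexpr uint64_t mad_u64_u32(uint32_t a, uint32_t b, uint64_t c) {
  return static_cast<uint64_t>(a) * static_cast<uint64_t>(b) + c;
}

// Option 1: a 64-bit u24 multiply as a mul24/mulhi24 pair.
constexpr uint64_t mul64_via_pair(uint32_t a, uint32_t b) {
  return (static_cast<uint64_t>(mulhi_u24(a, b)) << 32) | mul_u24(a, b);
}

// Option 2: the same multiply as a single mad with a zero addend.
// Assumes a and b have no bits set above bit 23.
constexpr uint64_t mul64_via_mad(uint32_t a, uint32_t b) {
  return mad_u64_u32(a, b, 0);
}
```

Both functions return the same 64-bit value whenever the inputs fit in 24 bits, so choosing between them is a cost question and not a correctness one.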

https://github.com/llvm/llvm-project/pull/72393