[llvm] [AMDGPU] Form V_MAD_U64_U32 from mul24/mulhi24 (PR #72393)

Mon Nov 20 04:58:27 PST 2023

jayfoad wrote:

For background: my understanding is that the only reason CGP needs to help select 24-bit multiplies is that this cannot be done during instruction selection, because there is no way for a target to simplify a generic opcode like ISD::MUL based on demanded bits information. See https://discourse.llvm.org/t/selectiondag-target-specific-simplification-of-generic-nodes-using-demanded-bits/56747/1

With the current patch I still don't like the fact that CGP will split a 64-bit multiply and expect isel to combine it back into a single instruction later. How about this for a cleaner design:
- Change llvm.amdgcn.mul.i24 and llvm.amdgcn.mul.u24 to work for any scalar integer type, not just i32. They would then behave like a regular multiply that only looks at the low 24 bits of each input. (Or the result is undefined if the inputs are not signed/unsigned 24-bit values - I'm not sure if that is a more useful definition?)
- Then CGP can convert mul to llvm.amdgcn.mul.i24/u24 where possible with no need to split up 64-bit multiplies.
- Isel patterns with suitable predicates based on the speed model can choose to implement a 64-bit i24/u24 multiply as either a pair of 24-bit multiplies, or a single 64-bit mad.
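
To make the proposed semantics concrete, here is a rough C++ model of what a 64-bit u24/i24 multiply would compute, and of the two lowerings isel could choose between. This is only a sketch: the function names are made up for illustration, the "low 24 bits" definition is assumed (not the "undefined if out of range" alternative), and the helpers only approximate the hardware instructions they are named after.

```cpp
#include <cassert>
#include <cstdint>

// All names below are hypothetical; they model semantics, not LLVM APIs.
constexpr uint64_t Mask24 = 0xFFFFFF;

// Assumed meaning of a 64-bit llvm.amdgcn.mul.u24: zero-extend the low
// 24 bits of each input and multiply in 64 bits.
uint64_t mulU24Wide(uint64_t A, uint64_t B) {
  return (A & Mask24) * (B & Mask24);
}

// Assumed meaning of a 64-bit llvm.amdgcn.mul.i24: sign-extend from bit 23
// and multiply in 64 bits.
int64_t signExtend24(uint64_t X) {
  uint64_t Low = X & Mask24;
  return (Low & 0x800000) ? (int64_t)Low - 0x1000000 : (int64_t)Low;
}
int64_t mulI24Wide(uint64_t A, uint64_t B) {
  return signExtend24(A) * signExtend24(B);
}

// Lowering 1: a pair of 24-bit multiplies, one producing bits [31:0] of the
// 48-bit product and one producing bits [63:32].
uint32_t mulLoU24(uint32_t A, uint32_t B) {
  return (uint32_t)((uint64_t)(A & Mask24) * (B & Mask24));
}
uint32_t mulHiU24(uint32_t A, uint32_t B) {
  return (uint32_t)(((uint64_t)(A & Mask24) * (B & Mask24)) >> 32);
}
uint64_t lowerAsPair(uint64_t A, uint64_t B) {
  uint32_t Lo = mulLoU24((uint32_t)A, (uint32_t)B);
  uint32_t Hi = mulHiU24((uint32_t)A, (uint32_t)B);
  return ((uint64_t)Hi << 32) | Lo;
}

// Lowering 2: a single 32x32+64 -> 64 multiply-add with a zero addend.
uint64_t madU64U32(uint32_t A, uint32_t B, uint64_t C) {
  return (uint64_t)A * B + C;
}
uint64_t lowerAsMad(uint64_t A, uint64_t B) {
  return madU64U32((uint32_t)(A & Mask24), (uint32_t)(B & Mask24), 0);
}
```

Both lowerings agree with the reference for every input, so the choice between them can be left entirely to isel predicates without CGP having to split the 64-bit multiply up front.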

https://github.com/llvm/llvm-project/pull/72393