[PATCH] D111523: [AMDGPU] Fix 24 bit mul intrinsic generation for > 32 bit result.

Tue Oct 19 11:45:24 PDT 2021

arsenm added a comment.

In D111523#3072078 <https://reviews.llvm.org/D111523#3072078>, @abinavpp wrote:

> In D111523#3070777 <https://reviews.llvm.org/D111523#3070777>, @arsenm wrote:
>
>> Do we not emit 24 bit mulhi in the IR pass? Could we just start emitting the full 48 bit computation?
>
> I couldn't find the 24-bit mulhi intrinsic in IntrinsicsAMDGPU.td. At the
> moment, AMDGPUTargetLowering::performMulCombine() of AMDGPUISelLowering.cpp is
> creating the 48-bit mul if this bails out.

You would have to add IR intrinsics for this. The only reason we do this in the IR is due to weakness in the known bits handling in the DAG since it can't see across blocks. We'll get some simple cases in the DAG now, but miss more complex cases.

> Ideally we should generate the 48-bit mul here like how getMul24() of
> AMDGPUISelLowering.cpp does. I'm not sure about the right approach for that.  If
> we create a 24-bit mulhi intrinsic, then store the 24-bit mul and 24-bit mulhi
> results in a { i16, i32 } struct, how do we replaceAllUsesWith() of the i64
> uses with { i16, i32 }. Otherwise, we can create a 48-bit mul intrinsic of
> type i64 (i32, i32), and split it during lowering. What do you think?

The result of the intrinsic should be 32-bit. The canonical way to merge 2 integer values in the IR would be to do an extend, shift and or

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111523/new/

https://reviews.llvm.org/D111523