[PATCH] D111523: [AMDGPU] Fix 24 bit mul intrinsic generation for > 32 bit result.

Mon Oct 18 23:47:42 PDT 2021

abinavpp added a comment.

In D111523#3070777 <https://reviews.llvm.org/D111523#3070777>, @arsenm wrote:

> Do we not emit 24 bit mulhi in the IR pass? Could we just start emitting the full 48 bit computation?

I couldn't find a 24-bit mulhi intrinsic in IntrinsicsAMDGPU.td. At the
moment, if this bails out, AMDGPUTargetLowering::performMulCombine() in
AMDGPUISelLowering.cpp creates the 48-bit mul instead.

Ideally we should generate the 48-bit mul here, the way getMul24() in
AMDGPUISelLowering.cpp does, but I'm not sure about the right approach. If we
create a 24-bit mulhi intrinsic and store the 24-bit mul and 24-bit mulhi
results in a { i16, i32 } struct, how do we replaceAllUsesWith() the i64 uses
with the { i16, i32 } value? Alternatively, we can create a 48-bit mulhi
intrinsic of type i64 (i32, i32) and split it during lowering. What do you
think?
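For reference, the arithmetic behind both options can be sketched in plain
C++. This is a scalar model of the unsigned case only, assuming the usual
semantics: the 24-bit mul yields the low 32 bits of the 48-bit product of the
operands' low 24 bits, and the 24-bit mulhi yields the remaining (at most 16)
high bits. The function names mul_u24, mulhi_u24 and mul48_u24 are
hypothetical and are not LLVM APIs or intrinsics.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the 24-bit mul: low 32 bits of the 48-bit product
// of the low 24 bits of each operand (higher operand bits are ignored).
static uint32_t mul_u24(uint32_t a, uint32_t b) {
  uint64_t p = uint64_t(a & 0xFFFFFFu) * uint64_t(b & 0xFFFFFFu);
  return static_cast<uint32_t>(p);
}

// Hypothetical model of the 24-bit mulhi: bits [47:32] of the same product.
static uint32_t mulhi_u24(uint32_t a, uint32_t b) {
  uint64_t p = uint64_t(a & 0xFFFFFFu) * uint64_t(b & 0xFFFFFFu);
  assert((p >> 48) == 0 && "a 24x24-bit product fits in 48 bits");
  return static_cast<uint32_t>(p >> 32); // at most 16 significant bits
}

// Hypothetical model of a single i64 (i32, i32) intrinsic: the full 48-bit
// result rebuilt from the lo/hi halves, as a split during lowering would.
static uint64_t mul48_u24(uint32_t a, uint32_t b) {
  return (uint64_t(mulhi_u24(a, b)) << 32) | mul_u24(a, b);
}
```

With the largest 24-bit operands, mul48_u24(0xFFFFFF, 0xFFFFFF) equals the
exact product 0xFFFFFE000001: the lo half is 0xFE000001 and the hi half is
0xFFFF, which is why the hi part only needs 16 bits.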

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D111523