[PATCH] D111523: [AMDGPU] Fix 24 bit mul intrinsic generation for > 32 bit result.
Abinav Puthan Purayil via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Oct 24 20:26:04 PDT 2021
abinavpp added a comment.
In D111523#3073407 <https://reviews.llvm.org/D111523#3073407>, @arsenm wrote:
> In D111523#3072078 <https://reviews.llvm.org/D111523#3072078>, @abinavpp wrote:
>
>> In D111523#3070777 <https://reviews.llvm.org/D111523#3070777>, @arsenm wrote:
>>
>>> Do we not emit 24 bit mulhi in the IR pass? Could we just start emitting the full 48 bit computation?
>>
>> I couldn't find the 24-bit mulhi intrinsic in IntrinsicsAMDGPU.td. At the
>> moment, AMDGPUTargetLowering::performMulCombine() of AMDGPUISelLowering.cpp is
>> creating the 48-bit mul if this bails out.
>
> You would have to add IR intrinsics for this. The only reason we do this in the IR is due to weakness in the known bits handling in the DAG since it can't see across blocks. We'll get some simple cases in the DAG now, but miss more complex cases.
>
>> Ideally we should generate the 48-bit mul here like how getMul24() of
>> AMDGPUISelLowering.cpp does. I'm not sure about the right approach for that. If
>> we create a 24-bit mulhi intrinsic, then store the 24-bit mul and 24-bit mulhi
>> results in a { i16, i32 } struct, how do we replaceAllUsesWith() of the i64
>> uses with { i16, i32 }. Otherwise, we can create a 48-bit mul intrinsic of
>> type i64 (i32, i32), and split it during lowering. What do you think?
>
> The result of the intrinsic should be 32-bit. The canonical way to merge 2 integer values in the IR would be to do an extend, shift and or
Addressed in D112394 <https://reviews.llvm.org/D112394> and D112395 <https://reviews.llvm.org/D112395>.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D111523/new/
https://reviews.llvm.org/D111523
More information about the llvm-commits
mailing list