[PATCH] D111523: [AMDGPU] Fix 24 bit mul intrinsic generation for > 32 bit result.
Abinav Puthan Purayil via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 14 21:17:01 PDT 2021
abinavpp added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:515
+ // The 24-bit mul intrinsics yields the low-order 32 bits. The result's bit
+ // width should not exceed 32 if `Size` > 32.
+ if (Size > 32 &&
----------------
foad wrote:
> This comment just tells you what the "if" condition is, it doesn't really explain //why//.
Addressed in D111823.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:517
+ if (Size > 32 &&
+ numBitsUnsigned(LHS, Size) + numBitsUnsigned(RHS, Size) > 32) {
+ return false;
----------------
foad wrote:
> `isU24` already includes a call to numBitsUnsigned, so maybe remove the isU24 and isI24 wrappers altogether and just call the lower level function once for each argument:
> ```
> unsigned LHSBits = numBitsUnsigned(LHS, Size);
> unsigned RHSBits = numBitsUnsigned(RHS, Size):
> if (ST->hasMulU24() && LHSBits <= 24 && RHSBits <= 24 &&
> (Size <= 32 || LHSBits + RHSBits <= 32)) {
> IntrID = Intrinsic::amdgcn_mul_u24;
> }
> ```
> ... and similarly for the signed case.
Addressed in D111864.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:523
} else if (ST->hasMulI24() && isI24(LHS, Size) && isI24(RHS, Size)) {
+ if (Size > 32 && numBitsSigned(LHS, Size) + numBitsSigned(RHS, Size) > 31) {
+ return false;
----------------
foad wrote:
> I don't think `> 31` is safe here. For example it will transform this function:
> ```
> define amdgpu_ps i64 @test(i64 %x, i64 %y) {
> %xx = trunc i64 %x to i16
> %xxx = sext i16 %xx to i64
> %yy = trunc i64 %y to i17
> %yyy = sext i17 %yy to i64
> %mul = mul i64 %xxx, %yyy
> ret i64 %mul
> }
> ```
> into:
> ```
> define amdgpu_ps i64 @test(i64 %x, i64 %y) {
> %xx = trunc i64 %x to i16
> %xxx = sext i16 %xx to i64
> %yy = trunc i64 %y to i17
> %yyy = sext i17 %yy to i64
> %1 = trunc i64 %xxx to i32
> %2 = trunc i64 %yyy to i32
> %3 = call i32 @llvm.amdgcn.mul.i24(i32 %1, i32 %2)
> %mul = sext i32 %3 to i64
> ret i64 %mul
> }
> ```
> If %x is -0x8000 and %y is -0x10000 then the result should be +0x80000000, but the transformed code would return -0x80000000.
>
> I think this condition needs to be `> 30` instead.
Addressed in D111823.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D111523/new/
https://reviews.llvm.org/D111523
More information about the llvm-commits
mailing list