[PATCH] D111523: [AMDGPU] Fix 24 bit mul intrinsic generation for > 32 bit result.

Thu Oct 14 21:17:01 PDT 2021

abinavpp added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:515
+    // The 24-bit mul intrinsics yields the low-order 32 bits. The result's bit
+    // width should not exceed 32 if `Size` > 32.
+    if (Size > 32 &&
----------------
foad wrote:
> This comment just tells you what the "if" condition is, it doesn't really explain //why//.
Addressed in D111823.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:517
+    if (Size > 32 &&
+        numBitsUnsigned(LHS, Size) + numBitsUnsigned(RHS, Size) > 32) {
+      return false;
----------------
foad wrote:
> `isU24` already includes a call to numBitsUnsigned, so maybe remove the isU24 and isI24 wrappers altogether and just call the lower level function once for each argument:
> ```
> unsigned LHSBits = numBitsUnsigned(LHS, Size);
> unsigned RHSBits = numBitsUnsigned(RHS, Size):
> if (ST->hasMulU24() && LHSBits <= 24 && RHSBits <= 24 &&
>     (Size <= 32 || LHSBits + RHSBits <= 32)) {
>   IntrID = Intrinsic::amdgcn_mul_u24;
> }
> ```
> ... and similarly for the signed case.
Addressed in D111864.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:523
   } else if (ST->hasMulI24() && isI24(LHS, Size) && isI24(RHS, Size)) {
+    if (Size > 32 && numBitsSigned(LHS, Size) + numBitsSigned(RHS, Size) > 31) {
+      return false;
----------------
foad wrote:
> I don't think `> 31` is safe here. For example it will transform this function:
> ```
> define amdgpu_ps i64 @test(i64 %x, i64 %y) {
>   %xx = trunc i64 %x to i16
>   %xxx = sext i16 %xx to i64
>   %yy = trunc i64 %y to i17
>   %yyy = sext i17 %yy to i64
>   %mul = mul i64 %xxx, %yyy
>   ret i64 %mul
> }
> ```
> into:
> ```
> define amdgpu_ps i64 @test(i64 %x, i64 %y) {
>   %xx = trunc i64 %x to i16
>   %xxx = sext i16 %xx to i64
>   %yy = trunc i64 %y to i17
>   %yyy = sext i17 %yy to i64
>   %1 = trunc i64 %xxx to i32
>   %2 = trunc i64 %yyy to i32
>   %3 = call i32 @llvm.amdgcn.mul.i24(i32 %1, i32 %2)
>   %mul = sext i32 %3 to i64
>   ret i64 %mul
> }
> ```
> If %x is -0x8000 and %y is -0x10000 then the result should be +0x80000000, but the transformed code would return -0x80000000.
> 
> I think this condition needs to be `> 30` instead.
Addressed in D111823.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111523/new/

https://reviews.llvm.org/D111523