[PATCH] D111523: [AMDGPU] Fix 24 bit mul intrinsic generation for > 32 bit result.

Thu Oct 14 06:39:17 PDT 2021

foad added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:454
   // be a sign bit.
   return ScalarSize - ComputeNumSignBits(Op, *DL, 0, AC);
 }
----------------
I think it would make more sense for this to return `ScalarSize - ComputeNumSignBits(Op, *DL, 0, AC) - 1`. This is the maximum size that Op could be truncated to, and sign extended to give you the original value. This is analogous to numBitsUnsigned, which tells you the maximum size you could truncate to and then //zero// extend to get the original value.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:515
+    // The 24-bit mul intrinsics yields the low-order 32 bits. The result's bit
+    // width should not exceed 32 if `Size` > 32.
+    if (Size > 32 &&
----------------
This comment just tells you what the "if" condition is, it doesn't really explain //why//.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:517
+    if (Size > 32 &&
+        numBitsUnsigned(LHS, Size) + numBitsUnsigned(RHS, Size) > 32) {
+      return false;
----------------
`isU24` already includes a call to numBitsUnsigned, so maybe remove the isU24 and isI24 wrappers altogether and just call the lower level function once for each argument:
```
unsigned LHSBits = numBitsUnsigned(LHS, Size);
unsigned RHSBits = numBitsUnsigned(RHS, Size):
if (ST->hasMulU24() && LHSBits <= 24 && RHSBits <= 24 &&
    (Size <= 32 || LHSBits + RHSBits <= 32)) {
  IntrID = Intrinsic::amdgcn_mul_u24;
}
```
... and similarly for the signed case.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:523
   } else if (ST->hasMulI24() && isI24(LHS, Size) && isI24(RHS, Size)) {
+    if (Size > 32 && numBitsSigned(LHS, Size) + numBitsSigned(RHS, Size) > 31) {
+      return false;
----------------
I don't think `> 31` is safe here. For example it will transform this function:
```
define amdgpu_ps i64 @test(i64 %x, i64 %y) {
  %xx = trunc i64 %x to i16
  %xxx = sext i16 %xx to i64
  %yy = trunc i64 %y to i17
  %yyy = sext i17 %yy to i64
  %mul = mul i64 %xxx, %yyy
  ret i64 %mul
}
```
into:
```
define amdgpu_ps i64 @test(i64 %x, i64 %y) {
  %xx = trunc i64 %x to i16
  %xxx = sext i16 %xx to i64
  %yy = trunc i64 %y to i17
  %yyy = sext i17 %yy to i64
  %1 = trunc i64 %xxx to i32
  %2 = trunc i64 %yyy to i32
  %3 = call i32 @llvm.amdgcn.mul.i24(i32 %1, i32 %2)
  %mul = sext i32 %3 to i64
  ret i64 %mul
}
```
If %x is -0x8000 and %y is -0x10000 then the result should be +0x80000000, but the transformed code would return -0x80000000.

I think this condition needs to be `> 30` instead.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111523/new/

https://reviews.llvm.org/D111523