[PATCH] D140208: [AMDGPU] Improved wide multiplies

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 11 16:38:08 PST 2023


arsenm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3007
             auto Mul = B.buildMul(S32, Src0[j0], Src1[j1]);
-            if (!LocalAccum[0]) {
+            if (!LocalAccum[0] || KB.getKnownBits(LocalAccum[0]).isZero()) {
               LocalAccum[0] = Mul.getReg(0);
----------------
OutOfCache wrote:
> This check is required when the accumulator is a zero register.
> 
> `!LocalAccum[0]` only checks whether a Register exists. A Register that is known to be all zeroes still exists, so this check alone does not catch it.
> This particular case occurs when the lower bytes of an operand are masked. 
> In that case, the check in line 3048 will fail and no `G_MAD` will be created. `LocalAccum[0]` will still be set to the result of the Unmerge of the `Tmp` register in line 3060. `Tmp` is set to a zero register in line 3041, so it is all zeroes at this point.
> 
> Stepping through with the debugger, I confirmed that in that case the first condition, `!LocalAccum[0]`, is false, but the second condition correctly evaluates to true, so the addition to 0 is skipped.
If you're only looking for zero, checking for the constant zero is cheaper than going through getKnownBits.
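
To illustrate the cost difference, here is a minimal, self-contained sketch on a toy IR. Everything in it (`Inst`, `Function`, `isConstantZero`, `knownZeroBits`, `isKnownZero`) is hypothetical and is not LLVM API; it only models the idea that a constant-zero test inspects one defining instruction, while a known-bits query may walk the operand chain. In LLVM itself the cheap form would presumably be a pattern match along the lines of `mi_match(Reg, MRI, m_ZeroInt())`, but that is an assumption about what is meant here, not something stated in the review.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical toy IR (not LLVM's): register N is defined by Function[N].
enum class Opcode { Constant, And, Unknown };

struct Inst {
  Opcode Op;
  uint32_t Imm; // value, used only by Constant
  int Src0;     // operand register index, used only by And
  int Src1;     // operand register index, used only by And
};

using Function = std::vector<Inst>;

// Cheap check: look only at the instruction that defines Reg.
bool isConstantZero(const Function &F, int Reg) {
  const Inst &I = F[Reg];
  return I.Op == Opcode::Constant && I.Imm == 0;
}

// More general check: compute the mask of bits known to be zero by
// recursing through the operands. Visited counts the instructions inspected.
uint32_t knownZeroBits(const Function &F, int Reg, unsigned &Visited) {
  ++Visited;
  const Inst &I = F[Reg];
  switch (I.Op) {
  case Opcode::Constant:
    return ~I.Imm;
  case Opcode::And:
    // A result bit is zero if it is known zero in either operand.
    return knownZeroBits(F, I.Src0, Visited) |
           knownZeroBits(F, I.Src1, Visited);
  case Opcode::Unknown:
    return 0;
  }
  return 0;
}

bool isKnownZero(const Function &F, int Reg, unsigned &Visited) {
  return knownZeroBits(F, Reg, Visited) == 0xFFFFFFFFu;
}
```

In this toy model the known-bits query also recognizes a zero that is not a literal constant (for example an `And` with a zero operand), at the price of visiting more instructions. Which of the two fits the accumulator case discussed above depends on how the zero register is actually defined there.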


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140208/new/

https://reviews.llvm.org/D140208


