[PATCH] D140208: [AMDGPU] Improved wide multiplies

Thomas Symalla via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 17 03:00:07 PST 2023


tsymalla added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3007
             auto Mul = B.buildMul(S32, Src0[j0], Src1[j1]);
-            if (!LocalAccum[0]) {
+            if (!LocalAccum[0] || KB.getKnownBits(LocalAccum[0]).isZero()) {
               LocalAccum[0] = Mul.getReg(0);
----------------
OutOfCache wrote:
> arsenm wrote:
> > OutOfCache wrote:
> > > This check is required when the accumulator is a zero register.
> > > 
> > > `!LocalAccum[0]` only checks whether a Register exists. `LocalAccum[0]` still evaluates to true (so `!LocalAccum[0]` is false) if the Register is known to be all zeroes.
> > > This particular case occurs when the lower bytes of an operand are masked.
> > > In that case, the check in line 3048 fails and no `G_MAD` is created. `LocalAccum[0]` is still set to the result of the Unmerge of the `Tmp` register in line 3060. `Tmp` is set to a zero register in line 3041, so it is all zeroes at this point.
> > > 
> > > Stepping through in the debugger, I confirmed that in this case the first condition, `!LocalAccum[0]`, is false, but the second condition correctly evaluates to true, so the addition to 0 is skipped.
> > If you're just looking for zero, checking for the constant zero is cheaper than going through `getKnownBits`.
> Sounds like a good idea, but how do I do that?
I guess he meant explicitly checking whether the operands are zero. I think using `getKnownBits` is fine.
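
To make the trade-off between the two checks concrete, here is a minimal standalone sketch. It is a toy model, not LLVM code: `Reg`, `Def`, `RegInfo` and both `canSkipAdd...` helpers are hypothetical names invented for illustration. In the real legalizer the constant check would presumably go through GlobalISel's constant helpers (for example `getIConstantVRegVal`), which is an assumption and not something taken from this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical stand-ins for illustration only; these are not LLVM types.
using Reg = unsigned; // 0 models "no register", like an invalid Register

struct Def {
  std::optional<uint32_t> Const; // set only if the register is defined by a constant
  uint32_t KnownZeroMask;        // bits that an analysis has proven to be zero
};

using RegInfo = std::unordered_map<Reg, Def>;

// Cheap check: the accumulator is absent, or it is literally a constant 0.
bool canSkipAddConstant(const RegInfo &RI, Reg Accum) {
  if (!Accum)
    return true;
  auto It = RI.find(Accum);
  return It != RI.end() && It->second.Const && *It->second.Const == 0;
}

// More general check: the accumulator is absent, or every bit is known zero,
// even when the register is not itself defined by a constant.
bool canSkipAddKnownBits(const RegInfo &RI, Reg Accum) {
  if (!Accum)
    return true;
  auto It = RI.find(Accum);
  return It != RI.end() && It->second.KnownZeroMask == 0xFFFFFFFFu;
}
```

In this model the constant check accepts only registers defined directly by a zero constant, while the known-bits check also accepts a register that is merely proven to be all zeroes, at the cost of running the analysis.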


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140208/new/

https://reviews.llvm.org/D140208


