[PATCH] D140208: [AMDGPU] Improved wide multiplies
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 11 16:38:08 PST 2023
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3007
auto Mul = B.buildMul(S32, Src0[j0], Src1[j1]);
- if (!LocalAccum[0]) {
+ if (!LocalAccum[0] || KB.getKnownBits(LocalAccum[0]).isZero()) {
LocalAccum[0] = Mul.getReg(0);
----------------
OutOfCache wrote:
> This check is required when the accumulator is a zero register.
>
> `!LocalAccum[0]` only checks whether a Register exists. A Register that is known to be all zeroes is still a valid Register, so `!LocalAccum[0]` is false for it.
> This particular case occurs when the lower bytes of an operand are masked.
> In that case, the check in line 3048 will fail and no `G_MAD` will be created. `LocalAccum[0]` will still be set to the result of the Unmerge of the `Tmp` register in line 3060. `Tmp` is set to a zero register in line 3041, so it is all zeroes at this point.
>
> By stepping through the debugger, I confirmed that in that case the first condition, `!LocalAccum[0]`, is false, but the second condition correctly evaluates to true, so the addition to 0 is skipped.
If you're only looking for zero, checking for the constant zero directly is cheaper than going through getKnownBits.
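
To make the suggestion concrete, here is a minimal, self-contained sketch of a "constant zero" check that inspects only the defining instruction of the accumulator. It is a toy model and does not use LLVM: `ToyBuilder`, `ToyDef`, `NoReg`, `isConstantZero` and `accumulate` are hypothetical names invented for illustration, and the real legalizer code would presumably use a GlobalISel constant lookup on the register instead.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model only: none of these types are LLVM's. A register is an index into
// Defs; NoReg plays the role of an invalid (default-constructed) Register.
constexpr int NoReg = -1;

struct ToyDef {
  // Set only when the defining instruction is a constant.
  std::optional<uint32_t> Constant;
};

struct ToyBuilder {
  std::vector<ToyDef> Defs;
  int NumAdds = 0; // number of add instructions emitted so far

  int buildConstant(uint32_t V) {
    Defs.push_back(ToyDef{V});
    return static_cast<int>(Defs.size()) - 1;
  }
  // Stands in for a mul, an unmerge, etc.: the value is not a known constant.
  int buildOpaque() {
    Defs.push_back(ToyDef{std::nullopt});
    return static_cast<int>(Defs.size()) - 1;
  }
  int buildAdd(int A, int B) {
    assert(A != NoReg && B != NoReg && "add needs two valid registers");
    ++NumAdds;
    return buildOpaque();
  }
  // Cheap check: look only at the defining instruction of Reg.
  bool isConstantZero(int Reg) const {
    return Reg != NoReg && Defs[Reg].Constant == 0u;
  }
};

// Accumulate Mul into Accum, skipping the add when Accum is missing or is a
// constant zero. This mirrors the shape of the condition in the patch.
int accumulate(ToyBuilder &B, int Accum, int Mul) {
  if (Accum == NoReg || B.isConstantZero(Accum))
    return Mul;
  return B.buildAdd(Accum, Mul);
}
```

Calling `accumulate` with an accumulator created by `buildConstant(0)` returns the multiply result unchanged and emits no add, while an accumulator created by `buildOpaque()` emits one. In this toy, a register that is zero only by implication (for example, the result of unmerging a zero value, as in the quoted comment) is not recognized, because the check does not look past the defining instruction. That is the trade-off against a known-bits query.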
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140208/new/
https://reviews.llvm.org/D140208