[PATCH] D140208: [AMDGPU] Improved wide multiplies
Jessica Del via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 23 07:15:24 PST 2023
OutOfCache added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3007
auto Mul = B.buildMul(S32, Src0[j0], Src1[j1]);
- if (!LocalAccum[0]) {
+ if (!LocalAccum[0] || KB.getKnownBits(LocalAccum[0]).isZero()) {
LocalAccum[0] = Mul.getReg(0);
----------------
OutOfCache wrote:
> arsenm wrote:
> > tsymalla wrote:
> > > OutOfCache wrote:
> > > > arsenm wrote:
> > > > > OutOfCache wrote:
> > > > > > This check is required when the accumulator is a zero register.
> > > > > >
> > > > > > `!LocalAccum[0]` only checks whether a register exists; it does not detect a register that exists but is known to be all zeroes.
> > > > > > This particular case occurs when the lower bytes of an operand are masked.
> > > > > > In that case, the check in line 3048 will fail and no `G_MAD` will be created. `LocalAccum[0]` will still be set to the result of the Unmerge of the `Tmp` register in line 3060. `Tmp` is set to a zero register in line 3041, so it is all zeroes at this point.
> > > > > >
> > > > > > Stepping through in the debugger, I confirmed that in that case the first condition, `!LocalAccum[0]`, is false, but the second condition correctly evaluates to true, so the addition to 0 is skipped.
> > > > > If you're just looking for zero, just looking for the constant zero is cheaper than going through getKnownBits
> > > > Sounds like a good idea, but how do I do that?
> > > I guess he meant checking the operands for being zero explicitly. I think using `getKnownBits` is fine.
> > Check if it's G_CONSTANT i32 0. There are a few too many ways to check for it (I'd suggest MIPatternMatch's m_ZeroInt)
> I tried `mi_match(LocalAccum[0], MRI, m_ZeroInt())`, but for some reason it always returned false.
>
> I also tried replacing the `SrcXKnownZeros.push_back(KB.getKnownBits(SrcX[i]).isZero())` with `SrcXKnownZeros.push_back(mi_match(SrcX[i], MRI, m_ZeroInt()))`, and similarly, it returned false where the first one returned true.
>
> This also caused the `@v_mul_i64_masked_src0_lo` and `@v_mul_i64_masked_src1_lo` tests to fail and produce multiplications with 0.
This is the code before the legalizer:
```
bb.1.entry:
liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
%2:_(s32) = COPY $vgpr0
%3:_(s32) = COPY $vgpr1
%0:_(s64) = G_MERGE_VALUES %2:_(s32), %3:_(s32)
%4:_(s32) = COPY $vgpr2
%5:_(s32) = COPY $vgpr3
%1:_(s64) = G_MERGE_VALUES %4:_(s32), %5:_(s32)
%6:_(s64) = G_CONSTANT i64 -4294967296
%7:_(s64) = G_AND %1:_, %6:_
%8:_(s64) = G_MUL %0:_, %7:_
%9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %8:_(s64)
$vgpr0 = COPY %9:_(s32)
$vgpr1 = COPY %10:_(s32)
SI_RETURN implicit $vgpr0, implicit $vgpr1
```
The only G_CONSTANTs are the mask for the G_AND and a 64-bit 0 for the G_MAD addition.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140208/new/
https://reviews.llvm.org/D140208