[PATCH] D140208: [AMDGPU] Improved wide multiplies

Jessica Del via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 10 04:32:40 PST 2023


OutOfCache marked 5 inline comments as done.
OutOfCache added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:2892-2897
+void AMDGPULegalizerInfo::buildMultiply(LegalizerHelper &Helper,
+                                        MutableArrayRef<Register> Accum,
+                                        ArrayRef<Register> Src0,
+                                        ArrayRef<Register> Src1,
+                                        bool UsePartialMad64_32,
+                                        bool SeparateOddAlignedProducts) const {
----------------
nhaehnle wrote:
> Did you run clang-format on this? Usually it chooses a different layout.
I did. It even caused failing pre-merge checks if I remember correctly. I ran clang-format again and this is still the result.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:2993
             unsigned j1 = DstIndex - j0;
+            bool AtLeastOneArgIsZero =
+                Src0KB[j0].isZero() || Src1KB[j1].isZero();
----------------
nhaehnle wrote:
> tsymalla wrote:
> > It looks like this is doing the same thing like in LOC 3040. Does it make sense to cache the results (is there anything computed when invoking `isZero()`)? You could also cache the pairs that are known to be zero, but this is just optional. I'd just think about re-implementing the same thing a few lines later.
> nit: I don't think having the separate variable is worth it anymore, just put the condition directly into the if-statement (same below)
`isZero()` is calculating the number of zeroes and checking if the sum is the same as the total number of bits. Because of that, I decided to cache the results of `isZero()` in the vectors.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3180
+  const bool AtLeastOneSrcIsZero =
+      Helper.getKnownBits()->getKnownBits(Src0).isZero() ||
+      Helper.getKnownBits()->getKnownBits(Src1).isZero();
----------------
nhaehnle wrote:
> tsymalla wrote:
> > I'd still prefer to have the getter method `Helper::getKnownBits()` differently named. The one thing is returning a pointer to the known bits analysis, the other one is returning the known bits for a register.
> I think it's fine. The class is called `GISelKnownBits`, not `GISelKnownBitsAnalysis` or similar.
I kept it as is because of the type name and because it has the same name in the `CombinerHelper`


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3182
 
+  const bool Src0IsZero = Helper.getKnownBits()->getKnownBits(Src0).isZero();
+  const bool Src1IsZero = Helper.getKnownBits()->getKnownBits(Src1).isZero();
----------------
tsymalla wrote:
> Naming is a bit unfortunate, as the function returning the pointer to the KBAnalysis is named the same as the function getting the known bits.
I can see the issue. I am not sure what to rename it to, though. `getKnownBitsAnalysis`? Not quite correct since the type is `GISelKnownBits`, but perhaps less confusing?


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3007
             auto Mul = B.buildMul(S32, Src0[j0], Src1[j1]);
-            if (!LocalAccum[0]) {
+            if (!LocalAccum[0] || KB.getKnownBits(LocalAccum[0]).isZero()) {
               LocalAccum[0] = Mul.getReg(0);
----------------
This check is required, when the accumulator is a zero register.

`!LocalAccum[0]` only checks for the existence of a Register. It is still true, if the Register is known to be all zeroes.
This particular case occurs when the lower bytes of an operand are masked. 
In that case, the check in line 3048 will fail and no `G_MAD` will be created. `LocalAccum[0]` will still be set to the result of the Unmerge of the `Tmp` register in line 3060. `Tmp` is set to a zero register in line 3041, so it is all zeroes at this point.

By stepping through the debugger, I confirmed that in that case the first condition, `!LocalAccum[0]` will be false, but the second condition will be correctly evaluated to true and therefore skip the addition to 0.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140208/new/

https://reviews.llvm.org/D140208



More information about the llvm-commits mailing list