[PATCH] D140208: [AMDGPU] Improved wide multiplies

Tue Jan 10 02:02:18 PST 2023

nhaehnle added a comment.

Thanks, this already looks good to me. I do have a small number of comments inline still.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:2892-2897
+void AMDGPULegalizerInfo::buildMultiply(LegalizerHelper &Helper,
+                                        MutableArrayRef<Register> Accum,
+                                        ArrayRef<Register> Src0,
+                                        ArrayRef<Register> Src1,
+                                        bool UsePartialMad64_32,
+                                        bool SeparateOddAlignedProducts) const {
----------------
Did you run clang-format on this? Usually it chooses a different layout.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:2924-2927
+  for (unsigned i = 0; i < Src0.size(); ++i) {
+    Src0KB.push_back(KB.getKnownBits(Src0[i]));
+    Src1KB.push_back(KB.getKnownBits(Src1[i]));
+  }
----------------
Please remove the assumption that Src0 and Src1 are equally long. I'd suggest using two for-each loops.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:2993-2995
+            bool AtLeastOneArgIsZero =
+                Src0KB[j0].isZero() || Src1KB[j1].isZero();
+            if (AtLeastOneArgIsZero) {
----------------
tsymalla wrote:
> It looks like this is doing the same thing like in LOC 3040. Does it make sense to cache the results (is there anything computed when invoking `isZero()`)? You could also cache the pairs that are known to be zero, but this is just optional. I'd just think about re-implementing the same thing a few lines later.
nit: I don't think having the separate variable is worth it anymore, just put the condition directly into the if-statement (same below)

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3180
+  const bool AtLeastOneSrcIsZero =
+      Helper.getKnownBits()->getKnownBits(Src0).isZero() ||
+      Helper.getKnownBits()->getKnownBits(Src1).isZero();
----------------
tsymalla wrote:
> I'd still prefer to have the getter method `Helper::getKnownBits()` differently named. The one thing is returning a pointer to the known bits analysis, the other one is returning the known bits for a register.
I think it's fine. The class is called `GISelKnownBits`, not `GISelKnownBitsAnalysis` or similar.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3183-3187
+  if (AtLeastOneSrcIsZero) {
+    B.buildConstant(DstReg, 0);
+    MI.eraseFromParent();
+    return true;
+  }
----------------
Does this actually happen? (Is there a test case that changes when you remove this?) I would have thought that such a G_MUL would have been eliminated by an earlier combine.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140208/new/

https://reviews.llvm.org/D140208