[PATCH] D140208: [AMDGPU] Improved wide multiplies
Nicolai Hähnle via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 10 02:02:18 PST 2023
nhaehnle added a comment.
Thanks, this already looks good to me. I do have a small number of comments inline still.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:2892-2897
+void AMDGPULegalizerInfo::buildMultiply(LegalizerHelper &Helper,
+ MutableArrayRef<Register> Accum,
+ ArrayRef<Register> Src0,
+ ArrayRef<Register> Src1,
+ bool UsePartialMad64_32,
+ bool SeparateOddAlignedProducts) const {
----------------
Did you run clang-format on this? Usually it chooses a different layout.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:2924-2927
+ for (unsigned i = 0; i < Src0.size(); ++i) {
+ Src0KB.push_back(KB.getKnownBits(Src0[i]));
+ Src1KB.push_back(KB.getKnownBits(Src1[i]));
+ }
----------------
Please remove the assumption that Src0 and Src1 are equally long. I'd suggest using two for-each loops.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:2993-2995
+ bool AtLeastOneArgIsZero =
+ Src0KB[j0].isZero() || Src1KB[j1].isZero();
+ if (AtLeastOneArgIsZero) {
----------------
tsymalla wrote:
> It looks like this is doing the same thing like in LOC 3040. Does it make sense to cache the results (is there anything computed when invoking `isZero()`)? You could also cache the pairs that are known to be zero, but this is just optional. I'd just think about re-implementing the same thing a few lines later.
nit: I don't think having the separate variable is worth it anymore, just put the condition directly into the if-statement (same below)
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3180
+ const bool AtLeastOneSrcIsZero =
+ Helper.getKnownBits()->getKnownBits(Src0).isZero() ||
+ Helper.getKnownBits()->getKnownBits(Src1).isZero();
----------------
tsymalla wrote:
> I'd still prefer to have the getter method `Helper::getKnownBits()` differently named. The one thing is returning a pointer to the known bits analysis, the other one is returning the known bits for a register.
I think it's fine. The class is called `GISelKnownBits`, not `GISelKnownBitsAnalysis` or similar.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:3183-3187
+ if (AtLeastOneSrcIsZero) {
+ B.buildConstant(DstReg, 0);
+ MI.eraseFromParent();
+ return true;
+ }
----------------
Does this actually happen? (Is there a test case that changes when you remove this?) I would have thought that such a G_MUL would have been eliminated by an earlier combine.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140208/new/
https://reviews.llvm.org/D140208
More information about the llvm-commits
mailing list