[PATCH] D155995: [AMDGPU]: Allow combining into v_dot4

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 18 07:26:18 PDT 2023


arsenm requested changes to this revision.
arsenm added inline comments.
This revision now requires changes to proceed.


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12291
+  auto Byte0 = calculateByteProvider(MulOperand, 0, 0);
+  if (!Byte0.has_value() || Byte0->isConstantZero()) {
+    return std::nullopt;
----------------
don't need .has_value() part


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12295
+  auto Byte1 = calculateByteProvider(MulOperand, 1, 0);
+  if (Byte1.has_value() && !Byte1->isConstantZero()) {
+    return std::nullopt;
----------------
Ditto


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12364-12365
+      };
+      auto Match = std::find_if(Srcs.begin(), Srcs.end(), MatchesSecond);
+      if (Match != Srcs.end()) {
+        Match->second = addPermMasks(SecondMask, Match->second);
----------------
is_contained


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12449
+
+  assert(Perms.size() > 0 && Perms.size() <= 2);
+  return Perms.size() == 2
----------------
== 1 || == 2?


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12484
+  // v_dot4 combining
+  auto ST = getSubtarget();
+
----------------
Don't know why we have getSubtarget, you can just use Subtarget-> directly


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12491
+      ST->hasDot7Insts() && (ST->hasDot1Insts() || ST->hasDot8Insts())) {
+    auto TempNode = SDValue(N, 0);
+    auto MulIdx =
----------------
SDValue TempNode(N, 0)


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12493-12496
+        (LHS.getOpcode() == AMDGPUISD::MUL_I24 ||
+         LHS.getOpcode() == AMDGPUISD::MUL_U24 || LHS.getOpcode() == ISD::MUL)
+            ? 0
+            : 1;
----------------
Don't understand what you are doing with this opcode to index check


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12521
+      auto Src0 = handleMulOperand(TempNode->getOperand(MulIdx)->getOperand(0));
+      if (!Src0.has_value())
+        break;
----------------
Don't need has_value()


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12538
+            handleMulOperand(TempNode->getOperand(AddIdx)->getOperand(1));
+        if (!Src1.has_value())
+          break;
----------------
Don't need has_value


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12582-12583
+        auto NextByte = Src0Mask & (0xFF << ((3 - I) * 8));
+        if (std::find(SrcBytes.begin(), SrcBytes.end(), NextByte) !=
+            SrcBytes.end()) {
+          UniqueEntries = false;
----------------
is_contained


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12608
+
+    std::optional<unsigned> Opcode;
+    if (ST->getGeneration() == AMDGPUSubtarget::GFX11) {
----------------
use deleted_node instead of optional opcode 


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12609
+    std::optional<unsigned> Opcode;
+    if (ST->getGeneration() == AMDGPUSubtarget::GFX11) {
+      Opcode =
----------------
Can you avoid generation checks?


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12615
+      Opcode =
+          IsSigned ? AMDGPU::V_DOT4_I32_I8_gfx10 : AMDGPU::V_DOT4_U32_U8_gfx10;
+    }
----------------
Can you go through the intrinsics instead of going straight to the machine node?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155995/new/

https://reviews.llvm.org/D155995



More information about the llvm-commits mailing list