[PATCH] D155995: [AMDGPU]: Allow combining into v_dot4
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 18 07:26:18 PDT 2023
arsenm requested changes to this revision.
arsenm added inline comments.
This revision now requires changes to proceed.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12291
+ auto Byte0 = calculateByteProvider(MulOperand, 0, 0);
+ if (!Byte0.has_value() || Byte0->isConstantZero()) {
+ return std::nullopt;
----------------
don't need .has_value() part
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12295
+ auto Byte1 = calculateByteProvider(MulOperand, 1, 0);
+ if (Byte1.has_value() && !Byte1->isConstantZero()) {
+ return std::nullopt;
----------------
Ditto
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12364-12365
+ };
+ auto Match = std::find_if(Srcs.begin(), Srcs.end(), MatchesSecond);
+ if (Match != Srcs.end()) {
+ Match->second = addPermMasks(SecondMask, Match->second);
----------------
is_contained
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12449
+
+ assert(Perms.size() > 0 && Perms.size() <= 2);
+ return Perms.size() == 2
----------------
== 1 || == 2?
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12484
+ // v_dot4 combining
+ auto ST = getSubtarget();
+
----------------
Don't know why we have getSubtarget, you can just use Subtarget-> directly
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12491
+ ST->hasDot7Insts() && (ST->hasDot1Insts() || ST->hasDot8Insts())) {
+ auto TempNode = SDValue(N, 0);
+ auto MulIdx =
----------------
SDValue TempNode(N, 0)
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12493-12496
+ (LHS.getOpcode() == AMDGPUISD::MUL_I24 ||
+ LHS.getOpcode() == AMDGPUISD::MUL_U24 || LHS.getOpcode() == ISD::MUL)
+ ? 0
+ : 1;
----------------
Don't understand what you are doing with this opcode to index check
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12521
+ auto Src0 = handleMulOperand(TempNode->getOperand(MulIdx)->getOperand(0));
+ if (!Src0.has_value())
+ break;
----------------
Don't need has_value()
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12538
+ handleMulOperand(TempNode->getOperand(AddIdx)->getOperand(1));
+ if (!Src1.has_value())
+ break;
----------------
Don't need has_value
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12582-12583
+ auto NextByte = Src0Mask & (0xFF << ((3 - I) * 8));
+ if (std::find(SrcBytes.begin(), SrcBytes.end(), NextByte) !=
+ SrcBytes.end()) {
+ UniqueEntries = false;
----------------
is_contained
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12608
+
+ std::optional<unsigned> Opcode;
+ if (ST->getGeneration() == AMDGPUSubtarget::GFX11) {
----------------
use deleted_node instead of optional opcode
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12609
+ std::optional<unsigned> Opcode;
+ if (ST->getGeneration() == AMDGPUSubtarget::GFX11) {
+ Opcode =
----------------
Can you avoid generation checks?
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12615
+ Opcode =
+ IsSigned ? AMDGPU::V_DOT4_I32_I8_gfx10 : AMDGPU::V_DOT4_U32_U8_gfx10;
+ }
----------------
Can you go through the intrinsics instead of going straight to the machine node?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D155995/new/
https://reviews.llvm.org/D155995
More information about the llvm-commits
mailing list