[PATCH] D93708: [AMDGPU] Add a new Clamp Pattern to the GlobalISel Path.

Mon Jan 11 14:22:41 PST 2021

arsenm added a comment.

I would expect this in the post legalizer combiner, and going directly from i64 is a bit weird. What does the post-legalized MIR look like for this?

================
Comment at: llvm/include/llvm/CodeGen/GlobalISel/MIPatternMatch.h:455
 };
+
 template <typename Src0Ty, typename Src1Ty, typename Src2Ty>
----------------
Remove leftover whitespace change

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:191-192
   if (Ty == S32) {
-    B.buildInstr(AMDGPU::G_AMDGPU_CVT_F32_UBYTE0, {DstReg},
-                   {SrcReg}, MI.getFlags());
+    B.buildInstr(AMDGPU::G_AMDGPU_CVT_F32_UBYTE0, {DstReg}, {SrcReg},
+                 MI.getFlags());
   } else {
----------------
Unrelated formatting changes

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:248

+
 class AMDGPUPostLegalizerCombinerHelperState {
----------------
Extra whitespace change

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:62
+
+  // we want to check if a 64-bit number gets clamped to 16-bit boundaries (or
+  // below).
----------------
Capitalize

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:74
+
+  MachineIRBuilder B(MI);
+
----------------
Shouldn't construct single use MachineIRBuilder, there should be one in the combiner state

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:104
+
+  // are we really trying to clamp against the relevant boundaries?
+  return ((Cmp2 >= Cmp1 && Cmp1 >= Min && Cmp2 <= Max) ||
----------------
Capitalize

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:119
+    MachineInstr &MI, const ClampI64ToI16MatchInfo &MatchInfo) {
+  LLVM_DEBUG(dbgs() << "Combining MI\n");
+
----------------
This is just noise, remove it

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:121
+
+  MachineIRBuilder B(MI);
+  MachineRegisterInfo &MRI = MI.getParent()->getParent()->getRegInfo();
----------------
Don't construct fresh builder

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:129
+  auto Unmerge = B.buildUnmerge(S32, Src);
+  Register Hi32 = Unmerge->getOperand(0).getReg();
+  Register Lo32 = Unmerge->getOperand(1).getReg();
----------------
You can just do Unmerge.getReg(0)

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:134
+
+  constexpr unsigned int CvtOpcode = AMDGPU::V_CVT_PK_I16_I32_e64;
+  assert(MI.getOpcode() != CvtOpcode);
----------------
It would be better to use a generic operation pre-regbank select rather than forcing register classes here

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:148-149
+
+  auto min = std::min(MatchInfo.Cmp1, MatchInfo.Cmp2);
+  auto max = std::max(MatchInfo.Cmp1, MatchInfo.Cmp2);
+
----------------
Bad shadowing name

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:151-153
+  Register MinBoundaryDst = MRI.createVirtualRegister(REG_CLASS);
+  MRI.setType(MinBoundaryDst, S32);
+  B.buildConstant(MinBoundaryDst, min);
----------------
You can combine this as auto MinBoundaryDst = B.buildConstant(S32, min)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93708/new/

https://reviews.llvm.org/D93708