[PATCH] D134433: [AMDGPU][GISel] Enable Matching of V2S16 G_BUILD_VECTOR

Fri Sep 23 05:48:42 PDT 2022

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCombine.td:91-95
+def const_build_vector_fold : GICombineRule<
+  (defs root:$buildvector, uint64_matchinfo:$matchinfo),
+  (match (wip_match_opcode G_BUILD_VECTOR, G_BUILD_VECTOR_TRUNC):$buildvector,
+         [{ return RegBankHelper.matchConstBuildVectorFold(*${buildvector}, ${matchinfo}); }]),
+  (apply [{ RegBankHelper.applyConstBuildVectorFold(*${buildvector}, ${matchinfo}); }])>;
----------------
This combine can be a separate patch

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:729
-      .moreElementsIf(isSmallOddVector(0), oneMoreElement(0))
-      .clampMaxNumElements(0, S16, 2)
-      .clampScalar(0, S16, S64)
----------------
This shouldn't change

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:1448-1465
+        .bitcastIf(all(sizeIsMultipleOf32(VecTypeIdx),
+                       scalarOrEltWiderThan(VecTypeIdx, 64)),
+                   [=](const LegalityQuery &Query) {
+                     // For > 64-bit element types, try to turn this into a
+                     // 64-bit element vector since we may be able to do better
+                     // indexing if this is scalar. If not, fall back to 32.
+                     const LLT EltTy = Query.Types[EltTypeIdx];
----------------
This mostly looks like unwelcome clang-format changes. I think the only thing that really changed here was the S32->S16?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp:341-342
+  Register Dst = MI.getOperand(0).getReg();
+  if (MRI.getType(Dst) != LLT::fixed_vector(2, 16))
+    return false;
+
----------------
In the future we may want to handle <2 x i32> and <4 x i16>, at least for gfx940

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp:368-369
+  Register Dst = MI.getOperand(0).getReg();
+  const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst);
+  assert(DstRB);
+
----------------
Probably should have an asserting getRegBank to go along with getRegClass instead of littering asserts like this

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp:373
+  MRI.setRegBank(TmpReg, *DstRB);
+  B.buildConstant(TmpReg, Value);
+  B.buildBitcast(Dst, TmpReg);
----------------
You can do auto Const = B.buildConstant(S32,...); MRI.setRegBank(Const.getReg(0))

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134433/new/

https://reviews.llvm.org/D134433