[PATCH] D134870: [AMDGPU][GISel] Better support for V2S16 G_EXTRACT/INSERT_VECTOR_ELT

Thu Sep 29 07:29:32 PDT 2022

arsenm added a comment.

The trunc combine LGTM. The vector combine I'm not sure about

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:309-320
+bool AMDGPUPostLegalizerCombinerHelper::matchTruncBuildVectorFold(
+    MachineInstr &MI, Register &MatchInfo) {
+  assert(MI.getOpcode() == AMDGPU::G_TRUNC);
+
+  // Replace (G_TRUNC (G_BITCAST (G_BUILD_VECTOR x, y)) with just x
+  // if type(x) == type(G_TRUNC)
+  if (!mi_match(MI.getOperand(1).getReg(), MRI,
----------------
This part LGTM

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp:322-326
+void AMDGPUPostLegalizerCombinerHelper::applyTruncBuildVectorFold(
+    MachineInstr &MI, Register &MatchInfo) {
+  MRI.replaceRegWith(MI.getOperand(0).getReg(), MatchInfo);
+  MI.eraseFromParent();
+}
----------------
Can use replaceSingleDefInstWithReg

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:160

+bool AMDGPUPreLegalizerCombinerHelper::matchV2S16ExtractInsertVectorEltFold(
+    MachineInstr &MI, unsigned &Idx) {
----------------
This combine is too targeted (and I'm pretty sure it's redundant). The generic combiner already turns G_INSERT_VECTOR_ELT with constant chains into G_BUILD_VECTOR. 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:209
+    assert(MI.getOpcode() == AMDGPU::G_EXTRACT_VECTOR_ELT);
+    Register Elt = B.buildUnmerge(EltTy, VecSrc).getReg(Idx);
+
----------------
We don't really want to have 16-bit unmerges. I'm not sure how this combine helps you with the current problem

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134870/new/

https://reviews.llvm.org/D134870