[PATCH] D134870: [AMDGPU][GISel] Better support for V2S16 G_EXTRACT/INSERT_VECTOR_ELT

Thu Sep 29 07:38:49 PDT 2022

Pierre-vh added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:160

+bool AMDGPUPreLegalizerCombinerHelper::matchV2S16ExtractInsertVectorEltFold(
+    MachineInstr &MI, unsigned &Idx) {
----------------
arsenm wrote:
> This combine is too targeted (and I'm pretty sure it's redundant). The generic combiner already turns G_INSERT_VECTOR_ELT with constant chains into G_BUILD_VECTOR. 
This combine is the most important of the two though
I see that there's indeed a matchCombineInsertVecElts/matchExtractVecEltBuildVec but it didn't work at all in the mad_mix cases, I think it's because the source isn't a BUILD_VECTOR in those cases but instead it's operations such as `G_FPTRUNC <2 x 16>` so the combine isn't redundant as far as I can see, it's just more focused on V2S16s

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:209
+    assert(MI.getOpcode() == AMDGPU::G_EXTRACT_VECTOR_ELT);
+    Register Elt = B.buildUnmerge(EltTy, VecSrc).getReg(Idx);
+
----------------
arsenm wrote:
> We don't really want to have 16-bit unmerges. I'm not sure how this combine helps you with the current problem
It seems that in all of those cases, the unmerge ends up removed because the source value gets scalarized.
e.g. we have `G_UNMERGE_VALUE (v2s16 (G_FPTRUNC ...))`  but the trunc gets split into two scalar operations, removing the unmerge.
Why can't we have 16 bit unmerges? Won't the legalizer deal with them if they shouldn't be there?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134870/new/

https://reviews.llvm.org/D134870