[PATCH] D134433: [AMDGPU][GISel] Enable Matching of V2S16 G_BUILD_VECTOR

Wed Sep 28 07:40:17 PDT 2022

arsenm added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i16.ll:9
 define amdgpu_ps i16 @extractelement_sgpr_v4i16_sgpr_idx(<4 x i16> addrspace(4)* inreg %ptr, i32 inreg %idx) {
-; GCN-LABEL: extractelement_sgpr_v4i16_sgpr_idx:
-; GCN:       ; %bb.0:
-; GCN-NEXT:    s_load_dwordx2 s[0:1], s[2:3], 0x0
-; GCN-NEXT:    s_lshr_b32 s2, s4, 1
-; GCN-NEXT:    s_cmp_eq_u32 s2, 1
-; GCN-NEXT:    s_waitcnt lgkmcnt(0)
-; GCN-NEXT:    s_cselect_b32 s0, s1, s0
-; GCN-NEXT:    s_and_b32 s1, s4, 1
-; GCN-NEXT:    s_lshl_b32 s1, s1, 4
-; GCN-NEXT:    s_lshr_b32 s0, s0, s1
-; GCN-NEXT:    ; return to shader part epilog
+; GFX9-LABEL: extractelement_sgpr_v4i16_sgpr_idx:
+; GFX9:       ; %bb.0:
----------------
Pierre-vh wrote:
> arsenm wrote:
> > Pierre-vh wrote:
> > > foad wrote:
> > > > Pierre-vh wrote:
> > > > > foad wrote:
> > > > > > Using buffer_store/load instead of shifts for non-constant indices is a big regression.
> > > > > This seems to be coming from the change in the legalizer of G_EXTRACT_VECTOR_ELT at line 1445. Reverting it fixes it.
> > > > > It's because with that change, it gets lowered to the stack, so a G_STORE is emitted and it causes what we're seeing here.
> > > > > 
> > > > > However if I revert the change, it becomes impossible to match mad_mix in `v_mad_mix_v2f32_clamp_postcvt_lo`, its G_INSERT_VECTOR_ELT/G_EXTRACT_VECTOR_ELT get lowered to the point that we can't even match the CLAMP.
> > > > > 
> > > > > Perhaps I can add a condition to not bitcast if it's a 2x16 vector (`vectorWiderThan(VecTypeIdx, 32)`) ? It works, but seems like a bandaid fix though, no?
> > > > > @arsenm thoughts?
> > > > I would hope that it only gets lowered to memory accesses if the index is //non-constant//, and for matching mad_mix you are only interested in //constant// indices. Does that give you a way to fix both cases?
> > > `LegalityQuery` does not seem to contain any information about whether the indice is constant unfortunately, so it doesn't look like I can make that distinction in the legalizer unless I rewrite some of the legalization logic in a custom handler. I'd like to get @arsenm 's input first before implementing that though
> > > 
> > > > I would hope that it only gets lowered to memory accesses if the index is non-constant
> > > 
> > > That seems a bit orthogonal to the current issue though, the legalizer currently doesn't make a distinction based on the const-ness of the indice (I think) so maybe it's best to not start taking it into account here, as it may have other implications. I feel like the `vectorWiderThan(VecTypeIdx, 32)` solution would be fine for now even if it's a bit hacky
> > > I'll take another look at the situation tomorrow and see if I can come up with another fix
> > Legality cannot be context dependent. You can't have some cases be legal or not based on regular value operands.
> > 
> > I think treating the legalization different isn't the right strategy. In the DAG path we have combines to turn 16-bit vector operations into 32-bit vector operations with some shifts to extract of the elements. With optimizations, you wouldn't really ever use the 16->32 vector path
> So the latest changes I made to the legalizer are good, and the remaining changes need to be in a pre-legalizer combine for V2S16 INSERT/EXTRACT_VECTOR_ELT ?
> 
So, the combine I was thinking of seems to be already done as the bitcast lowering for packed vectors (as is done in the old code). You should continue to use the bitcast lowering for vectors. You only want G_BUILD_VECTOR to be legal specifically for <2 x s16>

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134433/new/

https://reviews.llvm.org/D134433