[PATCH] D134433: [AMDGPU][GISel] Enable Matching of V2S16 G_BUILD_VECTOR

Wed Sep 28 07:05:44 PDT 2022

arsenm added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i16.ll:9
 define amdgpu_ps i16 @extractelement_sgpr_v4i16_sgpr_idx(<4 x i16> addrspace(4)* inreg %ptr, i32 inreg %idx) {
-; GCN-LABEL: extractelement_sgpr_v4i16_sgpr_idx:
-; GCN:       ; %bb.0:
-; GCN-NEXT:    s_load_dwordx2 s[0:1], s[2:3], 0x0
-; GCN-NEXT:    s_lshr_b32 s2, s4, 1
-; GCN-NEXT:    s_cmp_eq_u32 s2, 1
-; GCN-NEXT:    s_waitcnt lgkmcnt(0)
-; GCN-NEXT:    s_cselect_b32 s0, s1, s0
-; GCN-NEXT:    s_and_b32 s1, s4, 1
-; GCN-NEXT:    s_lshl_b32 s1, s1, 4
-; GCN-NEXT:    s_lshr_b32 s0, s0, s1
-; GCN-NEXT:    ; return to shader part epilog
+; GFX9-LABEL: extractelement_sgpr_v4i16_sgpr_idx:
+; GFX9:       ; %bb.0:
----------------
Pierre-vh wrote:
> foad wrote:
> > Pierre-vh wrote:
> > > foad wrote:
> > > > Using buffer_store/load instead of shifts for non-constant indices is a big regression.
> > > This seems to be coming from the change in the legalizer of G_EXTRACT_VECTOR_ELT at line 1445. Reverting it fixes it.
> > > It's because with that change, it gets lowered to the stack, so a G_STORE is emitted and it causes what we're seeing here.
> > > 
> > > However if I revert the change, it becomes impossible to match mad_mix in `v_mad_mix_v2f32_clamp_postcvt_lo`, its G_INSERT_VECTOR_ELT/G_EXTRACT_VECTOR_ELT get lowered to the point that we can't even match the CLAMP.
> > > 
> > > Perhaps I can add a condition to not bitcast if it's a 2x16 vector (`vectorWiderThan(VecTypeIdx, 32)`) ? It works, but seems like a bandaid fix though, no?
> > > @arsenm thoughts?
> > I would hope that it only gets lowered to memory accesses if the index is //non-constant//, and for matching mad_mix you are only interested in //constant// indices. Does that give you a way to fix both cases?
> `LegalityQuery` does not seem to contain any information about whether the indice is constant unfortunately, so it doesn't look like I can make that distinction in the legalizer unless I rewrite some of the legalization logic in a custom handler. I'd like to get @arsenm 's input first before implementing that though
> 
> > I would hope that it only gets lowered to memory accesses if the index is non-constant
> 
> That seems a bit orthogonal to the current issue though, the legalizer currently doesn't make a distinction based on the const-ness of the indice (I think) so maybe it's best to not start taking it into account here, as it may have other implications. I feel like the `vectorWiderThan(VecTypeIdx, 32)` solution would be fine for now even if it's a bit hacky
> I'll take another look at the situation tomorrow and see if I can come up with another fix
Legality cannot be context dependent. You can't have some cases be legal or not based on regular value operands.

I think treating the legalization different isn't the right strategy. In the DAG path we have combines to turn 16-bit vector operations into 32-bit vector operations with some shifts to extract of the elements. With optimizations, you wouldn't really ever use the 16->32 vector path

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134433/new/

https://reviews.llvm.org/D134433