[PATCH] D134433: [AMDGPU][GISel] Enable Matching of V2S16 G_BUILD_VECTOR
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 28 07:05:44 PDT 2022
arsenm added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.i16.ll:9
define amdgpu_ps i16 @extractelement_sgpr_v4i16_sgpr_idx(<4 x i16> addrspace(4)* inreg %ptr, i32 inreg %idx) {
-; GCN-LABEL: extractelement_sgpr_v4i16_sgpr_idx:
-; GCN: ; %bb.0:
-; GCN-NEXT: s_load_dwordx2 s[0:1], s[2:3], 0x0
-; GCN-NEXT: s_lshr_b32 s2, s4, 1
-; GCN-NEXT: s_cmp_eq_u32 s2, 1
-; GCN-NEXT: s_waitcnt lgkmcnt(0)
-; GCN-NEXT: s_cselect_b32 s0, s1, s0
-; GCN-NEXT: s_and_b32 s1, s4, 1
-; GCN-NEXT: s_lshl_b32 s1, s1, 4
-; GCN-NEXT: s_lshr_b32 s0, s0, s1
-; GCN-NEXT: ; return to shader part epilog
+; GFX9-LABEL: extractelement_sgpr_v4i16_sgpr_idx:
+; GFX9: ; %bb.0:
----------------
Pierre-vh wrote:
> foad wrote:
> > Pierre-vh wrote:
> > > foad wrote:
> > > > Using buffer_store/load instead of shifts for non-constant indices is a big regression.
> > > This seems to be coming from the change in the legalizer of G_EXTRACT_VECTOR_ELT at line 1445. Reverting it fixes it.
> > > It's because with that change, it gets lowered to the stack, so a G_STORE is emitted and it causes what we're seeing here.
> > >
> > > However if I revert the change, it becomes impossible to match mad_mix in `v_mad_mix_v2f32_clamp_postcvt_lo`, its G_INSERT_VECTOR_ELT/G_EXTRACT_VECTOR_ELT get lowered to the point that we can't even match the CLAMP.
> > >
> > > Perhaps I can add a condition to not bitcast if it's a 2x16 vector (`vectorWiderThan(VecTypeIdx, 32)`) ? It works, but seems like a bandaid fix though, no?
> > > @arsenm thoughts?
> > I would hope that it only gets lowered to memory accesses if the index is //non-constant//, and for matching mad_mix you are only interested in //constant// indices. Does that give you a way to fix both cases?
> `LegalityQuery` does not seem to contain any information about whether the indice is constant unfortunately, so it doesn't look like I can make that distinction in the legalizer unless I rewrite some of the legalization logic in a custom handler. I'd like to get @arsenm 's input first before implementing that though
>
> > I would hope that it only gets lowered to memory accesses if the index is non-constant
>
> That seems a bit orthogonal to the current issue though, the legalizer currently doesn't make a distinction based on the const-ness of the indice (I think) so maybe it's best to not start taking it into account here, as it may have other implications. I feel like the `vectorWiderThan(VecTypeIdx, 32)` solution would be fine for now even if it's a bit hacky
> I'll take another look at the situation tomorrow and see if I can come up with another fix
Legality cannot be context dependent. You can't have some cases be legal or not based on regular value operands.
I think treating the legalization different isn't the right strategy. In the DAG path we have combines to turn 16-bit vector operations into 32-bit vector operations with some shifts to extract of the elements. With optimizations, you wouldn't really ever use the 16->32 vector path
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D134433/new/
https://reviews.llvm.org/D134433
More information about the llvm-commits
mailing list