[PATCH] D134463: [AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK)

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 29 11:39:04 PDT 2022

arsenm added inline comments.

Comment at: llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.private.ll:240
 ; GFX9-NEXT:    s_waitcnt vmcnt(0)
-; GFX9-NEXT:    v_bfi_b32 v1, v1, 0, v0
+; GFX9-NEXT:    v_bfi_b32 v1, s4, 0, v0
 ; GFX9-NEXT:    v_and_or_b32 v0, v0, s4, v1
jrbyrnes wrote:
> foad wrote:
> > rampitec wrote:
> > > jrbyrnes wrote:
> > > > This seems illegal to me -- using an SGPR and a literal as operands to a VALU instruction. Looking into it.
> > > 0 is an inline literal and is free.
> > As a code quality thing, this could have been optimized to `v_and_b32 v1, 0xffff0000, v0`
> Stas -- I see, thanks!
> Jay -- Interesting, I'll look into what's going on with the literal. As a side note, the CodeGen for this particular test is not good. It seems to me the whole test can be combined into a 32-bit load. D133584 should be extended to handle these i16s, in which case this whole test will be optimized to a load.
This could only be a 16-bit load if unaligned access is enabled (and I think we previously decided that doing unaligned 16-bit loads was probably worse than byte loads). The load question is orthogonal to how the bit masking should have been emitted.
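
To make the bit-masking point concrete, here is a minimal Python sketch that models the two instructions and checks that the emitted `v_bfi_b32 v1, s4, 0, v0` computes the same value as the suggested `v_and_b32 v1, 0xffff0000, v0`. It assumes the usual bitfield-insert definition, D = (S0 & S1) | (~S0 & S2), and it assumes that s4 holds 0x0000ffff; the value of s4 is not shown in the quoted diff and is only inferred from the suggested replacement. The helper names are hypothetical and exist only for this illustration.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits

def v_bfi_b32(s0, s1, s2):
    """Model of bitfield insert: bits of s1 where s0 is set, bits of s2 elsewhere."""
    return ((s0 & s1) | (~s0 & s2)) & MASK32

def v_and_b32(s0, s1):
    """Model of a plain 32-bit AND."""
    return (s0 & s1) & MASK32

# Assumption: s4 holds the low-half mask. This value is not in the quoted diff.
S4 = 0x0000FFFF

def emitted(v0):
    # v_bfi_b32 v1, s4, 0, v0
    return v_bfi_b32(S4, 0, v0)

def suggested(v0):
    # v_and_b32 v1, 0xffff0000, v0
    return v_and_b32(0xFFFF0000, v0)

samples = [0x00000000, 0xFFFFFFFF, 0x12345678, 0x0000FFFF, 0xFFFF0000, 0xDEADBEEF]
all_equal = all(emitted(v) == suggested(v) for v in samples)
```

Under that assumption, inserting a zero field through the mask just clears the low 16 bits, which is why a single AND with the inverted mask would have been enough.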

  rG LLVM Github Monorepo