[PATCH] D134463: [AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK)
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Sep 29 11:39:04 PDT 2022
arsenm added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.private.ll:240
; GFX9-NEXT: s_waitcnt vmcnt(0)
-; GFX9-NEXT: v_bfi_b32 v1, v1, 0, v0
+; GFX9-NEXT: v_bfi_b32 v1, s4, 0, v0
; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v1
----------------
jrbyrnes wrote:
> foad wrote:
> > rampitec wrote:
> > > jrbyrnes wrote:
> > > > This seems illegal to me -- using SGPR and literal as operands to VALU. Looking into it.
> > > 0 is an inline literal and is free.
> > As a code quality thing, this could have been optimized to `v_and_b32 v1, 0xffff0000, v0`
> Stas -- I see, thanks!
>
> Jay -- Interesting, I'll look into what's going on with the literal. As a side note, CodeGen is actually not good for this particular test. It seems to me the whole test can be combined into a 32-bit load. D133584 should be extended to handle i16s, in which case this whole test will be optimized to a load.
This could only be a 16-bit load if unaligned access is enabled (and I think we previously decided that doing unaligned 16-bit loads was probably worse than byte loads). The load question is orthogonal to how the bit masking should have been emitted.
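
For reference, the bit-masking point raised earlier in the thread can be checked with a small model. This is only a sketch: it assumes s4 holds 0x0000ffff (the diff context does not show its value, but the suggested `v_and_b32 v1, 0xffff0000, v0` implies it), and it models v_bfi_b32 as D = (S0 & S1) | (~S0 & S2). The Python function names are illustrative only, not LLVM or assembler APIs.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, like a VGPR


def v_bfi_b32(s0, s1, s2):
    """Model of bitfield insert: bits of s1 where s0 is set, bits of s2 elsewhere."""
    return ((s0 & s1) | (~s0 & s2)) & MASK32


def v_and_b32(s0, s1):
    """Model of a 32-bit bitwise AND."""
    return (s0 & s1) & MASK32


# Assumption: s4 was materialized as 0xffff; the quoted diff does not show this.
S4 = 0x0000FFFF


def bfi_matches_and(v0):
    """True if 'v_bfi_b32 v1, s4, 0, v0' equals 'v_and_b32 v1, 0xffff0000, v0'."""
    return v_bfi_b32(S4, 0, v0) == v_and_b32(0xFFFF0000, v0)


samples = [0x00000000, 0x12345678, 0xFFFFFFFF, 0xDEADBEEF, 0x0000FFFF, 0xFFFF0000]
all_match = all(bfi_matches_and(v) for v in samples)
```

With S1 = 0 the first term of the BFI vanishes, leaving ~S0 & S2, so under the stated assumption the BFI reduces to a single AND with the inverted mask.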
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D134463/new/
https://reviews.llvm.org/D134463