[llvm] [AMDGPU][True16][Codegen] remove packed build_vector pattern from true16 (PR #148715)
Joe Nash via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 17 12:55:12 PDT 2025
================
@@ -77,11 +77,20 @@ define i32 @divergent_vec_0_i16(i16 %a) {
; GFX906-NEXT: v_lshlrev_b32_e32 v0, 16, v0
; GFX906-NEXT: s_setpc_b64 s[30:31]
;
----------------
Sisyph wrote:
To summarize my discussion on this with @broxigarchen:
- Fake16 exploits the fact that 16-bit values will always be in the lo16 bits during isel, which is why it can select a build_vector with 0 into an lshlrev for these 16-bit call arguments. In true16 we should not exploit that, and the better optimization is probably to change the calling convention to pack 16-bit values instead of leaving them unpacked.
- As he said above, we can do some optimization in the coalescer to remove the extra v_mov_b32.
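
For readers following along, here is a rough bit-level sketch of the first point. It is not LLVM code, just a Python model under stated assumptions: a VGPR is treated as a 32-bit integer, element 0 of the `<2 x i16>` sits in the low half, and in true16 a 16-bit value may be allocated to the hi half of a register. All helper names are hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def build_vector_0_x(x16):
    """Reference result: <2 x i16> <0, x> viewed as an i32 (element 0 in the low half)."""
    return ((x16 & MASK16) << 16) & MASK32

def lshl16(reg32):
    """Model of a 32-bit left shift by 16, like the v_lshlrev_b32 in the quoted checks."""
    return (reg32 << 16) & MASK32

def reg_with_lo(x16, garbage_hi=0):
    """Fake16-style placement (assumed): the value is in lo16, hi16 is don't-care."""
    return ((garbage_hi & MASK16) << 16) | (x16 & MASK16)

def reg_with_hi(x16, other_lo=0):
    """Possible true16 placement (assumed): the value is in hi16, lo16 holds something else."""
    return ((x16 & MASK16) << 16) | (other_lo & MASK16)

a = 0x1234

# Value in lo16: the shift discards hi16 and yields exactly <0, a>.
fake16_ok = lshl16(reg_with_lo(a, garbage_hi=0xBEEF)) == build_vector_0_x(a)

# Value in hi16: the same shift moves the unrelated lo16 up instead.
true16_hi_ok = lshl16(reg_with_hi(a, other_lo=0x5678)) == build_vector_0_x(a)
```

In this model the shift matches the build_vector only when the value is known to sit in lo16, which is the assumption the fake16 pattern relies on.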
https://github.com/llvm/llvm-project/pull/148715