[llvm] [AMDGPU] fix SIPeepholeSDWA optimization for fp16 (PR #109395)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 20 06:48:05 PDT 2024
arsenm wrote:
> This is what I'm seeing in the assembly "v_add_f16_sdwa v16, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1" which corresponds to v_add_f16_sdwa
This looks different. In the MIR, both operands use the same source register, but here they are different. The original MIR looks like it's trying to add the low and high 16-bit halves of the same register.
I'm not sure what the rules are when selectors are used on the same register in multiple operands. Maybe src0_sel should be using WORD_0? I'm not sure how DWORD is interpreted on a 16-bit source.
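
To make the low-half/high-half question concrete, here is a rough Python model of what the WORD_0, WORD_1 and DWORD selectors pick out of a 32-bit register. It is only a sketch: the helper names (select, pack_halves, and so on) are made up for illustration, and it deliberately does not model how the hardware feeds a DWORD selection into a 16-bit operation, since that is the open question above.

```python
import struct

def select(reg32, sel):
    """Return the bits an SDWA-style selector names in a 32-bit value."""
    if sel == "WORD_0":
        return reg32 & 0xFFFF            # low 16 bits
    if sel == "WORD_1":
        return (reg32 >> 16) & 0xFFFF    # high 16 bits
    if sel == "DWORD":
        return reg32 & 0xFFFFFFFF        # all 32 bits, no narrowing modelled
    raise ValueError(f"unknown selector: {sel}")

def bits_from_half(value):
    """Encode a Python float as IEEE fp16 bits."""
    return struct.unpack("<H", struct.pack("<e", value))[0]

def half_from_bits(bits16):
    """Decode IEEE fp16 bits into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits16))[0]

def pack_halves(lo, hi):
    """Pack two fp16 values into one 32-bit register image (lo in bits 15:0)."""
    return bits_from_half(lo) | (bits_from_half(hi) << 16)

# One register holding two fp16 values, as in the original MIR.
reg = pack_halves(1.5, 2.25)

# Adding the low and high halves of the same register:
# src0 uses WORD_0 and src1 uses WORD_1.
lo = half_from_bits(select(reg, "WORD_0"))
hi = half_from_bits(select(reg, "WORD_1"))
total = lo + hi
```

In this model, src0_sel:WORD_0 with src1_sel:WORD_1 on the same register gives the two distinct halves, while DWORD simply returns all 32 bits and leaves open which 16 of them a 16-bit instruction would actually read.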
https://github.com/llvm/llvm-project/pull/109395