[llvm] [AMDGPU] fix SIPeepholeSDWA optimization for fp16 (PR #109395)

Fri Sep 20 06:48:05 PDT 2024

arsenm wrote:

> This is what I'm seeing in the assembly "v_add_f16_sdwa v16, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1" which corresponds to v_add_f16_sdwa

This looks different. In the MIR, both operands use the same source register, but here they are different. The original MIR looks like it is trying to add the low and high 16-bit halves of the same register.

I'm not sure what the rules are when using the selectors on the same register in multiple operands. Maybe src0_sel should be using WORD_0? I'm not sure how DWORD is interpreted on a 16-bit source.
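
To make the selector question concrete, here is a rough Python model of how the SDWA source selectors could pick bits out of a 32-bit register. It is only a sketch. The selector names WORD_0, WORD_1 and DWORD come from the assembly quoted above; the helper names `sdwa_select` and `as_f16_bits` are hypothetical. The assumption that a 16-bit operation reads the low 16 bits of a DWORD-selected source is exactly the open question, so it should be checked against the ISA documentation rather than taken from this sketch.

```python
# Illustrative model only; this is not LLVM code or a hardware specification.

# Selector name -> (right shift, mask) applied to a 32-bit register value.
SDWA_SELECTORS = {
    "WORD_0": (0, 0xFFFF),       # bits 15:0
    "WORD_1": (16, 0xFFFF),      # bits 31:16
    "DWORD": (0, 0xFFFFFFFF),    # all 32 bits
}


def sdwa_select(reg_value, sel):
    """Return the bits of a 32-bit register value picked by selector `sel`."""
    shift, mask = SDWA_SELECTORS[sel]
    return (reg_value >> shift) & mask


def as_f16_bits(selected):
    """ASSUMPTION: a 16-bit operation consumes only the low 16 bits."""
    return selected & 0xFFFF


# One register holding two packed halves: high = 0x3C00, low = 0x4000.
packed = 0x3C004000

low = as_f16_bits(sdwa_select(packed, "WORD_0"))
high = as_f16_bits(sdwa_select(packed, "WORD_1"))
dword = as_f16_bits(sdwa_select(packed, "DWORD"))

print(hex(low), hex(high), hex(dword))
```

Under that assumption, src0_sel:DWORD and src0_sel:WORD_0 would feed the same low half into the add, so adding the low and high halves of one register would need the same register in both operands with WORD_0 (or DWORD) on one and WORD_1 on the other.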

https://github.com/llvm/llvm-project/pull/109395