[llvm] [AMDGPU] fix SIPeepholeSDWA optimization for fp16 (PR #109395)

Fri Sep 20 06:53:31 PDT 2024

PankajDwivedi-25 wrote:

> > This is what I'm seeing in the assembly: "v_add_f16_sdwa v16, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1", which corresponds to v_add_f16_sdwa
> 
> This looks different. The result operands are both using the same source register in the MIR, and here they are different. The original MIR looks like it's trying to do an add of the low and high 16-bit halves of the same register.
> 
> I'm not sure what the rules are when using the selectors on the same register in multiple operands. Maybe src0_sel should be using WORD_0? I'm not sure how DWORD is interpreted on a 16-bit source.

Yes, the final assembly has a different set of registers, but the subword operands are the same.

From what I found, both do the same job, as you mentioned. I'm not sure what goes wrong here. Maybe I should use WORD_1 as the first operand?
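To make the selector question concrete, here is a small Python model of what the original MIR appears to compute: an f16 add of the low and high 16-bit halves of one 32-bit register, expressed through SDWA source selectors. It is only a sketch under a loudly stated assumption: that a DWORD selector on a 16-bit f16 source reads the low 16 bits, i.e. behaves like WORD_0. That is exactly the point the reviewer was unsure about, so it should be checked against the ISA documentation. The names SEL_SHIFT, select_half, and v_add_f16_sdwa are hypothetical helpers, not LLVM APIs, and the destination write (dst_sel/dst_unused) is not modelled.

```python
import struct

# ASSUMPTION (unverified): on a 16-bit f16 operation, src_sel:DWORD reads the
# low 16 bits of the register, exactly like WORD_0. Hypothetical model only.
SEL_SHIFT = {"WORD_0": 0, "WORD_1": 16, "DWORD": 0}


def select_half(reg: int, sel: str) -> int:
    """Return the 16-bit pattern an f16 SDWA operand would read from a 32-bit register."""
    return (reg >> SEL_SHIFT[sel]) & 0xFFFF


def half_bits_to_float(bits: int) -> float:
    """Decode an IEEE binary16 bit pattern."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]


def float_to_half_bits(x: float) -> int:
    """Round a Python float to binary16 and return its bit pattern."""
    return int.from_bytes(struct.pack("<e", x), "little")


def v_add_f16_sdwa(src0: int, src1: int, src0_sel: str, src1_sel: str) -> int:
    """Hypothetical model: select one half of each source, add as f16.

    Returns only the 16-bit f16 result; how the upper half of the destination
    is written (dst_sel, dst_unused) is deliberately not modelled.
    """
    a = half_bits_to_float(select_half(src0, src0_sel))
    b = half_bits_to_float(select_half(src1, src1_sel))
    return float_to_half_bits(a + b)


# Same register in both operands, as in the original MIR.
reg = (0x4000 << 16) | 0x3E00  # high half = 2.0, low half = 1.5 (f16 bit patterns)
print(hex(v_add_f16_sdwa(reg, reg, "DWORD", "WORD_1")))  # 0x4300 == 3.5
```

Under this assumption, src0_sel:DWORD and src0_sel:WORD_0 give the same result, so switching to WORD_0 would not change the computed value in the model, while using WORD_1 for src0 would add the high half to itself instead of adding the low and high halves.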

https://github.com/llvm/llvm-project/pull/109395