[llvm] [AMDGPU] fix SIPeepholeSDWA optimization for fp16 (PR #109395)

Brian Favela via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 16 08:02:45 PST 2024


bfavela wrote:

_e32 and _e64 are just different encodings of the same operation (1 dword vs. 2 dwords of ISA). The first case can use the single-dword encoding (VOPC) because VCC is the implied destination in that encoding. The second case writes its result to a pair of SGPRs, which means it has to be promoted to the 2-dword VOP3 encoding.
The SDWA version is still equivalent in what you've shown here, for the reasons Arsen outlined earlier.
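
To make the destination rule concrete, here is a minimal sketch in C++. It models only the point made above (implicit VCC fits VOPC, an explicit SGPR-pair destination forces VOP3) and ignores every other reason an instruction might need VOP3. All names (`CmpDst`, `CmpEncoding`, `chooseCompareEncoding`, `encodingSizeInDwords`) are hypothetical and are not part of the LLVM AMDGPU backend.

```cpp
// Illustrative only: hypothetical types, not LLVM's actual API.

// Where a vector compare writes its result mask.
enum class CmpDst {
  ImplicitVCC,      // destination is implied by the encoding
  ExplicitSGPRPair  // destination is an explicitly named pair of SGPRs
};

// The two encodings discussed in the comment.
enum class CmpEncoding {
  VOPC_e32,  // single-dword encoding
  VOP3_e64   // two-dword encoding
};

// VOPC has no destination field, so it only works when the result goes to VCC.
// Any explicit SGPR-pair destination needs the VOP3 form.
constexpr CmpEncoding chooseCompareEncoding(CmpDst dst) {
  return dst == CmpDst::ImplicitVCC ? CmpEncoding::VOPC_e32
                                    : CmpEncoding::VOP3_e64;
}

// Base instruction size in dwords for each encoding (no trailing literal).
constexpr unsigned encodingSizeInDwords(CmpEncoding enc) {
  return enc == CmpEncoding::VOPC_e32 ? 1u : 2u;
}

static_assert(encodingSizeInDwords(
                  chooseCompareEncoding(CmpDst::ImplicitVCC)) == 1u,
              "VCC destination fits the 1-dword VOPC encoding");
static_assert(encodingSizeInDwords(
                  chooseCompareEncoding(CmpDst::ExplicitSGPRPair)) == 2u,
              "SGPR-pair destination needs the 2-dword VOP3 encoding");
```

Either way the comparison being performed is the same; only the encoding and the place the result lands differ.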

https://github.com/llvm/llvm-project/pull/109395

