[llvm] [AMDGPU] fix SIPeepholeSDWA optimization for fp16 (PR #109395)
Brian Favela via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 16 08:02:45 PST 2024
bfavela wrote:
_e32 and _e64 are just different encodings (1 dword vs. 2 dwords of ISA). The first case can use the single-dword encoding (VOPC) because VCC is the implied destination in that encoding. The second case writes to 2 SGPRs instead of VCC, which means it has to be promoted to the 2-dword VOP3 encoding.
The SDWA version is still equivalent in what you've shown here, for the reasons Arsen outlined earlier.
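
To make the encoding point above concrete, here is a minimal Python sketch of the rule as described in this comment. It is not LLVM code: the names `Encoding`, `VOPC_E32`, `VOP3_E64` and `pick_compare_encoding` are hypothetical, and the sketch looks only at the destination operand, ignoring other constraints (such as source operand kinds and modifiers) that can also force the longer encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    """Illustrative description of a vector-compare encoding."""
    name: str    # encoding family, e.g. "VOPC" or "VOP3"
    suffix: str  # opcode suffix, "_e32" or "_e64"
    dwords: int  # instruction size in 32-bit dwords


# The two encodings discussed in the comment above.
VOPC_E32 = Encoding(name="VOPC", suffix="_e32", dwords=1)
VOP3_E64 = Encoding(name="VOP3", suffix="_e64", dwords=2)


def pick_compare_encoding(dst: str) -> Encoding:
    """Hypothetical helper: choose an encoding from the destination only.

    Assumption: VCC is the implied destination of the 1-dword VOPC
    encoding, so any other destination (an explicit SGPR pair) needs
    the 2-dword VOP3 encoding.
    """
    if dst.strip().lower() == "vcc":
        return VOPC_E32
    return VOP3_E64


if __name__ == "__main__":
    for dst in ("vcc", "s[0:1]"):
        enc = pick_compare_encoding(dst)
        print(f"dst={dst}: {enc.name} ({enc.suffix}), {enc.dwords} dword(s)")
```

Under these assumptions, a compare writing `vcc` maps to the 1-dword `_e32` form, while the same compare writing an SGPR pair such as `s[0:1]` maps to the 2-dword `_e64` form. The operation is the same in both cases; only the encoding differs.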
https://github.com/llvm/llvm-project/pull/109395