[llvm] [AMDGPU] fix SIPeepholeSDWA optimization for fp16 (PR #109395)

Fri Sep 20 08:01:48 PDT 2024

bfavela wrote:

Two thoughts:

1) The original title and commit title are about precision. AFAIK, there should be no assumptions about precision with f16: there aren't enough bits to ever guarantee that an optimization won't change the result by an ULP or more. Like Matt has said, though, there is zero information about what the actual error is here.

2) One observation I can make is that the final assembly has "dst_unused:unused_pad", which means that the upper 16 bits of the result for both ALUs are going to get cleared to 0. The assumption is that this is the expected outcome of any f16 operation, but maybe some other pass is not expecting that?
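To make the second point concrete, here is a rough Python model of how the SDWA `dst_unused` modifier treats the half of a 32-bit VGPR that the instruction does not write. It is only a sketch: it assumes `dst_sel:WORD_0`, the function name `sdwa_write_word0` is made up for illustration, and the register values are invented rather than taken from the PR.

```python
def sdwa_write_word0(old_dst: int, result16: int, dst_unused: str) -> int:
    """Model a 32-bit VGPR write where dst_sel is WORD_0 (low 16 bits).

    old_dst    -- previous 32-bit contents of the destination register
    result16   -- 16-bit result produced by the f16 operation
    dst_unused -- how the untouched upper 16 bits are filled
    """
    old_dst &= 0xFFFFFFFF
    result16 &= 0xFFFF
    if dst_unused == "UNUSED_PAD":
        upper = 0x0000                      # upper half zeroed
    elif dst_unused == "UNUSED_SEXT":
        upper = 0xFFFF if result16 & 0x8000 else 0x0000  # sign-extend result
    elif dst_unused == "UNUSED_PRESERVE":
        upper = old_dst >> 16               # upper half kept from old value
    else:
        raise ValueError(f"unknown dst_unused mode: {dst_unused}")
    return (upper << 16) | result16


# Hypothetical register holding two packed f16 values: 1.0 (0x3C00) in the
# upper half and 2.0 (0x4000) in the lower half.
old = 0x3C004000
new_low = 0x4200  # f16 encoding of 3.0

padded = sdwa_write_word0(old, new_low, "UNUSED_PAD")
preserved = sdwa_write_word0(old, new_low, "UNUSED_PRESERVE")
print(f"{padded:#010x} {preserved:#010x}")
```

If some other pass assumed the upper half still held a live f16 value (as in the `UNUSED_PRESERVE` case), `UNUSED_PAD` would silently drop it. Whether that is what happens in this PR is not established here.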

https://github.com/llvm/llvm-project/pull/109395