[PATCH] D154517: AMDGPU: Always use v_rcp_f16 and v_rsq_f16

Fri Jul 14 04:08:46 PDT 2023

arsenm added a comment.

In D154517#4478712 <https://reviews.llvm.org/D154517#4478712>, @arsenm wrote:

> In D154517#4476671 <https://reviews.llvm.org/D154517#4476671>, @foad wrote:
>
>>> Brute force produces identical values compared to a reference host implementation for all values.
>>
>> Have you tested v_sqrt_f16 or any other f16 trans instructions?
>
> Haven't gotten there yet

v_sqrt_f16 is identical.
v_log_f16 is identical.
v_exp_f16 differs for a single value, by 1 ULP: ref=0x1p+0 inst=0x1.004p+0

I'm also comparing against host implementations by casting to float; maybe a proper f16 implementation would have rounded these cases differently?
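
For illustration, the brute-force comparison described above can be sketched on the host roughly as follows. This is not the harness used for this review: `compare_all`, `ref_sqrt`, and `via_float32` are hypothetical names, and the `inst_fn` callback is a stand-in for however the device results are actually read back. The sketch walks all 65536 f16 bit patterns and compares a reference rounded once to f16 against a path that rounds through float first.

```python
import math
import struct

def bits_to_f16(bits):
    """Reinterpret a 16-bit pattern as an IEEE binary16 value (as a Python float)."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

def f16_to_bits(value):
    """Round a Python float to binary16 (nearest-even) and return its bit pattern."""
    if math.isnan(value):
        return 0x7E00  # canonical quiet NaN, so all NaN results compare equal
    try:
        return struct.unpack('<H', struct.pack('<e', value))[0]
    except OverflowError:
        # struct refuses finite values beyond the f16 range; treat them as infinity.
        return 0xFC00 if value < 0 else 0x7C00

def ref_sqrt(x):
    """Host reference computed in double precision; NaN for invalid inputs."""
    if math.isnan(x) or x < 0:
        return math.nan
    return math.sqrt(x)

def via_float32(fn):
    """Hypothetical stand-in for a 'cast to float' path: round to f32, then to f16."""
    def inst(bits):
        y = fn(bits_to_f16(bits))
        y32 = struct.unpack('<f', struct.pack('<f', y))[0]
        return f16_to_bits(y32)
    return inst

def compare_all(ref_fn, inst_fn):
    """Return (input_bits, ref_bits, inst_bits) for every mismatching f16 input."""
    mismatches = []
    for bits in range(0x10000):
        ref = f16_to_bits(ref_fn(bits_to_f16(bits)))
        got = inst_fn(bits)
        if ref != got:
            mismatches.append((bits, ref, got))
    return mismatches

if __name__ == "__main__":
    diffs = compare_all(ref_sqrt, via_float32(ref_sqrt))
    print(f"{len(diffs)} mismatching bit patterns")
```

Swapping `via_float32(...)` for a function that returns the bits produced by the instruction under test would give a direct ref/inst comparison in the same form as the results listed above.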

I think I'm doing something wrong with the pre-scaling for sin/cos; those results just seem totally wrong.
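
One possible shape for a sin reference, as a sketch only: it assumes the instruction takes its input in units of full turns, i.e. computes sin(2*pi*x), so that a caller pre-scales radians by 1/(2*pi). That assumption, the explicit fractional range reduction, and the names `ref_sin_turns` and `prescale` are illustrative and not confirmed anywhere in this thread.

```python
import math
import struct

def round_to_f16(value):
    """Round a Python float to the nearest binary16 value, returned as a float."""
    if math.isnan(value) or math.isinf(value):
        return value
    try:
        return struct.unpack('<e', struct.pack('<e', value))[0]
    except OverflowError:
        # Finite but beyond the f16 range: saturate to signed infinity.
        return math.copysign(math.inf, value)

def ref_sin_turns(x):
    """Assumed reference: x is in full turns, so the result is sin(2*pi*x)."""
    if math.isnan(x) or math.isinf(x):
        return math.nan
    # Reduce to [0, 1) turns before scaling; exact in double for any f16 input.
    frac = x - math.floor(x)
    return round_to_f16(math.sin(2.0 * math.pi * frac))

def prescale(radians):
    """Assumed pre-scaling step: convert radians to turns, rounded to f16."""
    return round_to_f16(radians * (0.5 / math.pi))

if __name__ == "__main__":
    x = prescale(math.pi / 2)
    print(x, ref_sin_turns(x))
```

If the reference were instead fed the pre-scaled value as if it were radians, or the raw radians as if they were turns, nearly every result would disagree, which would look like the results being totally wrong.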

https://reviews.llvm.org/D154517