[PATCH] D22898: AMDGPU: Fix ffloor for SI

Wed Aug 17 11:42:17 PDT 2016

arsenm added a comment.

In https://reviews.llvm.org/D22898#517905, @mareko wrote:

> I don't understand.
>
> min(x, 1) is an no-op operation in this case. It doesn't avoid the hardware bug. You could remove that MIN instruction and the behavior would be exactly the same.
>
> The bug information (edited for publishing):
>
> - SI: Precision issue for FRACT_F32/64 opcodes *
>
>   3.31.1.1	Synopsis Range of outputs for FRACT opcode is [+0.0, 1.0).  The hardware is outputting 1.0 for very small negative inputs (i.e. 0xb3000000).
>
>   3.31.1.2	Symptoms Precision difference with OpenCL conformance test, SW already using workaround.  (Could potentially cause precision difference with other APIs.)
>
>   3.31.1.3	Scope Found in all SI family.
>
>   3.31.1.4	Suggested Driver Solution Compiler Expansion for FRACT_F32: out = FRACT_F32(in) out = MIN_F32(out, 0x3f7fffff) out = ISNAN_F32(in) ? in : out;
>
>   (Note: 1.0 == 0x3f800000, thus 1.0 is not correct)
>
>   Here's what the closed compiler does for https://reviews.llvm.org/F64:
> - If the Abs modifier is 1 and the Negate modifier is 0, don't apply the workaround.
> - Otherwise, use V_MIN_F64(0x3fefffffffffffff, x). If IEEE should be obeyed (optional), preserve NaNs with V_CMP_CLASS_F64 and 2x V_CNDMASK_B32.

This is what I get when I dump sc's output for __amdil_fraction_f64:

  v_fract_f64   v[0:1], s[0:1]                              // 0000000C: 7E007C00
  v_mov_b32     v2, -1                                      // 00000010: 7E0402C1
  v_mov_b32     v3, 0x3fefffff                              // 00000014: 7E0602FF 3FEFFFFF
  v_min_f64     v[2:3], v[2:3], v[0:1]                      // 0000001C: D2CC0002 00020102
  v_cmp_class_f64  vcc, v[0:1], 3                           // 00000024: D150006A 00010700
  v_cndmask_b32  v0, v2, v0, vcc                            // 0000002C: 00000102
  v_cndmask_b32  v1, v3, v1, vcc      

v[2:3] = 0x3fefffffffffffff = 1.0

In https://reviews.llvm.org/D22898#517905, @mareko wrote:

> I don't understand.
>
> min(x, 1) is an no-op operation in this case. It doesn't avoid the hardware bug. You could remove that MIN instruction and the behavior would be exactly the same.
>
> The bug information (edited for publishing):
>
> - SI: Precision issue for FRACT_F32/64 opcodes *
>
>   3.31.1.1	Synopsis Range of outputs for FRACT opcode is [+0.0, 1.0).  The hardware is outputting 1.0 for very small negative inputs (i.e. 0xb3000000).
>
>   3.31.1.2	Symptoms Precision difference with OpenCL conformance test, SW already using workaround.  (Could potentially cause precision difference with other APIs.)
>
>   3.31.1.3	Scope Found in all SI family.
>
>   3.31.1.4	Suggested Driver Solution Compiler Expansion for FRACT_F32: out = FRACT_F32(in) out = MIN_F32(out, 0x3f7fffff) out = ISNAN_F32(in) ? in : out;
>
>   (Note: 1.0 == 0x3f800000, thus 1.0 is not correct)
>
>   Here's what the closed compiler does for https://reviews.llvm.org/F64:
> - If the Abs modifier is 1 and the Negate modifier is 0, don't apply the workaround.
> - Otherwise, use V_MIN_F64(0x3fefffffffffffff, x). If IEEE should be obeyed (optional), preserve NaNs with V_CMP_CLASS_F64 and 2x V_CNDMASK_B32.

I'm confused now, because using 1.0 does pass conformance and using 0x3fefffffffffffff does not. This specifically is the workaround for v_fract_f64, but this is the implementation for ffloor. Is this really supposed to be a pure x - fract(x)? It also appears that there is a different bug in v_fract_f64 on CI, but it seems we don't do anything about that right now.

https://reviews.llvm.org/D22898