[PATCH] D22898: AMDGPU: Fix ffloor for SI

Marek Olšák via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 17 05:03:09 PDT 2016


mareko added a comment.

I don't understand.

min(x, 1) is a no-op in this case: FRACT never returns more than 1.0, so the result is unchanged. It doesn't avoid the hardware bug. You could remove that MIN instruction and the behavior would be exactly the same.

The bug information (edited for publishing):

SI: Precision issue for FRACT_F32/64 opcodes

3.31.1.1	Synopsis
Range of outputs for FRACT opcode is [+0.0, 1.0).  The hardware is outputting 1.0 for very small negative inputs (e.g. 0xb3000000).

3.31.1.2	Symptoms
Precision difference with OpenCL conformance test, SW already using workaround.  (Could potentially cause precision difference with other APIs.)

3.31.1.3	Scope
Found in all SI family.

3.31.1.4	Suggested Driver Solution
Compiler Expansion for FRACT_F32:

  out = FRACT_F32(in)
  out = MIN_F32(out, 0x3f7fffff)
  out = ISNAN_F32(in) ? in : out;

(Note: 1.0 == 0x3f800000, thus clamping to 1.0 is not correct; 0x3f7fffff is the largest float below 1.0)
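
To make the expansion above concrete, here is a rough Python sketch of it. It is not the driver's code. The faulty FRACT_F32 is modelled as a plain x - floor(x) rounded to float32, which is only an assumption that happens to reproduce the 1.0 result for 0xb3000000. The MIN is assumed to return the non-NaN operand, which is why the final select is needed. All function names are made up for illustration.

```python
import math
import struct


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_to_bits(value: float) -> int:
    """Return the 32-bit pattern of a value that fits in a single."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def round_f32(value: float) -> float:
    """Round a Python float (double) to the nearest single."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Largest float32 below 1.0, the clamp constant from the bug note.
ONE_MINUS_ULP = bits_to_f32(0x3F7FFFFF)


def naive_fract_f32(x: float) -> float:
    """Assumed stand-in for the faulty FRACT_F32: x - floor(x) in float32."""
    if not math.isfinite(x):
        return math.nan
    return round_f32(x - math.floor(x))


def fract_f32_workaround(x: float) -> float:
    """The three-step expansion: FRACT, MIN with 0x3f7fffff, NaN select."""
    out = naive_fract_f32(x)
    # MIN_F32, assumed to return the non-NaN operand when out is NaN.
    out = out if out < ONE_MINUS_ULP else ONE_MINUS_ULP
    # ISNAN_F32(in) ? in : out
    return x if math.isnan(x) else out


def fract_f32_min_one(x: float) -> float:
    """Clamping to 1.0 instead: leaves the bad 1.0 result untouched."""
    out = naive_fract_f32(x)
    return out if out < 1.0 else 1.0
```

With x = bits_to_f32(0xB3000000), naive_fract_f32(x) and fract_f32_min_one(x) both give 1.0, while fract_f32_workaround(x) gives the value with bit pattern 0x3f7fffff.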

Here's what the closed compiler does for F64:

- If the Abs modifier is 1 and the Negate modifier is 0, don't apply the workaround.
- Otherwise, use V_MIN_F64(0x3fefffffffffffff, x). If IEEE should be obeyed (optional), preserve NaNs with V_CMP_CLASS_F64 and 2x V_CNDMASK_B32.
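
The two F64 rules above could be sketched like this in Python, whose float is already a double. Again this is only an illustration with made-up names, not the closed compiler's code. It assumes the Abs modifier is applied before Negate, that the faulty FRACT_F64 behaves like x - floor(x), and that V_MIN_F64 returns the non-NaN operand. The compare and the two V_CNDMASK_B32 selects are collapsed into one Python conditional.

```python
import math
import struct

# Largest double below 1.0, the clamp constant quoted above.
ONE_MINUS_ULP_F64 = struct.unpack("<d", struct.pack("<Q", 0x3FEFFFFFFFFFFFFF))[0]


def apply_modifiers(x: float, abs_mod: bool, neg_mod: bool) -> float:
    """Source modifiers, assuming Abs is applied first and Negate second."""
    if abs_mod:
        x = abs(x)
    if neg_mod:
        x = -x
    return x


def naive_fract_f64(x: float) -> float:
    """Assumed stand-in for the faulty FRACT_F64: x - floor(x) in double."""
    if not math.isfinite(x):
        return math.nan
    return x - math.floor(x)


def fract_f64(x: float, abs_mod: bool = False, neg_mod: bool = False,
              ieee: bool = True) -> float:
    src = apply_modifiers(x, abs_mod, neg_mod)
    out = naive_fract_f64(src)
    # Abs=1, Negate=0: the source is never negative, so no workaround.
    if abs_mod and not neg_mod:
        return out
    # V_MIN_F64(0x3fefffffffffffff, out), assumed to drop a NaN operand.
    clamped = out if out < ONE_MINUS_ULP_F64 else ONE_MINUS_ULP_F64
    # Optional IEEE path: put the NaN back (compare plus 2x V_CNDMASK_B32).
    if ieee and math.isnan(src):
        return src
    return clamped
```

For example, fract_f64(-2.0**-54) returns ONE_MINUS_ULP_F64 instead of 1.0, and fract_f64(-2.0**-54, abs_mod=True) skips the clamp and returns 2.0**-54.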


https://reviews.llvm.org/D22898




