[PATCH] D29338: AMDGPU: Basic folds for fmed3 intrinsic

Mon Feb 27 11:55:21 PST 2017

arsenm added a comment.

In https://reviews.llvm.org/D29338#687325, @artem.tamazov wrote:

> Looks good, but IEEE-754 correctness needs to be verified. **Is IEEE compliance required for llvm.amdgcn.fmed3.f32?** If it is, we should look at the formal definition of fmed3 and check it carefully.
>
> For example, transformations like fmed3(0.0, 1.0, x) -> fmed3(x, 0.0, 1.0) may not be IEEE-compliant w.r.t. sNaNs when the shader is in IEEE mode. That depends on the expected semantics of fmed3, of course. For reference, this is how the V_MED3_F semantics are defined for Gfx8:
>
>   If (isNan(Src0) || isNan(Src1) || isNan(Src2))
>     Result = MIN3(Src0, Src1, Src2)
>   Else if (MAX3(Src0, Src1, Src2) == Src0)
>     Result = MAX(Src1, Src2)
>   Else if (MAX3(Src0, Src1, Src2) == Src1)
>     Result = MAX(Src0, Src2)
>   Else
>     Result = MAX(Src0, Src1)
>

It should match the instruction behavior, but we don't necessarily care about it treating signaling NaNs correctly. LLVM in general isn't aware of them and breaks their behavior everywhere. The new constrained FP intrinsics should be aware of proper sNaN behavior, though. Once we have a complete set of constrained FP intrinsics and people start using them, we could add a constrained version, which would need to handle sNaNs properly. As far as this intrinsic is concerned, as long as it preserves general NaN behavior (ignoring quieting etc.), that should be OK.
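To make the "general NaN behavior" point concrete, here is a small Python sketch (not LLVM code) that models the quoted V_MED3_F pseudocode. It checks that the commuted fold fmed3(0.0, 1.0, x) -> fmed3(x, 0.0, 1.0) gives the same result for a handful of inputs, including a quiet NaN. It assumes MIN/MAX behave like IEEE 754-2008 minNum/maxNum, meaning a NaN operand is dropped in favor of the other operand. That assumption is mine, not taken from the ISA text. sNaN quieting is not modeled at all, and any two NaNs are treated as the same result. The names `min_num`, `max_num`, `med3_model` and `fold_mismatches` are hypothetical.

```python
import math

# Hypothetical model of the Gfx8 V_MED3_F pseudocode quoted above.
# Assumption: MIN/MAX follow IEEE 754-2008 minNum/maxNum (a NaN operand is
# ignored in favor of the other one). sNaN vs. qNaN is not modeled.

def min_num(a: float, b: float) -> float:
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)

def max_num(a: float, b: float) -> float:
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)

def med3_model(s0: float, s1: float, s2: float) -> float:
    # Any NaN input: Result = MIN3(Src0, Src1, Src2)
    if any(math.isnan(v) for v in (s0, s1, s2)):
        return min_num(min_num(s0, s1), s2)
    m = max_num(max_num(s0, s1), s2)
    if m == s0:
        return max_num(s1, s2)
    if m == s1:
        return max_num(s0, s2)
    return max_num(s0, s1)

def same_result(a: float, b: float) -> bool:
    # Treat any two NaNs as equal: payload and quieting are ignored.
    return (math.isnan(a) and math.isnan(b)) or a == b

def fold_mismatches(samples):
    """Return the x values for which fmed3(0.0, 1.0, x) and the commuted
    fmed3(x, 0.0, 1.0) disagree under this model."""
    return [x for x in samples
            if not same_result(med3_model(0.0, 1.0, x), med3_model(x, 0.0, 1.0))]

samples = [-2.0, 0.0, 0.5, 1.0, 3.0, math.inf, -math.inf, math.nan]
print(fold_mismatches(samples))  # → []
```

Under these assumptions, the NaN case gives 0.0 in both operand orders, because MIN3 drops the NaN either way. A real check of the fold would have to use the hardware's actual MIN/MAX NaN rules for the relevant mode.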

https://reviews.llvm.org/D29338