<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p style="margin-top:0;margin-bottom:0"><span style="font-family: Helvetica; font-size: 12px;">Hi,</span></p>
<div style="font-family: Helvetica; font-size: 12px;"><br>
</div>
<div style="font-family: Helvetica; font-size: 12px;">The specification for the llvm.minnum/llvm.maxnum intrinsics is currently too unclear to optimize usefully. There are two problems. First, the expected behavior for signaling NaNs needs to be clarified. Second,
it is unclear whether the returned value is expected to be canonicalized (as if by llvm.canonicalize).</div>
<div style="font-family: Helvetica; font-size: 12px;"><br>
</div>
<div style="font-family: Helvetica; font-size: 12px;">Currently according to the LangRef:</div>
<div style="font-family: Helvetica; font-size: 12px;">
<div><br>
</div>
<div>Follows the IEEE-754 semantics for minNum, which also match for libm's</div>
<div>fmin.</div>
<div><br>
</div>
<div>If either operand is a NaN, returns the other non-NaN operand. Returns</div>
<div>NaN only if both operands are NaN. If the operands compare equal,</div>
<div>returns a value that compares equal to both operands. This means that</div>
<div>fmin(+/-0.0, +/-0.0) could return either -0.0 or 0.0.</div>
</div>
<div style="font-family: Helvetica; font-size: 12px;"><br>
</div>
<div style="font-family: Helvetica; font-size: 12px;">The first line is a lie: it isn’t true for signaling NaNs. The IEEE rule is that if either input is a signaling NaN, the operation returns a quieted NaN, not the other operand. The C standard definitions of
fmin/fmax do not make this distinction, and just return the other operand.</div>
<div style="font-family: Helvetica; font-size: 12px;"><br>
</div>
<div style="font-family: Helvetica; font-size: 12px;">The constant folding for these currently matches the libm behavior: it returns the non-NaN operand and never quiets. The default lowering for these operations also just calls the system’s fmin/fmax directly.</div>
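<div style="font-family: Helvetica; font-size: 12px;"><br>
</div>
<div style="font-family: Helvetica; font-size: 12px;">To make the difference concrete, here is a minimal Python sketch (not LLVM code) that models the two rules on raw binary64 bit patterns. The function names are made up for illustration, and returning the quieted copy of the signaling operand is an assumption of the sketch; which NaN comes back is not something the text above pins down.</div>
<pre>
```python
import struct

QUIET_BIT = 2 ** 51  # most significant mantissa bit of a binary64 value


def to_float(bits):
    """Reinterpret a 64-bit pattern as a Python float (non-NaN inputs only)."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def is_nan(bits):
    """NaN: exponent field all ones and a nonzero mantissa."""
    return (bits >> 52) % 2048 == 2047 and bits % 2 ** 52 != 0


def is_signaling(bits):
    """Signaling NaN: a NaN whose quiet bit is clear."""
    return is_nan(bits) and (bits >> 51) % 2 == 0


def min_fmin_style(a, b):
    """C fmin-like model: a NaN operand yields the other operand, unquieted."""
    if is_nan(a):
        return b
    if is_nan(b):
        return a
    return min((a, b), key=to_float)


def min_ieee_style(a, b):
    """IEEE minNum-like model: a signaling NaN operand yields a quiet NaN.

    Assumption: the result is the quieted copy of the signaling operand.
    """
    if is_signaling(a):
        return a | QUIET_BIT
    if is_signaling(b):
        return b | QUIET_BIT
    return min_fmin_style(a, b)


ONE = 0x3FF0000000000000   # 1.0
SNAN = 0x7FF0000000000001  # signaling NaN with payload 1

print(hex(min_fmin_style(SNAN, ONE)))  # 0x3ff0000000000000 (the other operand)
print(hex(min_ieee_style(SNAN, ONE)))  # 0x7ff8000000000001 (a quieted NaN)
```
</pre>
<div style="font-family: Helvetica; font-size: 12px;">Both models agree when the NaN operand is already quiet; they only diverge for signaling NaNs.</div>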
<div style="font-family: Helvetica; font-size: 12px;"><br>
</div>
<div style="font-family: Helvetica; font-size: 12px;">Additionally, the IEEE standard specifies that minNum/maxNum return the “canonicalized” value. If the returned value is a NaN, my understanding is that the payload bits of the NaN are all 0, even
if this was not the case for the input NaNs. This also contradicts returning the raw value of the other operand when one operand is a NaN, which is what the implemented constant folding and the libm lowering do.</div>
<div style="font-family: Helvetica; font-size: 12px;"><br>
</div>
<div style="font-family: Helvetica; font-size: 12px;">On AMDGPU we select these to instructions that have either behavior, depending on a flag that can be considered part of the global floating point mode. We default to enabling the IEEE behavior, returning
a quieted NaN. In order to match the expected behavior of fmin/fmax in the OpenCL builtin library, we use llvm.canonicalize to quiet the incoming NaNs to the minnum/maxnum intrinsics. This approximately triples the number of instructions inside the inner
loops of an important kernel, where min/max operations feed into each other. My goal is to eliminate the canonicalizes, since the output of min/max is supposed to be canonical. I don’t necessarily care about getting correct FP exception behavior, but I do need
these to return the correct value for signaling NaNs loaded from memory.</div>
<div style="font-family: Helvetica; font-size: 12px;"><br>
</div>
<div style="font-family: Helvetica; font-size: 12px;">Since these intrinsics do have the IEEE name, I think they should probably be changed to match the IEEE behavior. This would mean that optimizing</div>
<div style="font-family: Helvetica; font-size: 12px;">llvm.canonicalize(llvm.minnum(x, y)) -> llvm.minnum(x, y) is a correct transformation. Target lowering would then be expected to insert quieting canonicalizes for the inputs to the libm fmin call. Do we
need another pair of intrinsics matching the fmin/fmax behavior?</div>
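<div style="font-family: Helvetica; font-size: 12px;"><br>
</div>
<div style="font-family: Helvetica; font-size: 12px;">As a rough sanity check of that fold, the following Python sketch compares the two candidate semantics on a handful of binary64 bit patterns. It assumes a simplified canonicalize that only quiets NaNs (no denormal flushing, no payload clearing), and all names are hypothetical; it only exercises the listed samples and is not a proof.</div>
<pre>
```python
import itertools
import struct

QUIET_BIT = 2 ** 51  # most significant mantissa bit of a binary64 value


def to_float(bits):
    """Reinterpret a 64-bit pattern as a Python float (non-NaN inputs only)."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def is_nan(bits):
    """NaN: exponent field all ones and a nonzero mantissa."""
    return (bits >> 52) % 2048 == 2047 and bits % 2 ** 52 != 0


def canonicalize(bits):
    """Assumed model of llvm.canonicalize: quiet NaNs, change nothing else."""
    return bits | QUIET_BIT if is_nan(bits) else bits


def minnum_passthrough(a, b):
    """fmin-like model: a NaN operand yields the other operand as-is."""
    if is_nan(a):
        return b
    if is_nan(b):
        return a
    return min((a, b), key=to_float)


def minnum_quieting(a, b):
    """IEEE-like model: a signaling NaN operand yields a quiet NaN."""
    for x in (a, b):
        if is_nan(x) and canonicalize(x) != x:  # x is a signaling NaN
            return canonicalize(x)
    return minnum_passthrough(a, b)


def fold_is_sound(minnum, samples):
    """True if canonicalize(minnum(x, y)) == minnum(x, y) on every pair."""
    return all(canonicalize(minnum(x, y)) == minnum(x, y)
               for x, y in itertools.product(samples, repeat=2))


SAMPLES = [
    0x3FF0000000000000,  # 1.0
    0x4000000000000000,  # 2.0
    0x7FF8000000000000,  # quiet NaN
    0x7FF0000000000001,  # signaling NaN
]

print(fold_is_sound(minnum_quieting, SAMPLES))     # True
print(fold_is_sound(minnum_passthrough, SAMPLES))  # False
```
</pre>
<div style="font-family: Helvetica; font-size: 12px;">In this model the fold only breaks for the pass-through semantics, and only when both operands are NaN and the one that comes back is signaling.</div>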
<div style="font-family: Helvetica; font-size: 12px;"><br>
</div>
<div style="font-family: Helvetica; font-size: 12px;">TLDR:</div>
<div style="font-family: Helvetica; font-size: 12px;">1. What do these do for signaling NaNs?</div>
<div style="font-family: Helvetica; font-size: 12px;">2. If the target expects something to happen during llvm.canonicalize (e.g. flush denormals), can this be assumed to have been done by the implementation of llvm.minnum/maxnum?</div>
<div style="font-family: Helvetica; font-size: 12px;"><br>
</div>
<div style="font-family: Helvetica; font-size: 12px;">-Matt</div>
</div>
</body>
</html>