[llvm-dev] Proposed new min and max intrinsics

Mon Nov 12 12:24:09 PST 2018

One more idea: (if this is getting spammy, tell me and I'll stop!)

If %a and %b are ordered equal, return %a | %b (gives -0.0 if either is 
-0.0; for maximum, it would be %a & %b, only returning -0.0 if both are 
-0.0). The only two distinct bit patterns that compare ordered equal 
should be -0 and +0, which are identical everywhere except for the sign bit.

Gets a bit bit-casty but not that bad:

   ;; Work out result for when %a == %b (ordered)
   %a_bits = bitcast float %a to i32
   %b_bits = bitcast float %b to i32
   %a_bitor_b_bits = or i32 %a_bits, %b_bits
   %a_bitor_b = bitcast i32 %a_bitor_b_bits to float

   ;; Find minimum if not both zero (from above)
   %no_nan_and_a_lesser = fcmp olt %a, %b
   %lesser_or_b_if_nan = select %no_nan_and_a_lesser, %a, %b
   %a_not_nan = fcmp ord %a, %a
   %minimum_not_zeros = select %a_not_nan, %lesser_or_b_if_nan, %a

   ;; Handle +-0
   %a_and_b_ordered_equal = fcmp oeq %a, %b
   %final_minimum = select %a_and_b_ordered_equal, %a_bitor_b, 
%minimum_not_zeros

For x86-64, we can do a bit better by combining the OR with the final 
select, which I haven't tried to express in LLVM IR:

   ; input: %a in xmm0, %b in xmm1
   movaps   xmm2, xmm0
   cmpeqss  xmm2, xmm1 ; %a_or_b_ordered_equal
   andps    xmm2, xmm0 ; %a_or_b_ordered_equal & %a

   movaps   xmm3, xmm0
   minss    xmm3, xmm1 ; %lesser_or_b_if_nan
   movaps   xmm4, xmm0
   cmpordss xmm0, xmm0 ; %a_not_nan
   blendvps xmm3, xmm4 ; %minimum_not_zeros
   ; (blendvps turns into andps / andnps / orps pre-SSE4.1)

   orps     xmm2, xmm3 ; %final_minimum

(if %a_or_b_ordered_equal, then the result of %minimum_not_zeros is 
guaranteed to be %b, hence we only need to conditionally OR in %a in 
that case.)

For the maximum case, you would use "cmpneqss", "orps xmm2, xmm0", then 
"andps xmm2, xmm3" for the final combine.

-Fabian

On 11/9/2018 2:22 PM, Fabian Giesen wrote:
> Ah, forget about the +-0 issue!
> 
> Slightly different way to handle them: XOR the two arguments, AND to 
> keep only the sign bit, then OR it back into the result at the end (for 
> minimum; maximum would do a bit clear/and-not).
> 
> The argument goes as follows:
> 1) If the two sign bits agree, this does nothing
> 2) If the two sign bits disagree:
>    2.1) If the comparison was unordered, we already return a NaN; this 
> makes us return a NaN with sign bit set. (But still a NaN. I don't see 
> anything in the rules for minimum/maximum guaranteeing a specific NaN.)
>    2.2) If the comparison was ordered and not between two 0s, the value 
> with the sign bit set is the smaller one, so the OR ends up doing nothing.
>    2.3) If the comparison was ordered and between two 0s, the OR 
> guarantees we return a negative 0.
> 
> If you write this out with the required bitcasts it probably still gets 
> lengthy in LLVM IR form (so might still want to put it in compiler-rt), 
> but this seems like a preferable implementation provided the target lets 
> you do the required bitwise ops on FP registers.
> 
> -Fabian
> 
> On 11/9/18 10:56 AM, Thomas Lively wrote:
> 
>> I would agree, but the expansion also has to properly treat negative 
>> zero as less than zero, which leads to something like the following:
>>
>> ;; Find minimum if both zero
>> %a_sign = fgetsign %a
>> %b_sign = fgetsign %b
>> %a_is_lesser_zero = icmp ugt %a_sign, %b_sign
>> %minimum_zeros = select %a_is_lesser_zero, %a, %b
>>
>> ;; Find minimum if not both zero (from above)
>> %no_nan_and_a_lesser = fcmp olt %a, %b
>> %lesser_or_b_if_nan = select %no_nan_and_a_lesser, %a, %b
>> %a_not_nan = fcmp ord %a, %a
>> %minimum_not_zeros = select %a_not_nan, %lesser_or_b_if_nan, %a
>>
>> ;; Choose between zeros and not-zeros
>> %a_is_zero = fcmp oeq %a, 0
>> %b_is_zero = fcmp oeq %b, 0
>> %both_zero = and %a_is_zero, %b_is_zero
>> %minimum = select %both_zero, %minimum_zeros, %minimum_not_zeros
>>
>> Which is considerably less reasonable.
>>
>>
>>
>> On Fri, Nov 9, 2018 at 7:00 AM Cameron McInally via llvm-dev 
>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>
>>     On Thu, Nov 8, 2018 at 11:35 PM Fabian Giesen via llvm-dev
>>     <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>
>>         What is so complicated about these? Shouldn't they just
>>         correspond to
>>         two compares + selects?
>>
>>         To give a concrete example, x86 MIN[SP][SD] and MAX[SP][SD],
>>         respectively, correspond exactly to
>>
>>            MIN*: select(a < b, a, b) (i.e. "a < b ? a : b")
>>            MAX*: select(a > b, a, b) (i.e. "a > b ? a : b")
>>
>>         IIRC, MINIMUM and MAXIMUM have the added requirement that they
>>         should
>>         return NaN if _either_ input is NaN, whereas the above will
>>         return NaN
>>         if the second input (i.e. b) is NaN, but not if the first is.
>>
>>         So we need to explicitly catch the case where a is NaN as
>>         well. For
>>         minimum, that works out to something like:
>>
>>            %3 = fcmp olt float %a, %b
>>            %4 = select i1 %3, float %a, float %b ; (a < b) ? a : b
>>            %5 = fcmp ord float %a, %a ; true if !isNaN(a)
>>            %6 = select i1 %5, float %4, float %a ; if a was NaN, return a
>>
>>         for the entire operation. The logic here is that if isNaN(a) ||
>>         isNaN(b), the initial comparison will evaluate to false and %4
>>         ends up
>>         being b. If isNaN(b), this is a NaN value (as required). The
>>         case we are
>>         missing is when isNaN(a) && !isNaN(b), where %4 is not a NaN;
>>         the second
>>         compare + select fixes that one up.
>>
>>         The first pair of these corresponds to a single (x86-style)
>>         MIN/MAX, and
>>         the second turns into a compare followed by (depending on target
>>         instruction set) either a BLEND or some logic ops.
>>
>>         For minimumNumber/maximumNumber, you should be able to use a
>>         similar
>>         construction. Showing the example for minimumNumber here:
>>
>>            %3 = fcmp olt float %a, %b
>>            %4 = select i1 %3, float %a, float %b ; (a < b) ? a : b
>>            %5 = fcmp ord float %b, %b ; true if !isNaN(b)
>>            %6 = select i1 %5, float %4, float %a ; if b was NaN, return a
>>
>>         Starts out the same as before. Here, the tricky case is
>>         !isNaN(a) &&
>>         isNaN(b). The initial select %4 will result in the (NaN) b in
>>         that case,
>>         and the second compare/select pair switches the result to a
>>         instead; we
>>         will only get a NaN result if both inputs were NaN.
>>
>>         I might be missing something here, but that seems like a
>>         fairly harmless
>>         expansion, as such things go, and going to compiler-rt feels
>>         like overkill.
>>
>>
>>     +1
>>     _______________________________________________
>>     LLVM Developers mailing list
>>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>