[PATCH] D145634: [X86] Support llvm.{min,max}imum.f{16,32,64}

Tue Apr 25 07:34:36 PDT 2023

skatkov added a comment.

In D145634#4295568 <https://reviews.llvm.org/D145634#4295568>, @e-kud wrote:

> In D145634#4294388 <https://reviews.llvm.org/D145634#4294388>, @skatkov wrote:
>
>> In D145634#4294383 <https://reviews.llvm.org/D145634#4294383>, @skatkov wrote:
>>
>>> In D145634#4294264 <https://reviews.llvm.org/D145634#4294264>, @e-kud wrote:
>>>
>>>> In D145634#4291019 <https://reviews.llvm.org/D145634#4291019>, @skatkov wrote:
>>>>
>>>>> I have a side question (not delaying landing this one)
>>>>>
>>>>> It looks like this change lowers the vectorized form of the intrinsic in a non-optimal way.
>>>>> So I wonder whether there are plans to improve it as a follow-up?
>>>>
>>>> Yes, for sure. I've tried to find a way to implement a vectorized version for SSE, but nothing has come to mind. We need at least SSE2 for `PCMPEQ{W,D}` because all fp comparison instructions treat `-0.0` and `0.0` as equal. It seems that pentium3 will suffer from suboptimal fmaximum/fminimum...
>>>
>>> Why can't we do something like this: https://godbolt.org/z/Yxfn3jTj1 ?
>>> I mean, at least for AVX?
>>
>> BTW, in my microbenchmark measurements (I know a microbenchmark may make the branch predictor look better than it would be in real code), the version at https://godbolt.org/z/rEj9GPfnY is the best one for scalar, but it cannot be implemented in SelectionDAG because it has a CFG.
>
> Both versions are incorrect. They don't work as expected for `(-0.0, 0.0)` and `(0.0, -0.0)` inputs, because `comiss` and `max` treat negative and positive zeros as equal.

Please take a careful look. When the operands compare equal, we check the sign of the first value, so the comparison using `ucomiss` is OK.
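
To make the argument concrete, here is a minimal C++ sketch of the idea. It is only an illustration and is not the code behind the godbolt links above; the names `sign_bit_set`, `fmaximum_sketch` and `check_zero_cases` are hypothetical, and the explicit NaN checks are my assumption about how the unordered result of the compare would be handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if the IEEE-754 sign bit of x is set.
static bool sign_bit_set(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return (bits >> 31) != 0;
}

// Hypothetical scalar sketch of llvm.maximum.f32 semantics:
// NaN propagates, and -0.0 is treated as less than +0.0.
float fmaximum_sketch(float a, float b) {
    if (std::isnan(a)) return a;  // unordered compare: propagate NaN
    if (std::isnan(b)) return b;
    if (a > b) return a;
    if (a < b) return b;
    // The fp compare says "equal", which includes (-0.0, 0.0) and
    // (0.0, -0.0). Check the sign of the first value: if it is
    // negative, the second value is at least as large.
    return sign_bit_set(a) ? b : a;
}

// The zero cases raised in the review.
void check_zero_cases() {
    assert(!std::signbit(fmaximum_sketch(-0.0f, 0.0f)));
    assert(!std::signbit(fmaximum_sketch(0.0f, -0.0f)));
    assert(std::signbit(fmaximum_sketch(-0.0f, -0.0f)));
}
```

The fp compare alone cannot tell the zeros apart, but it does not need to: the sign check runs only on the equal path, and that is where both mixed-zero inputs end up.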

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145634/new/

https://reviews.llvm.org/D145634