[PATCH] D45616: [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR

Wed May 9 07:05:37 PDT 2018

GBuella added a comment.

In https://reviews.llvm.org/D45616#1067492, @efriedma wrote:

> > The fcmp opcode has no defined behavior with NaN operands in the comparisons handled in this patch.
>
> Could you describe the problem here in a bit more detail?  As far as I know, the LLVM IR fcmp should return the same result as the x86 CMPPD, even without fast-math.

So, I'm still looking into this.
What I see is, yes, fcmp just so happens to work the same as x86 CMPPD.
An example:

  fcmp olt <2 x double> %x, %y

becomes vcmpltpd.
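
For reference, here is a minimal scalar sketch of what that comparison computes per lane, with the result packed into a bitmask the way the _mask builtins return it. The function name cmp_olt_mask is hypothetical and not part of the patch; it only models the "ordered less-than" result and ignores exception flags.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of `fcmp olt <2 x double> %x, %y`.
// A lane compares true only if neither operand is NaN ("ordered")
// and x is strictly less than y. Bit i of the result is lane i.
std::uint8_t cmp_olt_mask(const std::array<double, 2>& x,
                          const std::array<double, 2>& y) {
    std::uint8_t mask = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const bool ordered = !std::isnan(x[i]) && !std::isnan(y[i]);
        if (ordered && x[i] < y[i]) {
            mask |= static_cast<std::uint8_t>(1u << i);
        }
    }
    return mask;
}
```

A lane holding a NaN on either side always contributes a 0 bit, which is the part of the behavior that fcmp olt and both x86 "less-than (ordered)" condition codes agree on.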

But this only holds for condition codes 0 - 7.

Where LLVM IR has a single condition "olt" (ordered less-than), x86 cmppd has two corresponding condition codes: 0x01 -> "less-than (ordered, signaling)", which is "vcmpltpd", and 0x11 -> "less-than (ordered, nonsignaling)", which is "vcmplt_oqpd".

Now, if the builtin's CC argument is 1 (which refers to vcmpltps), we lower it to "fcmp olt", which in turn results in "vcmpltps", so that case is ok, yes.
But the IR carries no information about whether the user expected "vcmpltps" or "vcmplt_oqps".

Do I understand these tricks right?
If we are ok with this (hard to understand) approach, I can just lower these without fast-math as well, as long as CC < 8, by modifying this condition:

  if (CC < 8 && !UsesNonDefaultRounding && getLangOpts().FastMath) {
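
As a stand-alone illustration of the relaxed check, the sketch below drops the fast-math requirement and keeps the other two conditions. The function name and parameters are made up for this example; in the patch the check stays inline and uses the existing CC and UsesNonDefaultRounding values.

```cpp
// Hypothetical stand-alone form of the relaxed guard: lower the builtin
// to a plain fcmp whenever the condition code is one of 0 - 7 and the
// call does not request a non-default rounding/SAE mode. Fast-math is
// no longer consulted.
bool can_lower_to_fcmp(unsigned cc, bool uses_non_default_rounding) {
    return cc < 8 && !uses_non_default_rounding;
}
```

Under this guard a call with CC = 1 is lowered, while CC = 0x11 still goes through the target-specific intrinsic and keeps its nonsignaling encoding.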

That said, I'm still looking at what happens with sNaN and qNaN constants once these comparisons are lowered to fcmp.

Repository:
  rC Clang

https://reviews.llvm.org/D45616