[PATCH] D34769: [X86] X86::CMOV to Branch heuristic based optimization

Sun Jul 9 06:15:08 PDT 2017

michaeljclark added a comment.

In https://reviews.llvm.org/D34769#797714, @spatel wrote:

> I haven't looked at the patch yet, but for reference I filed:
>  https://bugs.llvm.org//show_bug.cgi?id=33013 (although the comments veered off to a different topic)
>  ...and mentioned:
>  https://reviews.llvm.org/rL292154
>
> If the example(s) in the bug report are already here, then great. If not, you might want to consider those cases.

The reason it veered off topic is that the FPToUI bug that I reported on the mailing list, was dismissed supposedly because LLVM doesn't support correct floating point accrued exception state for very simple intrinsic conversions e.g. ISD::FP_TO_UINT is broken on X86. The issue was isolated to a SELECT created by FPToUI being lowered as a conditional move, however, one side of the SELECT had a SUBSS floating point instruction that clobbers the floating point accrued exception state (sets FE_INEXECT), so we discussed exact vs inexact conversions. The fix for the FPToUI bug is to lower the SELECT as a branch, however the compiler needs to model the MXCSR flags. That aside, FPToUI is trivially fixable by emitting a branch. It's a bug.

The bug reported was "FE_INEXACT being set for an exact conversion from float to unsigned long long" and it resulted in you raising a bug about the compiler generating more branchy code (ignoring the issue I reported), so we were not, in fact, going off topic, rather we were discussing the original bug reported on the mailing list.

This code may indeed fix the FE_INEXACT issue but long term the compiler needs to consider the floating point accrued flags side effects, however, whether of not FENV_ACCESS is enabled, the FPToUI should be trivially able to generate correct flags by using a branch instead of a cmove. GCC preserves floating point accrued exception flags for simple signed and unsigned floating point conversions when FENV_ACCESS is not enabled. It's not a complex DAG. This is not especially complex FP code. It's simply a broken ISD::FP_TO_UINT. i.e. buggy intrinsic.

"SelectionDAGLegalize::ExpandNode() in LegalizeDAG.cpp.  The X86 backend sets a table entry indicating that FP_TO_UINT should be expanded for these value types, but the actual expansion is in target-independent code. " from Andrew K. It's a SELECT.

  case ISD::FP_TO_UINT: {
    SDValue True, False;
    EVT VT =  Node->getOperand(0).getValueType();
    EVT NVT = Node->getValueType(0);
    APFloat apf(DAG.EVTToAPFloatSemantics(VT),
                APInt::getNullValue(VT.getSizeInBits()));
    APInt x = APInt::getSignBit(NVT.getSizeInBits());
    (void)apf.convertFromAPInt(x, false, APFloat::rmNearestTiesToEven);
    Tmp1 = DAG.getConstantFP(apf, dl, VT);
    Tmp2 = DAG.getSetCC(dl, getSetCCResultType(VT),
                        Node->getOperand(0),
                        Tmp1, ISD::SETLT);
    True = DAG.getNode(ISD::FP_TO_SINT, dl, NVT, Node->getOperand(0));
    // TODO: Should any fast-math-flags be set for the FSUB?
    False = DAG.getNode(ISD::FP_TO_SINT, dl, NVT,
                        DAG.getNode(ISD::FSUB, dl, VT,
                                    Node->getOperand(0), Tmp1));
    False = DAG.getNode(ISD::XOR, dl, NVT, False,
                        DAG.getConstant(x, dl, NVT));
    Tmp1 = DAG.getSelect(dl, NVT, Tmp2, True, False);
    Results.push_back(Tmp1);
    break;
  }

Note the second form with CMOV has a bug as the SUBSS is clobbering the flags resulting in FE_INEXACT for exact conversions. In any case this patch may fix the issue in a round about way

"
$ more llvm/lib/Target/X86//README-X86-64.txt
…
Are we better off using branches instead of cmove to implement FP to
unsigned i64?

_conv:

  ucomiss LC0(%rip), %xmm0
  cvttss2siq      %xmm0, %rdx
  jb      L3
  subss   LC0(%rip), %xmm0
  movabsq $-9223372036854775808, %rax
  cvttss2siq      %xmm0, %rdx
  xorq    %rax, %rdx

L3:

  movq    %rdx, %rax
  ret

instead of

_conv:

  movss LCPI1_0(%rip), %xmm1
  cvttss2siq %xmm0, %rcx
  movaps %xmm0, %xmm2
  subss %xmm1, %xmm2
  cvttss2siq %xmm2, %rax
  movabsq $-9223372036854775808, %rdx
  xorq %rdx, %rax
  ucomiss %xmm1, %xmm0
  cmovb %rcx, %rax
  ret

"

https://reviews.llvm.org/D34769