[llvm-dev] FENV_ACCESS and floating point LibFunc calls

Thu May 11 16:36:37 PDT 2017

Thanks, Andy. I'm not sure how to solve that or my case given the DAG's
basic-block limit. Probably CodeGenPrepare or SelectionDAGBuilder...or we
wait until after isel and try to split it up in a machine instruction pass.

I filed my example here:
https://bugs.llvm.org/show_bug.cgi?id=33013

Feel free to comment there and/or open a new bug for the FP_TO_UINT case.

On Thu, May 11, 2017 at 4:20 PM, Kaylor, Andrew <andrew.kaylor at intel.com>
wrote:

> Hi Sanjay,
>
>
>
> The issue that Michael is talking about is that in SelectionDAGLegalize::ExpandNode()
> we’re doing this:
>
>
>
>   case ISD::FP_TO_UINT: {
>
>     SDValue True, False;
>
>     EVT VT =  Node->getOperand(0).getValueType();
>
>     EVT NVT = Node->getValueType(0);
>
>     APFloat apf(DAG.EVTToAPFloatSemantics(VT),
>
>                 APInt::getNullValue(VT.getSizeInBits()));
>
>     APInt x = APInt::getSignMask(NVT.getSizeInBits());
>
>     (void)apf.convertFromAPInt(x, false, APFloat::rmNearestTiesToEven);
>
>     Tmp1 = DAG.getConstantFP(apf, dl, VT);
>
>     Tmp2 = DAG.getSetCC(dl, getSetCCResultType(VT),
>
>                         Node->getOperand(0),
>
>                         Tmp1, ISD::SETLT);
>
>     True = DAG.getNode(ISD::FP_TO_SINT, dl, NVT, Node->getOperand(0));
>
>     // TODO: Should any fast-math-flags be set for the FSUB?
>
>     False = DAG.getNode(ISD::FP_TO_SINT, dl, NVT,
>
>                         DAG.getNode(ISD::FSUB, dl, VT,
>
>                                     Node->getOperand(0), Tmp1));
>
>     False = DAG.getNode(ISD::XOR, dl, NVT, False,
>
>                         DAG.getConstant(x, dl, NVT));
>
>     Tmp1 = DAG.getSelect(dl, NVT, Tmp2, True, False);
>
>     Results.push_back(Tmp1);
>
>     break;
>
>   }
>
>
>
> This results in code that speculatively executes the FSUB and then uses a
> cmov instruction (based on whether or not the original value was less than
> LLONG_MAX) to select between the direct FP_TO_SINT result and the result of
> the FSUB.  If we assume that for most data sets a significant majority of
> values being converted this way will be less than LLONG_MAX then the
> processor should be able to predict a branch here pretty reliably and so
> branching code would be faster than code that does a speculative FSUB
> followed by a cmov.
>
>
>
> The question is, how do we need to change this to get the back end to
> generate a branch instead of a cmov.  I’m guessing that there are
> circumstances in which the target-specific backend would make that decision
> on its own and sink the FSUB, but for the X86 backend that doesn’t seem to
> be happening in this case.  My inclination was to just rewrite the
> ExpandNode() code above to explicitly create a branch, but I haven’t tried
> it yet.
>
>
>
> As a side effect, which is what originally drew Michael’s attention to
> this conversion, the INEXACT floating point flag is set even for exact
> conversions as a result of the speculative FSUB.  That’s how this got
> related to FENV_ACCESS support.
>
>
>
> I don’t know if there has been a bug opened for this issue.  There was an
> earlier discussion here: https://groups.google.com/
> forum/#!topic/llvm-dev/Tl9SD-edUJI
>
>
>
> -Andy
>
>
>
>
>
> *From:* Sanjay Patel [mailto:spatel at rotateright.com]
> *Sent:* Thursday, May 11, 2017 2:06 PM
> *To:* Kaylor, Andrew <andrew.kaylor at intel.com>
> *Cc:* Michael Clark <michaeljclark at mac.com>; llvm-dev <
> llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] FENV_ACCESS and floating point LibFunc calls
>
>
>
> Sounds like the select lowering issue is definitely separate from the FENV
> work.
>
> Is there a bug report with a C or IR example? You want to generate compare
> and branch instead of a cmov for something like this?
>
>
> int foo(float x) {
>   if (x < 42.0f)
>     return x;
>   return 12;
> }
>
> define i32 @foo(float %x) {
>   %cmp = fcmp olt float %x, 4.200000e+01
>   %conv = fptosi float %x to i32
>   %ret = select i1 %cmp, i32 %conv, i32 12
>   ret i32 %ret
> }
>
> $ clang -O2 cmovfp.c -S -o -
>     movss    LCPI0_0(%rip), %xmm1    ## xmm1 = mem[0],zero,zero,zero
>     ucomiss    %xmm0, %xmm1
>     cvttss2si    %xmm0, %ecx
>     movl    $12, %eax
>     cmoval    %ecx, %eax
>     retq
>
>
>
> On Thu, May 11, 2017 at 1:28 PM, Kaylor, Andrew via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi Michael,
>
>
>
> To be honest I haven’t started working on FP to integer conversions for
> FENV_ACCESS yet.
>
>
>
> To some extent I would consider the issue that you found independent of
> what I’m doing to constrain FP behavior.  That is, I think we ought to make
> the change you’re asking for even apart from the FENV_ACCESS work.  When I
> get to the conversions for FENV_ACCESS support it may require some
> additional constraints, but I think if the branching conversion is usually
> faster (and it looks like it will be) then that should be the default
> behavior.
>
>
>
> I’ll try to look into that.  I’d offer to give you advice on putting
> together a patch, but I’m still learning my way around the ISel code
> myself.  I think I know enough to figure out what to do but not enough to
> tell someone else how to do it without a bunch of wrong turns.
>
>
>
> -Andy
>
>
>
>
>
> *From:* Michael Clark [mailto:michaeljclark at mac.com]
> *Sent:* Wednesday, May 10, 2017 7:59 PM
> *To:* Kaylor, Andrew <andrew.kaylor at intel.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* FENV_ACCESS and floating point LibFunc calls
>
>
>
> Hi Andy,
>
>
>
> I’m interested to try out your patches…
>
>
>
> I understand the scope of FENV_ACCESS is relatively wide, however I’m
> still curious if you managed to figure out how to prevent the
> SelectionDAGLegalize::ExpandNode() FP_TO_UINT lowering of the FPToUI
> intrinsic from producing the predicate logic that incorrectly sets the
> floating point accrued exceptions due to unconditional execution of the
> ISD::FSUB node attached to the SELECT node. It’s a little above my head to
> try to solve this issue with my current understanding of LLVM but I could
> give it a try. I’d need some guidance as to how the lowering of SELECT can
> be controlled. i.e. where LLVM decides whether and how to lower a select
> node as a branch vs predicate logic.
>
>
>
> I’d almost forgotten that we microbenchmarked this and found the branching
> version is faster with regular input (< LLONG_MAX).
>
>
>
> - https://godbolt.org/g/ytgk7l
>
> All, Where does LLVM decide to lower select as predicate logic vs branches
> and how does the cost model work? I’m curious about a tuning flag to
> generate branches instead of computing both values and using conditional
> moves…
>
>
>
> Best,
>
> Michael.
>
>
>
> On 11 May 2017, at 11:41 AM, via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>
>
> Hi all,
>
> Background
> I've been working on adding the necessary support to LLVM for clang to be
> able to support the STDC FENV_ACCESS pragma, which basically allows users
> to modify rounding mode at runtime and depend on the value of
> floating-point status flags or to unmask floating point exceptions without
> unexpected side effects.  I've committed an initial patch (r293226) that
> adds constrained intrinsics for the basic FP operations, and I have a patch
> up for review now (https://reviews.llvm.org/D32319) that adds constrained
> versions of a number of libm-like FP intrinsics.
>
> Current problem
> Now I'm trying to make sure I have a good solution for the way in which
> the optimizer handles recognized calls to libm functions (sqrt, pow, cos,
> sin, etc.).  Basically, I need to prevent all passes from making any
> modifications to these calls that would make assumptions about rounding
> mode or improperly affect the FP status flags (either suppressing flags
> that should be present or setting flags that should not be set).  For
> instance, there are circumstances in which the optimizer will constant fold
> a call to one of these functions if the value of the arguments are known at
> compile time, but this constant folding generally assumes the default
> rounding mode and if the library call would have set a status flag, I need
> the flag to be set.
>
> Question
> My question is, can/should I just rely on the front end setting the
> "nobuiltin" attribute for the call site in any location where the FP
> behavior needs to be restricted?
>
> Ideally, I would like to be able to conditionally enable optimizations
> like constant folding if I am able to prove that the rounding mode, though
> dynamic, is known for the callsite at compile time (the constrained
> intrinsics have a mechanism to enable this), but at the moment I am more
> concerned about correctness and would be willing to sacrifice optimizations
> to get correct behavior.  Long term, I was thinking that maybe I could do
> something like attach metadata to indicate rounding mode and exception
> behavior when they were known.
>
> Is there a better way to do this?
>
> Thanks,
> Andy
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170511/c42a61a1/attachment.html>