[llvm-dev] [cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

Thu Apr 20 14:57:45 PDT 2017

One of sane methods to tackle this problem of enabling without touching
default optimizer behavior might be filtering out passes or sub-passes that
may violate the needs of FENV_ACCESS. I don't know that pass or subpasses
are modular enough to do this yet, but if we can come up with something
like a table-driven approach to pick which set of passes to run, that would
at least be a start. I'd imagine a lot of the passes initially won't even
get to run (even the ones that are necessary to compilation).

-Kevin

On Thu, Apr 20, 2017 at 1:50 PM, Kaylor, Andrew <andrew.kaylor at intel.com>
wrote:

> > This seems like it was done for perf reason (mispredict).
> Conditional-to-cmov transformation should keep
>
> > from introducing additional observable side-effects, and it's clear that
> whatever did this did not account
>
> > for floating point exception.
>
>
>
> That’s a very reasonable statement, but I’m not sure it corresponds to the
> way we have typically approached this sort of problem.
>
>
>
> In particular, there has been a de facto standard practice (and it has
> even been openly stated and agreed upon once or twice) of assuming default
> rounding mode and ignoring FP exceptions in the default (and currently
> only) optimization path for FP-related instructions.  That is, clang/LLVM
> didn’t just not support FENV_ACCESS by indifference but rather we have made
> conscious decisions to allow transformations that violate the needs of
> FENV_ACCESS when doing so can improve the performance of generated code.
> Basically, we more or less pretend that floating point status bits don’t
> exist (at least before you get to the target-specific backend).
>
>
>
> You’ll find that the X86 backend doesn’t even model MXCSR at the moment.
> I tried to add it recently and it kind of blew up before I had even modeled
> it for anything other than LDMXCSR and STMCXSR.  We may want to address
> that at some point, but right now it just isn’t there.
>
>
>
> When we discussed how FENV_ACCESS support should be implemented, Chandler
> proposed that when restricted modes (whether FENV_ACCESS or any other front
> end-specific analogous behavior) were not being used the optimizer should
> be able to behave as described above and that nothing done to support
> restricted FP behavior should complicate or restrict the default optimizer
> behavior.  This was met with general agreement at the time.
>
>
>
> I mention all this as prologue to saying that while we should do something
> to get FPToUI lowered without incorrectly setting FP exception status bits,
> it isn’t necessarily what we want as the default behavior.
>
>
>
> I would also like to add that I personally am very pleased that you
> discovered this issue and have gotten as far as you have in the analysis of
> the problem.  I’m in the process of adding constrained versions of various
> FP intrinsics (I have a patch ready to be sent out today) and what I’ve
> done up to now has been to simply translate the constrained operations into
> their traditional representations somewhere in the ISel process.  I was
> aware that something would need to be done in the codegen space to continue
> protecting these operations, but I was kind of hoping that the actual
> instruction selection would be reasonably safe.  FWIW, FPToUI is not one of
> the parts my pending patch addresses.
>
>
>
> -Andy
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *Flamedoge
> via llvm-dev
> *Sent:* Wednesday, April 19, 2017 10:14 AM
> *To:* Michael Clark <michaeljclark at mac.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] [cfe-dev] FE_INEXACT being set for an exact
> conversion from float to unsigned long long
>
>
>
> > Are we better off using branches instead of cmove to implement FP to
> unsigned i64?
>
>
>
> This seems like it was done for perf reason (mispredict).
> Conditional-to-cmov transformation should keep from introducing additional
> observable side-effects, and it's clear that whatever did this did not
> account for floating point exception.
>
>
>
> On Wed, Apr 19, 2017 at 10:01 AM, Michael Clark via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Changing the list from cfe-dev to llvm-dev
>
>
>
> On 20 Apr 2017, at 4:52 AM, Michael Clark <michaeljclark at mac.com> wrote:
>
>
>
> I’m getting close. I think it may be an issue with an individual
> intrinsic. I’m looking for the X86 lowering of Instruction::FPToUI.
>
>
>
> I found a comment around the rationale for using a conditional move versus
> a branch. I believe the predicate logic using a conditional move is causing
> INEXACT to be set from the other side of the predicate as the lowered
> x86_64 code executes both conversions whereas GCC uses a branch. That seems
> to be the difference.
>
>
>
> I can’t find FPToUI in llvm/lib/Target/X86 so I’m trying to figure out
> what the cast gets renamed to in the target layer so I can find where the
> sequence is emitted.
>
>
>
>
>
> $ more llvm/lib/Target/X86//README-X86-64.txt
>
> …
>
> Are we better off using branches instead of cmove to implement FP to
> unsigned i64?
>
> _conv:
>         ucomiss LC0(%rip), %xmm0
>         cvttss2siq      %xmm0, %rdx
>         jb      L3
>         subss   LC0(%rip), %xmm0
>         movabsq $-9223372036854775808, %rax
>         cvttss2siq      %xmm0, %rdx
>         xorq    %rax, %rdx
> L3:
>         movq    %rdx, %rax
>         ret
>
> instead of
>
> _conv:
>         movss LCPI1_0(%rip), %xmm1
>         cvttss2siq %xmm0, %rcx
>         movaps %xmm0, %xmm2
>         subss %xmm1, %xmm2
>         cvttss2siq %xmm2, %rax
>         movabsq $-9223372036854775808, %rdx
>         xorq %rdx, %rax
>         ucomiss %xmm1, %xmm0
>         cmovb %rcx, %rax
>         ret
>
>
>
> On 19 Apr 2017, at 2:10 PM, Michael Clark <michaeljclark at mac.com> wrote:
>
>
>
>
>
> On 19 Apr 2017, at 1:14 PM, Tim Northover <t.p.northover at gmail.com> wrote:
>
>
>
> On 18 April 2017 at 15:54, Michael Clark via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>
> The only way towards completing a milestone is via fixing a number of
> small issues along
> the way…
>
>
> I believe there's more to it than that. None of LLVM's optimizations
> are aware of this extra side-channel of information (with possible
> exceptions like avoiding speculating fdiv because of unavoidable
> exceptions).
>
> From what I remember, the real proposal is to replace all
> floating-point IR with intrinsics when FENV_ACCESS is on, which the
> optimizers by default won't have a clue about and will treat
> conservatively (essentially like they're modifying external memory).
>
> So be careful with drawing conclusions from small snippets; you're
> probably not seeing the full range of LLVM's behaviour.
>
>
>
>
>
> Yes. I’m sure.
>
>
>
> It reproduces with just the cast on its own: https://godbolt.org/g/myUoL2
>
>
>
> It appears to be in the LLVM lowering of the fptoui intrinsic so it must
> MC layer optimisations.
>
>
>
> ; Function Attrs: noinline nounwind uwtable
>
> define i64 @_Z7fcvt_luf(float %f) #0 {
>
>   %1 = alloca float, align 4
>
>   store float %f, float* %1, align 4
>
>   %2 = load float, float* %1, align 4
>
>   %3 = fptoui float %2 to i64
>
>   ret i64 %3
>
> }
>
>
>
> GCC performs a comparison with ucomiss and branches whereas Clang computes
> both forms and predicates the result using a conditional move. One of the
> conversions obviously is setting the INEXACT MXCSR flag.
>
>
>
> Clang lowering (inexact set when result is exact):
>
>
>
> fcvt_lu(float):
>         movss   xmm1, dword ptr [rip + .LCPI1_0] # xmm1 =
> mem[0],zero,zero,zero
>         movaps  xmm2, xmm0
>         subss   xmm2, xmm1
>         cvttss2si       rax, xmm2
>         movabs  rcx, -9223372036854775808
>         xor     rcx, rax
>         cvttss2si       rax, xmm0
>         ucomiss xmm0, xmm1
>         cmovae  rax, rcx
>         ret
>
>
>
> GCC lowering (sets flags correctly):
>
>
>
> fcvt_lu(float):
>         ucomiss xmm0, DWORD PTR .LC0[rip]
>         jnb     .L4
>         cvttss2si       rax, xmm0
>         ret
> .L4:
>         subss   xmm0, DWORD PTR .LC0[rip]
>         movabs  rdx, -9223372036854775808
>         cvttss2si       rax, xmm0
>         xor     rax, rdx
>         ret
>
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170420/c657ac26/attachment.html>