[LLVMdev] [cfe-dev] computing a conservatively rounded square of a double

Geoffrey Irving irving at naml.us
Wed Mar 26 14:46:01 PDT 2014


On Wed, Mar 26, 2014 at 2:34 PM, Philip Reames
<listmail at philipreames.com> wrote:
>
> On 03/26/2014 11:36 AM, Geoffrey Irving wrote:
>
> I am trying to compute conservative lower and upper bounds for the
> square of a double.  I have set the rounding mode to FE_UPWARD
> elsewhere, so the code is
>
> struct Interval {
>   double nlo, hi;  // negated lower bound, and upper bound
> };
>
> Interval inspect_singleton_sqr(const double x) {
>   Interval s;
>   s.nlo = x * -x;  // rounds up, so -s.nlo is a lower bound of x*x
>   s.hi = x * x;    // rounds up: an upper bound of x*x
>   return s;
> }
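>
> ("Elsewhere" means something along the lines of a one-time <cfenv>
> call before any interval arithmetic runs, e.g. in a hypothetical setup
> function:
>
>   #include <cfenv>
>   void setup_rounding() {
>     std::fesetround(FE_UPWARD);  // round all FP results toward +infinity
>   }
> )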
>
> Both multiplies are necessary, since they round in different
> directions.  However, clang does not know this, assumes that x * -x =
> -(x * x), and simplifies down to a single multiply:
>
> .LCPI1_0:
>   .quad -9223372036854775808    # double -0.000000e+00
>   .quad -9223372036854775808    # double -0.000000e+00
>   .text
>   .globl  _Z21inspect_singleton_sqrd
>   .align  16, 0x90
  .type _Z21inspect_singleton_sqrd,@function
> _Z21inspect_singleton_sqrd:             # @_Z21inspect_singleton_sqrd
>   .cfi_startproc
> # BB#0:
>   vmulsd  %xmm0, %xmm0, %xmm1
>   vxorpd  .LCPI1_0(%rip), %xmm1, %xmm0
>   ret
> .Ltmp1:
>   .size _Z21inspect_singleton_sqrd, .Ltmp1-_Z21inspect_singleton_sqrd
>   .cfi_endproc
>
> I realize this is unsupported behavior, but it would be nice to still
> be able to use clang to do numerical computation.  Is there a way to
> convince clang to not pull the negation outside the multiplication?  I
> would like to avoid interfering with other optimizations in the
> process: the problem is easy to solve by making a multiply function
> and marking it never inline, but that would be terrible for
> performance.  I am happy to write clang specific code that turns on
> only when __clang__ is defined.
>
> Thanks,
> Geoffrey
>
>
> Geoffrey - In the short term, if you marked this function with
> "__attribute__((optnone))", that might have the effect you desire.  See
> http://clang-developers.42468.n3.nabble.com/RFC-add-Function-Attribute-to-disable-optimization-td4032641.html
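>
> Roughly (untested sketch):
>
>   __attribute__((optnone))  // clang: disable all optimization here
>   Interval inspect_singleton_sqr(const double x) {
>     Interval s;
>     s.nlo = x * -x;  // no longer folded into -(x * x)
>     s.hi = x * x;
>     return s;
>   }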
>
> As for the more general solution, I don't know.
>
> For the rest of this, I'm speaking only as an interested observer.  None of
> this is particularly thought through.
>
> We really should have a good solution to this.  I know we don't currently
> support floating point context and flags, but not being able to support
> standard idioms from numerical computation seems very problematic.
>
> Here's one proposal to get the conversation started - please don't take it
> too seriously.
>
> We could introduce an "fp_rounding_sensitive" annotation (flag?) on the
> instruction level which disables optimizations sensitive to floating point
> rounding.  This could be parsed by clang as either a function attribute or
> statement attribute.
>
> Alternatively, we could introduce an "fp_rounding_mode(x)" annotation with
> the semantics of specifying the desired rounding mode.  The required work
> would be much the same.
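>
> As clang syntax, either proposal might look something like this
> (purely hypothetical; neither attribute exists today):
>
>   __attribute__((fp_rounding_sensitive))        // variant 1: hands off
>   Interval inspect_singleton_sqr(const double x);
>
>   __attribute__((fp_rounding_mode(FE_UPWARD)))  // variant 2: explicit mode
>   Interval inspect_singleton_sqr(const double x);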
>
> I'm picturing either of these being *really* coarse-grained in their effect
> - at least to start.  Something like disable all FP optimizations within
> their context.
>
> A toy implementation could even map the presence of such a flag to "optnone"
> as a starting point.  We could improve incrementally over time.
>
> p.s. If we already have such a thing and I just don't remember, please feel
> free to point that out.  :)

An alternative way to think about fp_rounding_sensitive is related to
one of the possible solutions:

#include <cstdint>  // for uint64_t

static inline double neg(const double x) {
  union { double x; uint64_t i; } u;
  u.x = x;
  u.i = u.i ^ (uint64_t(1) << 63);  // flip only the sign bit
  return u.x;
}

If I replace x * -x with x * neg(x), everything works since clang
doesn't incorrectly apply the identity.  Unfortunately, it produces
significantly worse code, which I've included below.  Instead of using
vxorps to do the high bit flip in an SSE register, it moves data into
(and then out of) an integer register and does the bit flip there.
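
Concretely, the workaround version of the singleton square is just

  Interval inspect_singleton_sqr(const double x) {
    Interval s;
    s.nlo = x * neg(x);  // opaque to clang, so no -(x * x) folding
    s.hi = x * x;
    return s;
  }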

In other words, my problem would be solved if my neg() function had
the same nice code generation as normal -x, but didn't participate in
the incorrect identities.  I'm not sure if this can be arranged today;
I tried

   asm ("vxorps %1, %2, %0" : "=f" (nx) : "fm" (x), "fm" (y));

but it started spitting internal compiler errors at me:

    ./geode/exact/Interval.h:167:12: error: illegal "f" output constraint
          asm ("vxorps %1, %2, %0" : "=f" (nx) : "fm" (x), "fm" (y));
               ^
    fatal error: error in backend: Access past stack top!
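
(The "f" constraint names the x87 register stack, which presumably
explains the error; the SSE register constraint is "x", so something
like the following might work, though I haven't verified it:

    static inline double neg(const double x) {
      double nx;
      const double sign = -0.0;  // only the sign bit set
      asm ("vxorps %2, %1, %0" : "=x" (nx) : "x" (x), "x" (sign));
      return nx;
    }
)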

Geoffrey

-------------------------------------------------

  .globl  _Z11inspect_sqrN5geode8IntervalE
  .align  16, 0x90
  .type _Z11inspect_sqrN5geode8IntervalE,@function
_Z11inspect_sqrN5geode8IntervalE:       # @_Z11inspect_sqrN5geode8IntervalE
  .cfi_startproc
# BB#0:
  vxorps  %xmm2, %xmm2, %xmm2
  vucomisd  %xmm0, %xmm2
  jbe .LBB0_2
# BB#1:
  vmovd %xmm0, %rax
  movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
  xorq  %rax, %rcx
  vmovd %rcx, %xmm2
  vmulsd  %xmm0, %xmm2, %xmm2
  vmulsd  %xmm1, %xmm1, %xmm1
  vmovaps %xmm2, %xmm0
  ret
.LBB0_2:
  vucomisd  %xmm1, %xmm2
  jbe .LBB0_4
# BB#3:
  vmovd %xmm1, %rax
  movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
  xorq  %rax, %rcx
  vmovd %rcx, %xmm2
  vmulsd  %xmm1, %xmm2, %xmm2
  vmulsd  %xmm0, %xmm0, %xmm1
  vmovaps %xmm2, %xmm0
  ret
.LBB0_4:
  vmaxsd  %xmm0, %xmm1, %xmm0
  vmulsd  %xmm0, %xmm0, %xmm1
  vxorps  %xmm2, %xmm2, %xmm2
  vmovaps %xmm2, %xmm0
  ret
.Ltmp0:
  .size _Z11inspect_sqrN5geode8IntervalE, .Ltmp0-_Z11inspect_sqrN5geode8IntervalE
  .cfi_endproc


