[PATCH] [complex] Teach the complex math IR gen to emit direct math and a NaN-test prior to the call to the library function.

Stephen Canon scanon at apple.com
Fri Oct 17 23:41:02 PDT 2014


On Oct 18, 2014, at 3:04 AM, Chandler Carruth <chandlerc at gmail.com> wrote:

> On Fri, Oct 17, 2014 at 2:23 AM, Steve Canon <scanon at apple.com> wrote:
>> Apologies for delay in looking at this, I'm on vacation this week.
> 
> Not a problem. =] 
> 
>> I don't love this approach because (a) it doesn't get us fully to where we want to be in performance, and (b) it's going to trash the floating-point flag state.  The performance issue is that we still have two comparisons and one or two branches for every complex op outside of no-nans, and the flags issue is as follows:
>> 
>> The intention of IEEE-754 is that anything that is conceptually a single "operation" should raise at most one of divide-by-zero, invalid, overflow, or underflow.  A complex multiplication implemented with lazy checking may cause two of these to be raised:
>> 
>>     (tiny, huge) * (tiny, huge) --> underflow + overflow
>>     (0, huge) * (inf, huge) --> invalid + overflow, no flags
>> 
>> My preferred approach would be to implement limited-range semantics as an option (via either pragma or flag), and have it implied by fast-math.
> 
> I don't really understand what you want here.
> 
> In the case of fast-math, the comparisons should vanish and I think we're left with a minimal amount of math. If there is some more minimal way to compute the result in the case of fast-math, please let me know?

I agree; what you have is perfectly fine for fast-math, and should generate fast code.

> In the case of *not* having fast-math and needing to be correct, I'm just not in a position to come up with a more efficient but still numerically correct implementation. I have no idea how to do it. And I'm not really willing to sign up to do it because I don't have the time. =/ I don't think that hoping for a future better world should obstruct getting this into the tree as it (to the extent I'm aware) is a strict improvement on the status quo.

What I'm saying is that in the long-term, we'd like to support two modes for these operations:

limited-range: In this mode, we use the simple "usual" mathematical formulations for multiplication and division (no careful handling of overflow or underflow or invalid cases).  This is like finite-math restricted to complex arithmetic expressions (in particular, we don't want to require users enable finite-math to get this behavior; we may want this behavior to be the default).

no-limited-range: We unconditionally call to compiler-rt for complex mul and div operations, and make the compiler-rt implementations correct w.r.t. flags.
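
For concreteness, the "usual" formulations that limited-range mode refers to can be sketched in C as below. This is only an illustration under stated assumptions: the `cplx` struct and the names `mul_limited` and `div_limited` are hypothetical and are not what clang or compiler-rt actually emit or export.

```c
/* Hypothetical stand-in for a complex double; not a clang/compiler-rt type. */
typedef struct { double re, im; } cplx;

/* Limited-range multiply: (a + ib)(c + id) = (ac - bd) + i(ad + bc).
   No recovery is attempted for overflow, underflow, or invalid cases. */
static cplx mul_limited(cplx x, cplx y) {
    cplx r;
    r.re = x.re * y.re - x.im * y.im;
    r.im = x.re * y.im + x.im * y.re;
    return r;
}

/* Limited-range divide:
   (a + ib)/(c + id) = ((ac + bd) + i(bc - ad)) / (c*c + d*d).
   The denominator can overflow or underflow for extreme inputs,
   which is exactly what this mode chooses not to guard against. */
static cplx div_limited(cplx x, cplx y) {
    double denom = y.re * y.re + y.im * y.im;
    cplx r;
    r.re = (x.re * y.re + x.im * y.im) / denom;
    r.im = (x.im * y.re - x.re * y.im) / denom;
    return r;
}
```

For ordinary finite inputs these give the expected results, e.g. (1, 2) * (3, 4) yields (-5, 10), and dividing that by (3, 4) returns (1, 2).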

The current state of affairs is similar to supporting only no-limited-range, except that the compiler-rt implementations may need to be fixed up (I'm happy to do that work).  This patch puts us somewhere in between the two modes, which is a better place for most users, but still slightly worse than where I'd really like to be headed.  My only real concern is of building up too much machinery that needs to be undone to get to the "really right" place.

I'm not *so* concerned with this patch in particular.  My comments are more of an effort to establish a record of where we'd like to be going with this stuff for future reference.  LGTM.

– Steve
