<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">On Oct 19, 2014, at 9:36 PM, Hal Finkel <<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>> wrote:<br><div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">----- Original Message -----<br><blockquote type="cite">From: "Steve Canon" <<a href="mailto:scanon@apple.com">scanon@apple.com</a>><br>To:<span class="Apple-converted-space"> </span><a href="mailto:chandlerc@gmail.com">chandlerc@gmail.com</a>,<span class="Apple-converted-space"> </span><a href="mailto:resistor@mac.com">resistor@mac.com</a>,<span class="Apple-converted-space"> </span><a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>,<span class="Apple-converted-space"> </span><a href="mailto:scanon@apple.com">scanon@apple.com</a><br>Cc:<span class="Apple-converted-space"> </span><a href="mailto:cfe-commits@cs.uiuc.edu">cfe-commits@cs.uiuc.edu</a><br>Sent: Friday, October 17, 2014 4:23:28 AM<br>Subject: Re: [PATCH] [complex] Teach the complex math IR gen to emit direct math and a NaN-test prior to the call to<br>the library function.<br><br>Apologies for delay in looking at this, I'm on vacation this week.<br><br>I don't love this approach because (a) it doesn't get us fully to<br>where we want to be in performance, and (b) it's going to trash the<br>floating-point flag state. 
The performance issue is that we still<br>have two comparisons and one or two branches for every complex op<br>outside of no-nans, and the flags issue is as follows:<br><br>The intention of IEEE-754 is that anything that is conceptually a<br>single "operation" should raise at most one of divide-by-zero,<br>invalid, overflow, or underflow. A complex multiplication<br>implemented with lazy checking may cause two of these to be raised:<br><br> (tiny, huge) * (tiny, huge) --> underflow + overflow<br> (0, huge) * (inf, huge) --> invalid + overflow, no flags<br></blockquote><br>Thinking about this, this can only matter if we actually permit access to the FP environment, which we currently don't. So, if we were to ever allow "#pragma STDC FENV_ACCESS on", then we'd want to disable this optimization. But for now this is irrelevant (at least from the C perspective). Is this right?</div></blockquote><br></div><div>That's basically correct, though it's a bit strong to say that we don't permit access to the FP environment. More accurately, we don't make the necessary ordering guarantees to support FENV_ACCESS, but we also don't (generally) deliberately trash the flag state; we allow it to be accessed, and the result of accessing it should (generally) be correct up to sequencing issues. Getting the complex ops right is a necessary step to supporting FENV_ACCESS someday if we want to, and does confer *some* benefit now, even without FENV_ACCESS support.</div><div><br></div><div>Basically, I don't want things to get dramatically worse than they are. We shouldn't, e.g., introduce type conversion sequences that get the flags wrong because "we don't support FENV_ACCESS". If we hold that line, then it at least will remain feasible for someone to implement FENV_ACCESS. If they also have to fix all the lowerings and all the libcalls, it starts to become pretty daunting.</div><div><br></div><div>– Steve</div></body></html>