[cfe-dev] Complex arithmetic ignores -ffast-math after clang r219557, serious performance regressions

Tue Jul 21 20:01:06 PDT 2015

Hi Richard,

I think you're seeing a change in the backend (in combination with an older frontend change) -- perhaps the backend change is when we added fast-math flags to fcmp?

Here's what happens:

Given some code like this:
$ cat /tmp/c.c
typedef _Complex double dc;

dc foo(dc a, dc b) {
  return a*b;
}

Compiling it produces IR that looks like this:

define { double, double } @foo(double %a.coerce0, double %a.coerce1, double %b.coerce0, double %b.coerce1) #0 {
entry:
 ...
 [perform the fast code]
  %isnan_cmp = fcmp fast uno double %mul_r, %mul_r
  br i1 %isnan_cmp, label %complex_mul_imag_nan, label %complex_mul_cont, !prof !1

complex_mul_imag_nan:                             ; preds = %entry
  %isnan_cmp1 = fcmp fast uno double %mul_i, %mul_i
  br i1 %isnan_cmp1, label %complex_mul_libcall, label %complex_mul_cont, !prof !1

complex_mul_libcall:                              ; preds = %complex_mul_imag_nan
  %call = call { double, double } @__muldc3(double %a.real, double %a.imag, double %b.real, double %b.imag) #1
  %4 = extractvalue { double, double } %call, 0
  %5 = extractvalue { double, double } %call, 1
  br label %complex_mul_cont

complex_mul_cont:                                 ; preds = %complex_mul_libcall, %complex_mul_imag_nan, 
  ...

So we always do the fast calculation first, and only if both the real and imaginary results come out as NaN do we go back and do the slow calculation via the libcall. But because of the fast-math flags (which let the optimizer assume there are no NaNs), the backend can constant fold the relevant comparisons and eliminate that entire set of branches, leaving only the fast code.
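To make the control flow concrete, here is a rough C sketch of what that IR does. This is only an illustration, not what clang literally emits: the helper names (make_dc, split_dc, mul_nan_fallback) and the took_libcall_path out-parameter are made up for the example, and I'm assuming the plain a * b in the fallback branch gets lowered to the runtime multiply routine when fast-math is off.

```c
#include <assert.h>
#include <string.h>

typedef _Complex double dc;

/* Hypothetical helper: build a complex value from its two parts by copying
   bytes, so an infinity or NaN in one part cannot leak into the other. */
static dc make_dc(double re, double im) {
  double parts[2] = {re, im};
  dc z;
  assert(sizeof z == sizeof parts); /* C99: same layout as double[2] */
  memcpy(&z, parts, sizeof z);
  return z;
}

/* Hypothetical helper: the inverse of make_dc. */
static void split_dc(dc z, double out[2]) {
  memcpy(out, &z, sizeof z);
}

/* Hypothetical sketch of the branch structure in the IR above.
   *took_libcall_path is set to 1 only when the fallback branch runs. */
static dc mul_nan_fallback(dc a, dc b, int *took_libcall_path) {
  double pa[2], pb[2];
  split_dc(a, pa);
  split_dc(b, pb);

  /* Fast path: the textbook formula with no special-case handling. */
  double mul_r = pa[0] * pb[0] - pa[1] * pb[1];
  double mul_i = pa[0] * pb[1] + pa[1] * pb[0];

  *took_libcall_path = 0;
  /* x != x holds only for NaN; this is the test "fcmp uno x, x" performs. */
  if (mul_r != mul_r) {
    if (mul_i != mul_i) {
      /* Slow path: stands in for the __muldc3 call in the IR. */
      *took_libcall_path = 1;
      return a * b;
    }
  }
  return make_dc(mul_r, mul_i);
}
```

Built without -ffast-math, multiplying (inf, inf) by (1, 0) gives NaN in both fast-path parts and so reaches the fallback branch. Built with -ffast-math, the compiler is allowed to assume the x != x tests are always false, in which case both branches fold away and only the fast path remains.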

 -Hal

----- Original Message -----
> From: "Reid Kleckner" <rnk at google.com>
> To: "Richard Campbell" <rlcamp.pdx at gmail.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Clang Developers List" <cfe-dev at cs.uiuc.edu>
> Sent: Monday, July 20, 2015 1:53:42 PM
> Subject: Re: [cfe-dev] Complex arithmetic ignores -ffast-math after clang r219557, serious performance regressions
> 
> 
> Not that I'm aware of. Did anything happen here?
> 
> 
> On Thu, Jul 16, 2015 at 5:26 PM, Richard Campbell <
> rlcamp.pdx at gmail.com > wrote:
> 
> 
> Hal,
> 
> SVN now seems to be respecting the -ffast-math flag in the way we
> desire without Matthijs’ temporary fix. I didn’t see any further
> traffic about this on the cfe-dev list - was there a discussion
> elsewhere? Did it get fixed by accident as part of some other
> change, and we should worry about whether it will come up again?
> 
> Richard
> 
> > On Jul 2, 2015, at 2:45 PM, Hal Finkel < hfinkel at anl.gov > wrote:
> > 
> > Hi Richard,
> > 
> > Thanks for bringing this to my attention.
> > 
> > -Hal
> > 
> > ----- Original Message -----
> >> From: "Richard Campbell" < rlcamp.pdx at gmail.com >
> >> To: hfinkel at anl.gov
> >> Sent: Wednesday, July 1, 2015 12:13:50 PM
> >> Subject: Fwd: Complex arithmetic ignores -ffast-math after clang
> >> r219557, serious performance regressions
> >> 
> >> Hal,
> >> 
> >> 
> >> I posted this in the cfe-dev mailing list about a week ago and didn’t
> >> get any replies. Can you comment on this or recommend a better
> >> place to discuss it?
> >> 
> >> 
> >> Richard Campbell
> >> 
> >> 
> >> 
> >> 
> >> Begin forwarded message:
> >> 
> >> From: Richard Campbell < rlcamp.pdx at gmail.com >
> >> 
> >> Subject: Complex arithmetic ignores -ffast-math after clang
> >> r219557,
> >> serious performance regressions
> >> 
> >> Date: June 25, 2015 at 11:54:10 AM PDT
> >> 
> >> To: cfe-dev at cs.uiuc.edu
> >> 
> >> 
> >> 
> >> 
> >> After building with clang 3.7svn recently, I saw a huge speed hit
> >> across much of our HPC and floating point DSP code. I looked at the
> >> asm output and it's riddled with calls to ___mulsc3, which is never
> >> inlined (preventing lots of other optimizations) and which includes
> >> a bunch of C99 Annex G-recommended branch conditions for range
> >> checks and whatnot. One of the purposes of -ffast-math has always
> >> been to disable these sorts of checks, trusting the developer to
> >> ensure that they can't happen or will be handled upstream.
> >> 
> >> 
> >> Explicitly writing out the real and imaginary component math in one
> >> of my critical sections was enough to confirm that the problem lies
> >> here and not elsewhere. However, doing this throughout all of our
> >> code would be prohibitive, and of course greatly reduces the
> >> readability of the code and presumably the ability of future
> >> compilers to optimize it in a way that I haven’t thought of yet.
> >> 
> >> 
> >> The relevant patch discussion in the mailing list is here:
> >> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20141006/116248.html
> >> and includes a comment from hfinkel also requesting that the
> >> libcalls be skipped in fast-math mode. From what I can see there was
> >> no followup on this.
> >> 
> >> 
> >> At the bare minimum I think these checks should be disabled within
> >> __mulsc3 when -ffast-math or the relevant subflag is enabled, and
> >> preferably the library calls should be skipped entirely as before,
> >> so that other compiler optimizations aren't prevented.
> >> 
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory