[llvm-dev] Trouble when suppressing a portion of fast-math-transformations

Hal Finkel via llvm-dev llvm-dev at lists.llvm.org
Thu Sep 28 18:36:21 PDT 2017


Hi, Warren,

Thanks for writing all of this up. In short, regarding your suggested 
solution:

> 4. To fix this, I think that additional fast-math-flags are likely needed in
> the IR.  Instead of the following set:
>
>     'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'
>
> something like this:
>
>     'reassoc' + 'libm' + 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'
>
> would be more useful.  Related to this, the current 'fast' flag which acts
> as an umbrella (enabling 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract') may
> not be needed.  A discussion on this point was raised last November on the
> mailing list:
>
>     http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html

I agree. I'm happy to help review the patches. It will be best to have 
only the finer-grained flags, with no "fast" flag that implies 
all of the others.

  -Hal

On 09/28/2017 07:56 PM, Ristow, Warren via llvm-dev wrote:
>
> Hi all,
>
> In a mailing-list post last November:
>
> http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
>
> I raised some concerns that having the IR-level fast-math-flag 'fast' 
> act as an
>
> "umbrella" to implicitly turn on all the lower-level fast-math-flags, 
> causes
>
> some fundamental problems.  Those fundamental problems are related to
>
> situations where a user wants to disable a portion of the fast-math 
> behavior.
>
> For example, to enable all the fast-math transformations except for the
>
> reciprocal-math transformation, a command like the following is what a 
> user
>
> would expect to work:
>
> clang++ -O2 -ffast-math -fno-reciprocal-math -c foo.cpp
>
> But that isn't what the compiler actually does.
>
> I believe this is a serious problem, but I also want to avoid 
> over-stating the
>
> seriousness. To be explicit, the problems I'm describing here happen when
>
> '-ffast-math' is used with one or more of the underlying fast-math-related
>
> aspects _disabled_ (like the '-fno-reciprocal-math' example, above).
>
> Conversely, when '-ffast-math' is used "on its own", the situation is 
> fine.
>
> For terminology here, I'll refer to these underlying fast-math-related 
> aspects
>
> (like reciprocal-math, associative-math, math-errno, and others) as
>
> "sub-fast-math" aspects.
>
> I apologize for the length of this post.  I'm putting the summary up 
> front, so
>
> that anyone interested in fast-math issues can quickly get the 
> big-picture of
>
> the issues I'm describing here.
>
> In Summary:
>
> 1. With the change of r297837, the driver now more cleanly handles
>
> '-ffast-math', and other sub-fast-math switches (like
>
> '-f[no-]reciprocal-math', '-f[no-]math-errno', and others).
>
> 2. Prior to that change, the disabling of a sub-fast-math switch was often
>
> ineffective.  So as an example, the following two commands often resulted
>
> in the same code-gen, even if there were
>
> fast-math-reciprocal-transformations that were done:
>
> clang++ -O2 -ffast-math -c foo.cpp
>
> clang++ -O2 -ffast-math -fno-reciprocal-math -c foo.cpp
>
> 3. Since that change, the disabling of a sub-fast-math switch disables 
> many
>
> more sub-fast-math transformations than just the one specified.  So now,
>
> the following two commands often result in very similar (and sometimes
>
> identical) code-gen:
>
> clang++ -O2 -c foo.cpp
>
> clang++ -O2 -ffast-math -fno-reciprocal-math -c foo.cpp
>
> That is, disabling a single sub-fast-math transformation in some (many?)
>
> cases now ends up disabling almost all the fast-math transformations.
>
> This causes a performance hit for people that have been doing this.
>
> 4. To fix this, I think that additional fast-math-flags are likely 
> needed in
>
> the IR.  Instead of the following set:
>
> 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'
>
> something like this:
>
> 'reassoc' + 'libm' + 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'
>
> would be more useful.  Related to this, the current 'fast' flag which acts
>
> as an umbrella (enabling 'nnan' + 'ninf' + 'nsz' + 'arcp' + 
> 'contract') may
>
> not be needed.  A discussion on this point was raised last November on the
>
> mailing list:
>
> http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
>
> TL;DR
>
> More details are in that thread from November, but the problem in its 
> entirety
>
> involved both back-end LLVM issues, and front-end Clang (driver) 
> issues.  The
>
> LLVM issues are related to the umbrella aspect of 'fast', along with other
>
> fast-math-flags implementation details (described below).  The front-end
>
> aspects in Clang are related to the driver's handling of '-ffast-math' 
> (which
>
> also had an "umbrella" aspect).  The driver code has been refactored 
> since that
>
> November post, fixing the umbrella aspect of the front-end.  But I 
> never got
>
> around to working on the related back-end issues (nor has anyone 
> else), and the
>
> refactored front-end now results in the back-end issues manifesting
>
> differently, and arguably in a worse way (details on the "worse" aspect,
>
> below).
>
> For reference, the refactored driver code was done in r297837:
>
> [Driver] Restructure handling of -ffast-math and similar options
>
> To be clear, I'm not at all suggesting that the above change was 
> incorrect.  I
>
> think that refactoring of the driver code is the right thing to do.  
> An aspect
>
> of this refactoring is that prior to it, when a user passed 
> '-ffast-math' on
>
> the command-line, it was also passed to the cc1 process, even if a
>
> sub-fast-math component was disabled.  With the refactoring, the 
> driver only
>
> passes '-ffast-math' to cc1 when a specific set of sub-fast-math 
> components are
>
> enabled.
>
> More specifically, when a user specifies just '-ffast-math' on the
>
> command-line, the following 7 sub-fast-math switches:
>
> -fno-honor-infinities
>
> -fno-honor-nans
>
> -fno-math-errno
>
> -fassociative-math
>
> -freciprocal-math
>
> -fno-signed-zeros
>
> -fno-trapping-math
>
> get passed to cc1 (this is true both with the old (pre r297837) and 
> new (since
>
> r297837) compilers).  Furthermore, the "umbrella" '-ffast-math' is 
> also passed
>
> to cc1 in this case of the user specifying just '-ffast-math' on the
>
> command-line (again, in both the old and new compilers).
>
> The difference related to this issue in the old/new behavior, is that 
> when a
>
> user turns on fast-math but disables one (or more) of the sub-fast-math
>
> switches, for example, as in:
>
> clang++ -O2 -ffast-math -fno-reciprocal-math -c foo.cpp
>
> then in the old mode '-ffast-math' was still passed to cc1 (acting as an
>
> umbrella, causing trouble), but in the new mode '-ffast-math' is no longer
>
> passed to cc1 in this case.  (In both the old and new modes,
>
> '-freciprocal-math' is not passed to cc1 with this command-line, as you'd
>
> expect.)
>
> What's happening is that in the old mode, it was the user passing 
> '-ffast-math'
>
> on the command-line that resulted in passing the umbrella 
> '-ffast-math' to cc1
>
> (even if all 7 of the sub-fast-math switches were disabled by the user).
>
> Whereas in the new mode, the '-ffast-math' switch is passed to cc1 iff 
> all 7 of
>
> the underlying sub-fast-math switches are enabled.
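
To make the old and new driver rules described above concrete, here is a
minimal C++ sketch. It is only a model of the behavior described in this
thread, not Clang's driver code: the struct 'SubFastMath' and the functions
'fast_math_defaults', 'old_passes_umbrella', 'new_passes_umbrella', and
'check_driver_model' are hypothetical names invented for illustration.

```cpp
#include <cassert>

// Hypothetical model of the seven sub-fast-math switches listed above.
// This is NOT Clang's driver code; every name here is made up.
struct SubFastMath {
    bool no_honor_infinities = false;  // -fno-honor-infinities
    bool no_honor_nans = false;        // -fno-honor-nans
    bool no_math_errno = false;        // -fno-math-errno
    bool associative_math = false;     // -fassociative-math
    bool reciprocal_math = false;      // -freciprocal-math
    bool no_signed_zeros = false;      // -fno-signed-zeros
    bool no_trapping_math = false;     // -fno-trapping-math

    bool all_enabled() const {
        return no_honor_infinities && no_honor_nans && no_math_errno &&
               associative_math && reciprocal_math && no_signed_zeros &&
               no_trapping_math;
    }
};

// State after a bare '-ffast-math': all seven switches are on.
inline SubFastMath fast_math_defaults() {
    SubFastMath s;
    s.no_honor_infinities = s.no_honor_nans = s.no_math_errno = true;
    s.associative_math = s.reciprocal_math = true;
    s.no_signed_zeros = s.no_trapping_math = true;
    return s;
}

// Old rule (pre-r297837), as described in the thread: the umbrella is
// forwarded to cc1 whenever the user wrote '-ffast-math', regardless of
// which sub-switches were later disabled.
inline bool old_passes_umbrella(bool user_wrote_ffast_math, const SubFastMath&) {
    return user_wrote_ffast_math;
}

// New rule (since r297837), as described in the thread: the umbrella is
// forwarded iff all seven sub-switches are enabled.
inline bool new_passes_umbrella(const SubFastMath& s) {
    return s.all_enabled();
}

// Walks through the '-ffast-math -fno-reciprocal-math' case.
inline void check_driver_model() {
    SubFastMath s = fast_math_defaults();
    assert(old_passes_umbrella(true, s) && new_passes_umbrella(s));

    s.reciprocal_math = false;             // user added -fno-reciprocal-math
    assert(old_passes_umbrella(true, s));  // old: umbrella still passed
    assert(!new_passes_umbrella(s));       // new: umbrella dropped
}
```

Calling 'check_driver_model()' from any main() exercises both rules. The
point of the sketch is that under the new rule a single disabled switch is
enough to drop the umbrella, which is what exposes the back-end issue.
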
>
> I'd say that's an improvement in the handling of the switches, and 
> also on the
>
> plus side, I think it makes dealing with the LLVM concerns I raised in November
>
> a little clearer, and so more manageable in some sense.  But on the 
> negative
>
> side, since the new behavior in LLVM is arguably worse, fixing the 
> back-end
>
> issues is now a higher priority for my customers.
>
> The behavior that is arguably worse, is that when a user enables 
> fast-math, but
>
> attempts to disable one of the sub-fast-math aspects, the old behavior 
> (pre
>
> r297837) was that the sub-fast-math aspect to be disabled, generally 
> (often?)
>
> remained enabled.  The new behavior (since r297837) is that when 
> disabling a
>
> sub-fast-math aspect, that aspect plus many more (possibly often the 
> majority)
>
> of the fast-math transformations are disabled.  So this results in a
>
> performance regression in these fast-math contexts when a 
> sub-fast-math aspect
>
> is disabled, which is why it is a fairly high priority for us.
>
> FTR, r297837 was made during llvm 5.0 development, so the new behavior 
> has the
>
> effect of a performance regression in moving from 4.0 to 5.0.  In 
> describing
>
> things here, I'll compare llvm 4.0 with llvm 5.0 behavior.  But more 
> precisely,
>
> it's pre-r297837 with post-r297837 behavior.
>
> Here is a tiny example, to illustrate it concretely:
>
> $ cat assoc.cpp
> //////////// "assoc.cpp" ////////////
> float foo(float a, float x)
> {
>   return ((a + x) - x);  // fastmath reassociation eliminates the arithmetic
> }
> /////////////////////////////////////
> $
>
> When -ffast-math is specified, the reassociation enabled by it allows 
> us to
>
> simply return the first argument (and that reassociation does happen with
>
> '-ffast-math', with both the old and new compilers):
>
> $ clang -c -O2 -o x.o assoc.cpp
>
> $ llvm-objdump -d x.o | grep "^ .*:    "
>
> 0:       f3 0f 58 c1     addss   %xmm1, %xmm0
>
> 4:       f3 0f 5c c1     subss   %xmm1, %xmm0
>
> 8:       c3      retq
>
> $ clang -c -O2 -ffast-math -o x.o assoc.cpp
>
> $ llvm-objdump -d x.o | grep "^ .*:    "
>
> 0:       c3      retq
>
> $
>
> FTR, GCC also does the reassociation transformation here when 
> '-ffast-math' is
>
> used, as expected.
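
The reason this reassociation is only legal under fast-math is that
(a + x) - x and a can differ in IEEE single precision. A small C++ sketch of
that, with hypothetical helper names ('add_then_sub', 'reassociated',
'check_reassociation_changes_result') that are not part of the original
test-case; it assumes the file is built without '-ffast-math':

```cpp
#include <cassert>

// Evaluates (a + x) - x in the order written. The volatile store forces the
// intermediate sum to be rounded to float, so the result does not depend on
// excess precision or on what the optimizer would like to do.
inline float add_then_sub(float a, float x) {
    volatile float sum = a + x;
    return sum - x;
}

// What the reassociated code effectively computes: the arithmetic is gone.
inline float reassociated(float a, float /*x*/) {
    return a;
}

inline void check_reassociation_changes_result() {
    // Near 1e8, adjacent floats are 8 apart, so 1 + 1e8 rounds back to 1e8
    // and the subtraction then yields 0 rather than 1.
    assert(add_then_sub(1.0f, 1.0e8f) == 0.0f);
    assert(reassociated(1.0f, 1.0e8f) == 1.0f);

    // For small, exactly representable operands the two forms agree.
    assert(add_then_sub(1.0f, 2.0f) == reassociated(1.0f, 2.0f));
}
```

So a user who passes '-fno-associative-math' is asking for the first
behavior, and a user who disables some unrelated aspect is not.
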
>
> But when using '-ffast-math' and disabling a sub-fast-math aspect of 
> it (say
>
> via '-fno-reciprocal-math', '-fno-associative-math', or 
> '-fmath-errno'), both
>
> the old and new compilers exhibit incorrect behavior in some cases.  
> With the
>
> old compiler, the behavior was that using any of these switches did 
> not disable
>
> the transformation.  Those switches were mostly ineffective. (Only
>
> '-fno-associative-math' should disable the transformation in this 
> example, so
>
> the fact that the other ones didn't disable it is correct/desired.)  
> Here is
>
> the old behavior for the above test-case, when some example sub-fast-math
>
> aspects are individually disabled:
>
> $ old/bin/clang --version | grep version
>
> clang version 4.0.0 (tags/RELEASE_400/final)
>
> $ old/bin/clang -c -O2 -o x.o assoc.cpp
>
> $ llvm-objdump -d x.o | grep "^ .*:    "
>
> 0:       f3 0f 58 c1     addss   %xmm1, %xmm0
>
> 4:       f3 0f 5c c1     subss   %xmm1, %xmm0
>
> 8:       c3      retq
>
> $ old/bin/clang -c -O2 -ffast-math -o x.o assoc.cpp
>
> $ llvm-objdump -d x.o | grep "^ .*:    "
>
> 0:       c3      retq
>
> $ old/bin/clang -c -O2 -ffast-math -fno-reciprocal-math -o x.o assoc.cpp
>
> $ llvm-objdump -d x.o | grep "^ .*:    "
>
> 0:       c3      retq
>
> $ old/bin/clang -c -O2 -ffast-math -fno-associative-math -o x.o 
> assoc.cpp # Error
>
> $ llvm-objdump -d x.o | grep "^ .*:    "
>
> 0:       c3      retq
>
> $ old/bin/clang -c -O2 -ffast-math -fmath-errno -o x.o assoc.cpp
>
> $ llvm-objdump -d x.o | grep "^ .*:    "
>
> 0:       c3      retq
>
> $
>
> So with the old compiler, the case marked 'Error' above is incorrect, 
> in that
>
> the reassociation should be suppressed in that case, but it isn't.
>
> Again FTR, the GCC behavior disables the re-association in the case marked
>
> 'Error' above.
>
> Moving on to the new compiler, instead of '-fno-associative-math' being
>
> ineffective, the problem is that when disabling other sub-fast-math 
> aspects
>
> (unrelated to reassociation), the transformation is suppressed, when 
> it should
>
> not be.  Here is the new behavior with that same set of sub-fast-math 
> aspects
>
> individually disabled:
>
> $ new/bin/clang --version | grep version
>
> clang version 5.0.0 (tags/RELEASE_500/final)
>
> $ new/bin/clang -c -O2 -o x.o assoc.cpp
>
> $ llvm-objdump -d x.o | grep "^ .*:    "
>
> 0:       f3 0f 58 c1     addss   %xmm1, %xmm0
>
> 4:       f3 0f 5c c1     subss   %xmm1, %xmm0
>
> 8:       c3      retq
>
> $ new/bin/clang -c -O2 -ffast-math -o x.o assoc.cpp
>
> $ llvm-objdump -d x.o | grep "^ .*:    "
>
> 0:       c3      retq
>
> $ new/bin/clang -c -O2 -ffast-math -fno-reciprocal-math -o x.o 
> assoc.cpp # Error
>
> $ llvm-objdump -d x.o | grep "^ .*:    "
>
> 0:       f3 0f 58 c1     addss   %xmm1, %xmm0
>
> 4:       f3 0f 5c c1     subss   %xmm1, %xmm0
>
> 8:       c3      retq
>
> $ new/bin/clang -c -O2 -ffast-math -fno-associative-math -o x.o 
> assoc.cpp # Good
>
> $ llvm-objdump -d x.o | grep "^ .*:    "
>
> 0:       f3 0f 58 c1     addss   %xmm1, %xmm0
>
> 4:       f3 0f 5c c1     subss   %xmm1, %xmm0
>
> 8:       c3      retq
>
> $ new/bin/clang -c -O2 -ffast-math -fmath-errno -o x.o assoc.cpp # Error
>
> $ llvm-objdump -d x.o | grep "^ .*:    "
>
> 0:       f3 0f 58 c1     addss   %xmm1, %xmm0
>
> 4:       f3 0f 5c c1     subss   %xmm1, %xmm0
>
> 8:       c3      retq
>
> $
>
> The two cases marked as 'Error' are incorrectly suppressing the 
> re-association.
>
> The case marked as 'Good' is now doing the right thing for this test-case.
>
> Again FTR, the GCC behavior allows the re-association in the cases marked
>
> 'Error' above to happen.
>
> __________________________________________________________________
>
> Note that the '-f[no-]associative-math' flag has other problems, 
> reported in
>
> PR27372 (https://bugs.llvm.org/show_bug.cgi?id=27372). Those "other 
> problems"
>
> are related to the fact that there isn't an LLVM IR fast-math-flag that
>
> explicitly indicates whether reassociation is enabled or disabled.  As a
>
> consequence, the front-end essentially drops that flag on the floor.  The
>
> back-end has no way of explicitly looking for that capability, and so the
>
> back-end implementation instead relies on the "umbrella" aspect of 'fast'
>
> implicitly turning on all the lower-level fast-math-flags.  This is a key
>
> aspect of the problem.  Near the start of this post, I mentioned that 
> the LLVM
>
> issues are related to the umbrella aspect of 'fast', along with other
>
> fast-math-flag implementation details.  The fact that the back-end has 
> no way
>
> of explicitly checking whether reassociation is enabled is what I meant by
>
> those other implementation details.
>
> Going to a more general discussion of the problem, the documentation 
> of the
>
> fast-math-flags at:
>
> http://llvm.org/docs/LangRef.html#fast-math-flags
>
> can be described loosely as:
>
> nnan      Allow optimizations to assume the arguments and result are not NaN
>
> ninf      Allow optimizations to assume the arguments and result are not +/-Inf
>
> nsz       Allow optimizations to treat the sign of a zero argument or result
>           as insignificant
>
> arcp      Allow optimizations to use the reciprocal of an argument rather than
>           perform division
>
> contract  Allow floating-point contraction (e.g. fused multiply-and-add)
>
> And the flag 'fast' is defined there as:
>
> fast      Fast - Allow algebraically equivalent transformations that may
>           dramatically change results in floating point (e.g. reassociate).
>           This flag implies all the others.
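
For readers less familiar with what the individual flags license, here is a
small C++ sketch of three of them ('arcp', 'nsz', 'nnan'). The function names
are hypothetical and chosen only for this illustration, and the checks assume
a build without '-ffast-math'. The volatile temporaries are there to force
each step to be rounded to float.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// 'arcp': x / y versus x * (1 / y). The second form rounds twice.
inline float divide(float x, float y) {
    volatile float q = x / y;
    return q;
}
inline float times_reciprocal(float x, float y) {
    volatile float r = 1.0f / y;
    volatile float p = x * r;
    return p;
}

// 'nsz': rewriting x + 0.0f to x is only valid if the sign of zero is
// treated as insignificant, because -0.0f + 0.0f is +0.0f.
inline float add_zero(float x) {
    volatile float s = x + 0.0f;
    return s;
}

// 'nnan': x != x is true only for NaN, so an optimizer that may assume
// "no NaNs" is allowed to treat this comparison as always false.
inline bool differs_from_itself(float x) {
    volatile float v = x;
    return v != v;
}

inline void check_flag_examples() {
    // 5/3 and 5*(1/3) land on adjacent floats.
    assert(divide(5.0f, 3.0f) != times_reciprocal(5.0f, 3.0f));

    // The input is negative zero, the sum is positive zero.
    assert(std::signbit(-0.0f));
    assert(!std::signbit(add_zero(-0.0f)));

    assert(differs_from_itself(std::numeric_limits<float>::quiet_NaN()));
    assert(!differs_from_itself(1.0f));
}
```

Each of these is independent of the others, which is why a user can
reasonably want one of them off while keeping the rest.
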
>
> (Side point: Back in November, 'contract' was not an explicit 
> fast-math-flag.
>
> This is a recent change, but it doesn't impact the issue I'm raising 
> here.)
>
> To summarize, and to relate this somewhat back to the November 2016 post:
>
> http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
>
> as described in that older post, this means that 'fast' could be 
> described as:
>
> Very loosely, 'fast' means "all the aggressive FP-transformations that
>
> are not controlled by one of the other 5, plus it implies all the other
>
> 5".  If for terminology, we call those additional aggressive
>
> optimizations 'aggr', then we have:
>
> 'fast' == 'aggr' + 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'
>
> But there isn't a specific flag for 'aggr' (it's just "on" when all 
> the other
>
> flags are "on").  Reassociation is part of these additional 'aggr'
>
> transformations. Back in November, Hal pointed out that libm 
> transformations
>
> are another part of these 'aggr' transformations.  With that, one possible
>
> direction is to add two more sub-fast-math flags, say 'reassoc' and 
> 'libm':
>
> 'reassoc' + 'libm' + 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'
>
> This would allow disabling (for example) 'arcp' without suppressing
>
> reassociation. Whether there would be a need for an "umbrella" flag 'fast'
>
> that implies all the others is somewhat orthogonal, although 
> personally I feel
>
> it complicates the issue and doesn't provide any significant benefit.  
> I can
>
> imagine that there is a benefit that I haven't thought of -- I don't 
> claim to
>
> have a deep understanding of the implementation.  So I'd like to hear what
>
> others think.
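
A minimal C++ sketch of the difference between the two schemes discussed
above, following the description in this thread. The bit assignments and the
names 'Flag', 'FlagSet', 'kFive', 'current_may_reassociate', and
'proposed_may_reassociate' are hypothetical; this is not LLVM's
FastMathFlags encoding or API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments, for illustration only.
enum Flag : std::uint8_t {
    NNan     = 1 << 0,
    NInf     = 1 << 1,
    NSz      = 1 << 2,
    Arcp     = 1 << 3,
    Contract = 1 << 4,
    Reassoc  = 1 << 5,  // proposed
    Libm     = 1 << 6,  // proposed
};

using FlagSet = unsigned;

// The five flags that exist today.
constexpr FlagSet kFive = NNan | NInf | NSz | Arcp | Contract;

// Current scheme as described above: the extra 'aggr' transformations
// (including reassociation) have no bit of their own, so they are treated
// as enabled only when all five flags are on.
constexpr bool current_may_reassociate(FlagSet f) {
    return (f & kFive) == kFive;
}

// Proposed scheme: reassociation is queried directly.
constexpr bool proposed_may_reassociate(FlagSet f) {
    return (f & Reassoc) != 0;
}

inline void check_flag_model() {
    const FlagSet no_arcp = ~static_cast<FlagSet>(Arcp);

    // Current scheme: dropping 'arcp' also loses reassociation.
    assert(current_may_reassociate(kFive));
    assert(!current_may_reassociate(kFive & no_arcp));

    // Proposed scheme: 'arcp' can be dropped while 'reassoc' stays.
    const FlagSet proposed = (kFive | Reassoc | Libm) & no_arcp;
    assert(proposed_may_reassociate(proposed));
    assert((proposed & Arcp) == 0);
}
```

In the sketch no umbrella bit is needed at all for the proposed scheme; each
transformation asks only for the flag it depends on.
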
>
> One important aspect of this is that it appears to me there are quite 
> a few
>
> fast-math transformations that are enabled only when all the underlying
>
> sub-fast-math flags are on (that is, only when the 'fast' umbrella flag is
>
> set). That's a key part of the problem of PR27372.  In this context, the
>
> change in behavior from r297837 is that with the old behavior, the 
> following
>
> two commands are almost equivalent (in many cases, they are equivalent):
>
> $ # Old behavior: The following two commands are nearly identical:
>
> $ clang -c -O2 -ffast-math foo.cpp
>
> $ clang -c -O2 -ffast-math -fno-reciprocal-math foo.cpp
>
> $
>
> Whereas with the new behavior (post-r297837), the following two 
> commands are
>
> almost always equivalent:
>
> $ # New behavior: The following two commands are nearly identical:
>
> $ clang -c -O2 foo.cpp
>
> $ clang -c -O2 -ffast-math -fno-reciprocal-math foo.cpp
>
> $
>
> (Again, '-fno-reciprocal-math' is just an example of the suppression of a
>
> sub-fast-math aspect here.  '-fno-associative-math' and '-fmath-errno' 
> would
>
> also be good examples.)
>
> Succinctly, if a '-ffast-math' user now disables a sub-fast-math 
> aspect, they
>
> will be frustrated that they end up disabling almost the entire set of
>
> fast-math transformations.  Whereas previously, they would be 
> frustrated that
>
> their attempt of disabling a specific sub-fast-math aspect was 
> ineffective.  So
>
> previously, they might try to "fix a numerical instability" by disabling a
>
> sub-fast-math aspect (and be frustrated by it not being effective), 
> and now if
>
> they try to "fix that numerical instability", they will succeed, but 
> they will
>
> see a performance-hit of losing nearly all the performance gain that
>
> '-ffast-math' was providing.
>
> As an aside, on the PS4 with llvm 4.0 (and earlier) compilers, we've 
> had a few
>
> customers frustrated that '-ffast-math -fno-reciprocal-math' was still 
> doing
>
> reciprocal transformations.  So we've had a private change to make
>
> '-fno-reciprocal-math' suppress the reciprocal optimization.  With a 
> vanilla
>
> llvm 5.0, those customers would see a performance hit (so we have a 
> different
>
> private change to address that).
>
> As a final point here, to give more weight to this, I took a random 
> bit of code
>
> I found on GitHub that has floating-point fast-math opportunities 
> in it,
>
> and experimented with it.  (I just searched for 'mandelbrot', and took the
>
> first thing I found.)  Specifically:
>
> https://gist.github.com/andrejbauer/7919569
>
> This test-case has a few divisions in it, but it doesn't contain any
>
> reciprocal-transformation opportunities (so '-f[no-]reciprocal-math' 
> should
>
> essentially be a no-op).
>
> The old Clang behavior has the following two commands being nearly 
> identical
>
> (they generate essentially equivalent code -- just some minor register
>
> change):
>
> $ # Old Clang behavior:
>
> $ # No significant difference when -fno-reciprocal-math is added (as 
> desired)
>
> $ clang -S -O2 -ffast-math -o O2fm.s mandelbrot.c
>
> $ clang -S -O2 -ffast-math -fno-reciprocal-math -o O2fm.no_arcp.s 
> mandelbrot.c
>
> $ diff O2fm.s O2fm.no_arcp.s | wc
>
> 4      10      56
>
> $
>
> That is, as expected/desired, the '-fno-reciprocal-math' switch has 
> essentially
>
> no impact on this, since there are no reciprocal transformations being 
> done.
>
> Also as expected, the difference between "plain -O2" and '-O2 
> -ffast-math' is
>
> more substantial:
>
> $ # Old Clang behavior:
>
> $ # '-O2' vs '-O2 -ffast-math' shows a significant difference (as desired)
>
> $ clang -S -O2 -o O2.s mandelbrot.c
>
> $ diff O2.s O2fm.s | wc
>
> 43     184    1305
>
> $
>
> That is, adding '-ffast-math' to '-O2' is transforming the code, 
> presumably
>
> making it faster (at the cost of a potential loss in numerical accuracy).
>
> With GCC for this example (I used version 4.8.4, which isn't particularly
>
> modern, but I happen to have it handy), I get similar behavior.  For 
> example,
>
> the following two commands produce identical assembly code:
>
> $ gcc -S -O2 -ffast-math -o O2fm.s mandelbrot.c
>
> $ gcc -S -O2 -ffast-math -fno-reciprocal-math -o O2fm.no_arcp.s 
> mandelbrot.c
>
> $ diff O2fm.s O2fm.no_arcp.s
>
> $
>
> and that code is substantially different than the GCC "plain -O2" code:
>
> $ gcc -S -O2 -o O2.s mandelbrot.c
>
> $ diff O2.s O2fm.s | wc
>
> 44     126     719
>
> $
>
> But comparing this to the new Clang behavior, we see that
>
> '-fno-reciprocal-math' is now "disabling too much", as discussed in detail
>
> above for the simple "assoc.cpp" test-case.  Specifically:
>
> $ # New Clang behavior:
>
> $ clang -S -O2 -o O2.s mandelbrot.c
>
> $ clang -S -O2 -ffast-math -o O2fm.s mandelbrot.c
>
> $ clang -S -O2 -ffast-math -fno-reciprocal-math -o O2fm.no_arcp.s 
> mandelbrot.c
>
> $
>
> $ # Adding -ffast-math to -O2 continues to show significant diffs 
> (expected)
>
> $ diff O2.s O2fm.s | wc
>
> 35     105     622
>
> $
>
> $ # too many differences -- should be nearly the same
>
> $ diff O2fm.s O2fm.no_arcp.s | wc
>
> 29      89     526
>
> $
>
> So with the new behavior, even though there are no reciprocal 
> transformation
>
> opportunities, disabling that transformation via '-fno-reciprocal-math'
>
> disables many (most) of the fast-math features.  In fact, comparing 
> plain '-O2'
>
> with '-O2 -ffast-math -fno-reciprocal-math', it's clear that they are 
> virtually
>
> identical with the new Clang behavior.  Specifically, we get only a minor
>
> difference (of swapping of two register operands in a comparison, and 
> changing
>
> the sense of the associated branch) when comparing '-O2' with
>
> '-O2 -ffast-math -fno-reciprocal-math':
>
> $ # New Clang behavior:
>
> $ # nearly identical, but there should be many diffs
>
> $ diff O2.s O2fm.no_arcp.s
>
> 188,189c188,189
>
> < ucomisd %xmm5, %xmm6
>
> < ja      .LBB0_7
>
> ---
>
> > ucomisd %xmm6, %xmm5
>
> > jb      .LBB0_7
>
> $
>
> In full disclosure, for this "mandelbrot.c" test-case, I don't know if 
> any of
>
> the changes in code-gen done by us or by GCC when '-ffast-math' is 
> enabled are
>
> helpful (from a performance perspective) or dangerous (from a precise 
> IEEE FP
>
> math perspective).  All I know is that for both us and GCC at -O2, the 
> switch
>
> '-ffast-math' changed the code-gen, and that '-ffast-math 
> -fno-reciprocal-math'
>
> didn't suppress any of those changes for GCC, but it suppressed 
> essentially all
>
> of the changes for us.
>
> For continuity, I'm repeating the summary here (that I had near the 
> beginning).
>
> In Summary:
>
> 1. With the change of r297837, the driver now more cleanly handles
>
> '-ffast-math', and other sub-fast-math switches (like
>
> '-f[no-]reciprocal-math', '-f[no-]math-errno', and others).
>
> 2. Prior to that change, the disabling of a sub-fast-math switch was often
>
> ineffective.  So as an example, the following two commands often resulted
>
> in the same code-gen, even if there were
>
> fast-math-reciprocal-transformations that were done:
>
> clang++ -O2 -ffast-math -c foo.cpp
>
> clang++ -O2 -ffast-math -fno-reciprocal-math -c foo.cpp
>
> 3. Since that change, the disabling of a sub-fast-math switch disables 
> many
>
> more sub-fast-math transformations than just the one specified.  So now,
>
> the following two commands often result in very similar (and sometimes
>
> identical) code-gen:
>
> clang++ -O2 -c foo.cpp
>
> clang++ -O2 -ffast-math -fno-reciprocal-math -c foo.cpp
>
> That is, disabling a single sub-fast-math transformation in some (many?)
>
> cases now ends up disabling almost all the fast-math transformations.
>
> This causes a performance hit for people that have been doing this.
>
> 4. To fix this, I think that additional fast-math-flags are likely 
> needed in
>
> the IR.  Instead of the following set:
>
> 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'
>
> something like this:
>
> 'reassoc' + 'libm' + 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'
>
> would be more useful.  Related to this, the current 'fast' flag which acts
>
> as an umbrella (enabling 'nnan' + 'ninf' + 'nsz' + 'arcp' + 
> 'contract') may
>
> not be needed.  A discussion on this point was raised last November on the
>
> mailing list:
>
> http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
>
> Thanks,
>
> -Warren
>
>
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


