[PATCH] D53157: Teach the IRBuilder about constrained fadd and friends

Tue Nov 20 08:50:24 PST 2018

uweigand added a comment.

OK, let me try to expand on my point 3 above, which appears to have confused everybody :-)

First, let's distinguish two separate requirements: those on floating-point operations that explicitly run in regions with non-default control modes, and those on floating-point operations that run outside such regions.  (Note that these regions are by definition a subset of the regions marked FENV_ACCESS ON, but they do not necessarily coincide with them.)

For the first class of operations, there are various requirements that constrain optimization.  All of these are handled in the current design by representing those operations as constrained intrinsics, which can be done simply by always using constrained intrinsics inside any FENV_ACCESS ON region.  (My comments above were not intended to refer to these at all.)

Now, for the second class of operations, most of these requirements do not apply.  However, there is one critical requirement that still **does** apply: such operations may never be moved **into** a region with non-default control modes.  (Where it matters, such an operation can in principle even be moved into a FENV_ACCESS ON region, as long as the control modes have not yet actually changed from the defaults at that point.)

As @andrew.w.kaylor mentioned above, to fulfil this requirement it suffices to prevent moving any such floating-point operation across any statement that modifies the control modes (or inspects the exception flags).  Only a small number of statements do so directly (those would all be inline asm or platform-specific builtins), but the action can be hidden behind a function call as well.  This applies in particular to the relevant C library functions with well-known names, such as fesetround (which the compiler could detect), but in addition any arbitrary function call might **indirectly** execute such a statement (which the compiler, in general, cannot know).  Since LLVM today freely moves normal floating-point operations across normal function calls, this is a problem that needs to be addressed.

Now, one way to do that would be to ensure that any floating-point operation outside FENV_ACCESS ON regions is **also** implemented via constrained intrinsics, as long as there is any FENV_ACCESS ON present in the function at all.  As @cameron.mcinally mentioned above, that would prevent reordering (even across function calls) and therefore solve the problem.  However, I believe there were concerns about whether we can reliably ensure that property in the presence of optimizations like inlining, in particular if LTO is also used.  Even if that can be resolved, I believe there were additional concerns that this might overly constrain optimization and lead to suboptimal performance.
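The two strategies (constrained intrinsics only inside FENV_ACCESS ON regions, versus everywhere in any function that contains such a region) can be contrasted as a toy lowering decision for a single `double` addition. `Policy`, `emit_fadd` and the flag parameters are hypothetical; the constrained form follows the `llvm.experimental.constrained.fadd` signature in the LLVM LangRef (two operands plus rounding-mode and exception-behavior metadata), with call-site attributes omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lowering policies for one FP addition in a function. */
typedef enum {
    CONSTRAIN_IN_REGION_ONLY, /* constrained only inside FENV_ACCESS ON */
    CONSTRAIN_WHOLE_FUNCTION  /* constrained everywhere once the function
                                 contains any FENV_ACCESS ON region */
} Policy;

static const char *const FADD_PLAIN = "%r = fadd double %a, %b";
static const char *const FADD_CONSTRAINED =
    "%r = call double @llvm.experimental.constrained.fadd.f64("
    "double %a, double %b, "
    "metadata !\"round.dynamic\", metadata !\"fpexcept.strict\")";

/* Returns the IR text a front end would emit for "%r = %a + %b". */
const char *emit_fadd(Policy p, bool fn_has_fenv_access, bool op_in_region) {
    /* An op inside a FENV_ACCESS ON region implies the function has one. */
    assert(!op_in_region || fn_has_fenv_access);
    bool constrained =
        op_in_region || (p == CONSTRAIN_WHOLE_FUNCTION && fn_has_fenv_access);
    return constrained ? FADD_CONSTRAINED : FADD_PLAIN;
}
```

Under the first policy, an addition outside the region is still a plain `fadd` that the optimizer may move across a call; the second policy closes that gap at the cost of constraining every FP operation in the function.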

This finally gets to the intent of my point 3 above, where I was trying to see whether there is any way we can do better, that is, correctly handle floating-point operations outside FENV_ACCESS ON regions **without** having to turn them all into constrained intrinsics.  One way might be to add some new code motion barrier in the IR optimizers that would prevent movement of floating-point operations across the location of any FENV_ACCESS pragma.  But -- and this is, I think, what @kpn referred to above -- those locations don't really correspond to anything in the IR that could serve as such a barrier.  My observation above was simply that if we want to go that way, it isn't actually necessary to put the barriers at exactly these places.  Instead, we can put the barriers at the places that actually modify (or inspect) the control words -- either special operations we can explicitly recognize, or general function calls -- but only those calls whose call site was originally inside a FENV_ACCESS ON region (and was marked as such by the front-end).  This has the advantage that a function call site is a natural place to act as a barrier -- in fact, it already acts as one, e.g. for global memory accesses.  (It would also be easy to implement as a barrier at the MI / back-end level.)  In addition, this approach would not incur any performance penalty outside FENV_ACCESS ON regions.
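The proposed barrier rule can be modelled as a toy legality check over a straight-line instruction list. Everything here is hypothetical: `Kind`, `Inst`, the `in_fenv_region` marker that the front end would set on call sites inside FENV_ACCESS ON, and both helper functions. An FP operation may move across other FP operations and across unmarked calls, but not across an explicit control-mode access or a marked call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction kinds (hypothetical). */
typedef enum {
    FP_OP,  /* ordinary floating-point arithmetic */
    FENV_OP,/* explicit control-mode write or exception-flag read,
               e.g. inline asm or a platform-specific builtin */
    CALL    /* any function call */
} Kind;

typedef struct {
    Kind kind;
    bool in_fenv_region; /* front-end marker: call site was inside
                            a FENV_ACCESS ON region */
} Inst;

/* Barriers: explicit FENV operations, and only those calls whose site
   was marked; unmarked calls stay transparent to FP code motion. */
bool is_fp_barrier(Inst i) {
    switch (i.kind) {
    case FENV_OP: return true;
    case CALL:    return i.in_fenv_region;
    default:      return false;
    }
}

/* May the FP op at index `from` move to index `to`? It crosses every
   instruction between the two positions (excluding itself). */
bool can_move_fp_op(const Inst *code, size_t from, size_t to) {
    assert(code[from].kind == FP_OP);
    size_t lo = from < to ? from + 1 : to;  /* first instruction crossed */
    size_t hi = from < to ? to + 1 : from;  /* one past the last crossed */
    for (size_t k = lo; k < hi; ++k)
        if (is_fp_barrier(code[k]))
            return false;
    return true;
}
```

In this model, code in functions without FENV_ACCESS ON regions has no marked calls, so the check never blocks anything there; only the marked call sites pay.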

But given that there is still infrastructure missing in the IR optimizers, I also think that, at least in the first implementation, we should probably go with the original approach and just use constrained intrinsics everywhere in the function, and possibly add a function attribute that prevents cross-inlining between functions built with constrained intrinsics and functions built with regular floating-point operations.

https://reviews.llvm.org/D53157