[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
Doerfert, Johannes via llvm-dev
llvm-dev at lists.llvm.org
Thu Oct 3 13:32:26 PDT 2019
For the record, I'd love to have an "inline late" attribute for other purposes as well :)
From: Kaylor, Andrew <andrew.kaylor at intel.com>
Sent: Thursday, October 3, 2019 15:00
To: Doerfert, Johannes
Cc: Serge Pavlov; LLVM Developers Mailing List
Subject: RE: [llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
> The way I understood it, the constrained intrinsics are not
> the only problem; the regular operations can be one as well. That is,
> optimizations will move around/combine/replace/... regular
> floating-point operations in the presence of constrained intrinsics
> because they do not impact each other (other than def-use).
> If that understanding is correct, and this is a problem, then I
> doubt that we want basic block attributes. Also, given that the
> constrained intrinsics are inaccessiblememonly, optimizations
> will work with them as they work with other opaque instructions
> for which certain effects are known.
Right. The motion of first-class FP operations is nearly unrestricted. In particular, nothing prevents them from moving past a call to something like fesetround() (or an architecture-specific intrinsic that does something similar). There shouldn't be any calls to fesetround() outside of a block where FENV_ACCESS is enabled, but we need such a call to act as a barrier. So in mixed modes we need to use constrained intrinsics in place of the regular unconstrained operations so that we can restrict their movement. This would imply that calling fesetround() should trigger the strictfp mode in a function the same way that having constrained intrinsics does. That feels a bit like pulling a loose thread on a sweater, but we should think about it.
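To make the barrier problem concrete, here is a minimal source-level sketch in C++. The function name divide_with_rounding is made up for illustration, and the volatile temporaries are a workaround assumed for this sketch, not the mechanism proposed in this thread. They pin the division between the two fesetround() calls by hand, which is the ordering the constrained intrinsics are meant to guarantee at the IR level. The standard way to ask for this is "#pragma STDC FENV_ACCESS ON", but compiler support for that pragma varies, so the sketch does not rely on it.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: divide a by b under the given rounding mode, then
// restore the mode that was active on entry.
double divide_with_rounding(double a, double b, int mode) {
    const int saved = std::fegetround();
    int rc = std::fesetround(mode);
    assert(rc == 0 && "rounding mode not supported");
    (void)rc;

    // Volatile accesses are side effects, so the loads of x and y cannot be
    // hoisted above the first fesetround() call, and the store to q cannot
    // be sunk below the second one. A plain "a / b" has no such ordering.
    volatile double x = a;
    volatile double y = b;
    volatile double q = x / y;

    std::fesetround(saved);
    return q;
}
```

Calling divide_with_rounding(1.0, 3.0, FE_UPWARD) and divide_with_rounding(1.0, 3.0, FE_DOWNWARD) should give results that differ in the last bit, because 1/3 is not exactly representable. If the division were free to move past either call, both results could silently come out rounded to nearest.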
> (Btw. is it intentional that these can unwind?)
I think so. If you unmask FP exceptions most of the constrained intrinsics might trigger a signal. I don't know if that needs to be modeled as unwind on Unix systems (probably not?), but on Windows I'm pretty sure it can be caught by SEH.
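As a small illustration of the side effect in question, the sketch below (C++, with a made-up helper name raises_divbyzero) only inspects the sticky status flags through the standard <cfenv> calls. It assumes the default environment, where exceptions are masked. With the exception unmasked, the same division would trap instead of just setting the flag; unmasking itself is platform-specific and is not shown here.

```cpp
#include <cfenv>

// Hypothetical helper: returns true if dividing a by b raises the
// divide-by-zero status flag.
bool raises_divbyzero(double a, double b) {
    std::feclearexcept(FE_ALL_EXCEPT);

    // Volatile temporaries keep the division from being constant-folded or
    // moved outside the clear/test pair.
    volatile double x = a;
    volatile double y = b;
    volatile double q = x / y;
    (void)q;

    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

On a target with IEEE 754 arithmetic, raises_divbyzero(1.0, 0.0) should report the flag and raises_divbyzero(1.0, 2.0) should not. An optimizer that folded or reordered the division would change what the flag test observes, which is the kind of effect the "fpexcept" setting on the constrained intrinsics describes.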
> Agreed. Outlining seems a reasonable approach to avoid code motion
> or at least restrict the locations that need to know about the constraints.
> Given that we already have noimplicitfloat, it seems natural to use it
> here and make sure IPOs honor it.
I have some reservations about outlining. I guess it solves the immediate problem, but as Serge noted it isn't friendly to all targets, and I think the call overhead would often be an issue even on systems that can handle calls.
How would the front end decide when to outline? Might it be better to let the user make that choice and provide a mechanism to mark a function for "very late inlining"?
From: Doerfert, Johannes <jdoerfert at anl.gov>
Sent: Thursday, October 03, 2019 11:55 AM
To: Kaylor, Andrew <andrew.kaylor at intel.com>
Cc: Serge Pavlov <sepavloff at gmail.com>; LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
On 10/03, Kaylor, Andrew via llvm-dev wrote:
> I’d like to emphasize that the constrained intrinsics prevent
> optimizations *by default*. We have a plan to go back and teach
> individual optimizations how to handle these intrinsics. The idea is
> that if an optimization knows nothing about the constrained intrinsics
> then it won’t try to transform them, but if an optimization has been
> taught to handle the intrinsics correctly then it isn’t limited by
> anything other than the semantics of the constraints. Once we’ve
> updated an optimization pass, it will be able to do everything with a
> constrained intrinsic that has the “relaxed” settings
> (“fpexcept.ignore” and “round.tonearest”) that it would be able to
> do with the regular operation.
The way I understood it, the constrained intrinsics are not the only problem; the regular operations can be one as well. That is, optimizations will move around/combine/replace/... regular floating-point operations in the presence of constrained intrinsics because they do not impact each other (other than def-use). If that understanding is correct, and this is a problem, then I doubt that we want basic block attributes. Also, given that the constrained intrinsics are inaccessiblememonly, optimizations will work with them as they work with other opaque instructions for which certain effects are known.
(Btw. is it intentional that these can unwind?)
> This philosophy is key to the way that we’re approaching FPENV
> support. One of the primary goals is that any optimization that isn’t
> specifically aware of the mechanisms we’re using will automatically
> get conservatively correct behavior. The problem with relying on basic
> block attributes is that it requires teaching all current
> optimizations to look for the attribute.
> We had a somewhat similar problem when we implemented Windows
> exception handling. The implementation introduced basic blocks that
> instructions shouldn’t be hoisted or sunk into. We ended up having to
> chase down a lot of cases where our rules were violated. I think this
> stems from not having a single place to check the legality of code
Agreed. Outlining seems a reasonable approach to avoid code motion or at least restrict the locations that need to know about the constraints.
Given that we already have noimplicitfloat, it seems natural to use it here and make sure IPOs honor it.