[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
Doerfert, Johannes via llvm-dev
llvm-dev at lists.llvm.org
Thu Oct 3 08:45:53 PDT 2019
On 10/03, Serge Pavlov wrote:
> On Thu, Oct 3, 2019 at 7:01 AM Finkel, Hal J. <hfinkel at anl.gov> wrote:
> > On 10/2/19 5:12 PM, Hal Finkel wrote:
> > > On 10/1/19 12:35 AM, Serge Pavlov via llvm-dev wrote:
> > >
> > > The main concern with such an approach is the performance drop. Using
> > > constrained FP operations means that optimizations on FP operations are
> > > turned off; this is the main reason for using them. Even if a non-default
> > > FP environment is used in only a small piece of a function,
> > > optimizations are turned off in the entire function. For many practical
> > > applications this is unacceptable.
> > The reason, as you're likely aware, that the constrained FP operations
> > must be used within the entire function is that, if you mix the constrained
> > FP operations with the normal ones, there's no way to prevent code motion
> > from intermixing them.
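
A minimal C++ sketch of why such intermixing matters, offered as an illustration only and not as code from the proposal. The helper name divide_with_rounding is hypothetical. The volatile operands and result are an assumption of this sketch: they pin the division between the two fesetround calls, which ordinary FP operations do not guarantee.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: divide num by den under the given rounding mode.
// The volatile accesses are side effects the compiler may not reorder across
// the fesetround calls, so the division happens while `mode` is active.
// An ordinary (non-constrained) division carries no such ordering guarantee.
double divide_with_rounding(double num, double den, int mode) {
    volatile double a = num;
    volatile double b = den;

    const int saved = std::fegetround();
    const int rc = std::fesetround(mode);
    assert(rc == 0 && "rounding mode not supported");
    (void)rc;

    volatile double q = a / b;  // evaluated under `mode`

    std::fesetround(saved);     // restore the caller's environment
    return q;
}
```

If the division were allowed to move past either fesetround call, the same source would produce a differently rounded result, which is the kind of code motion the constrained operations exist to rule out.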
> This proposal presents a way to prevent such intermixing. In some basic
> blocks we use normal FP operations, in others constrained ones; BB
> attributes allow checking the validity of instruction moves.
I'm really unsure how feasible it is to look at basic block annotations
all the time. It might also interfere with CFG simplifications, e.g.,
block merging. Having "implicit" dependences is generally bad (IMHO).
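
To illustrate the kind of check this would require, here is a toy C++ model. It is a sketch under stated assumptions, not LLVM's API: FPEnv, Inst, Block, canMove, canMerge and tryMove are all hypothetical names, and the rule that blocks may merge only when their attributes agree is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-block attribute describing the FP environment.
enum class FPEnv { Default, Constrained };

struct Inst {
    std::string name;
    bool isFP;  // true for floating-point operations
};

struct Block {
    FPEnv env;
    std::vector<Inst> insts;
};

// An FP instruction may only move between blocks whose attributes agree.
// Non-FP instructions are unaffected by the attribute.
bool canMove(const Inst &inst, const Block &from, const Block &to) {
    return !inst.isFP || from.env == to.env;
}

// Assumption of this sketch: the merged block carries a single attribute,
// so merging is only allowed when both blocks agree.
bool canMerge(const Block &a, const Block &b) {
    return a.env == b.env;
}

// Move instruction i of `from` to the end of `to` if that is legal.
bool tryMove(Block &from, std::size_t i, Block &to) {
    assert(i < from.insts.size() && "instruction index out of range");
    if (!canMove(from.insts[i], from, to))
        return false;
    to.insts.push_back(from.insts[i]);
    from.insts.erase(from.insts.begin() + static_cast<std::ptrdiff_t>(i));
    return true;
}
```

Every transformation that moves instructions or merges blocks would have to call something like canMove or canMerge, since the dependence is not visible on the instructions themselves.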
> > Johannes and I discussed the outlining here offline, and two notes:
> > 1. The outlining itself will prevent the undesired code motion today, but
> > in the future we'll have IPO transformations that will need to be
> > specifically taught to avoid moving FP operations into these outlined
> > functions.
> > 2. The outlined functions will need to be marked with noinline and also
> > noimplicitfloat. In fact, all functions using the constrained intrinsics
> > might need to be marked with noimplicitfloat. The above-mentioned
> > restrictions on IPO passes might be conditioned on the noimplicitfloat
> > attribute.
> Outlining is an interesting solution, but unfortunately it is not an option
> for machine learning processors. Branching is expensive on them, and
> some processors do not have a call instruction; all function calls must
> eventually be inlined.
Would "really late" inlining be an option?
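
For reference, a source-level sketch of what the outlining discussed above could look like. It is an approximation under assumptions: sum_rounded_up and sum_default are hypothetical names, [[gnu::noinline]] is the GCC/Clang spelling that stands in for the IR-level noinline attribute, and noimplicitfloat is not expressed here. The volatile accumulator is only there to keep the additions inside the changed environment in this sketch.

```cpp
#include <cassert>
#include <cfenv>

// Outlined region: the only code that runs under the non-default rounding
// mode. noinline keeps the body from being merged back into its callers.
[[gnu::noinline]] double sum_rounded_up(const double *v, int n) {
    const int saved = std::fegetround();
    const int rc = std::fesetround(FE_UPWARD);
    assert(rc == 0 && "FE_UPWARD not supported");
    (void)rc;

    volatile double acc = 0.0;  // volatile pins each addition in place
    for (int i = 0; i < n; ++i)
        acc = acc + v[i];

    std::fesetround(saved);     // restore the caller's environment
    return acc;
}

// The rest of the function keeps using ordinary FP operations, which the
// optimizer is free to transform under the default environment.
double sum_default(const double *v, int n) {
    double acc = 0.0;
    for (int i = 0; i < n; ++i)
        acc += v[i];
    return acc;
}
```

A "really late" inlining step would fold sum_rounded_up back into its caller only after the optimizations that could move FP operations across the boundary have already run.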