[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment

Thu Oct 3 11:43:48 PDT 2019

I’d like to emphasize that the constrained intrinsics prevent optimizations *by default*. We have a plan to go back and teach individual optimizations how to handle these intrinsics. The idea is that if an optimization knows nothing about the constrained intrinsics then it won’t try to transform them, but if an optimization has been taught to handle the intrinsics correctly then it isn’t limited by anything other than the semantics of the constraints. Once we’ve updated an optimization pass, it will be able to do everything with a constrained intrinsic that has the “relaxed” settings (“fpexcept.ignore” and “fpround.tonearest”) that it would be able to do with the regular operation.

This philosophy is key to the way that we’re approaching FPENV support. One of the primary goals is that any optimization that isn’t specifically aware of the mechanisms we’re using will automatically get conservatively correct behavior. The problem with relying on basic block attributes is that it requires teaching all current optimizations to look for the attribute.

We had a somewhat similar problem when we implemented Windows exception handling. The implementation introduced basic blocks that instructions shouldn’t be hoisted or sunk into. We ended up having to chase down a lot of cases where our rules were violated. I think this stems from not having a single place to check the legality of code motion.

-Andy

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Serge Pavlov via llvm-dev
Sent: Monday, September 30, 2019 10:36 PM
To: LLVM Developers <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment

Hi all,

This proposal is aimed at support of floating point environment, in which some properties like rounding mode or exception behavior differ from those used by default. This include in particular support of 'pragma STDC FENV_ACCESS', 'pragma STDC FENV_ROUND' as well as some other related facilities.

Problem

On many processors use of non-default floating mode requires modification of global state by writing into some register. It presents a difficulty for implementation as a floating point instruction must not be move to code which executes with different floating point state. To prevent from such moves, the current solution represents FP operations with special (constrained) instructions, which do not participate in optimizations (http://lists.llvm.org/pipermail/cfe-dev/2017-August/055325.html). It is important that the constrained FP operations must be used everywhere in entire function including inlined calls, if they are used in some part of it.

The main concern about such approach is performance drop. Using constrained FP operations means that optimizations on FP operations are turned off, this is the main reason of using them. Even if non-default FP environment is used in a small piece of a function, optimizations are turned off in entire function. For many practical application this is unacceptable.

Although this approach prevents from moving instructions, it does not prevent from moving basic blocks. The code that uses non-default FP environment at some point must set appropriate state registers, do necessary operations and then restore the original mode. If this activity is scattered by several basic blocks, block-level optimizations can break these arrangement, for instance a basic block with default FP operations can be moved after the block that sets non-default FP environment.

Solution

The proposed approach is based on extension of basic blocks. It is assumed that code in basic block is executed in the same FP environment. The assumption is consistent with the rules of using 'pragma STDC FENV_ACCESS' and similar facilities. If the environment differs from default, such block has pointer to some object that keeps the block attributes including FP settings. All basic blocks, obtained from the same block where 'pragma STDC FENV_ACCESS' is specified, share the same attribute object. In bytecode these attributes are represented by metadata attached to the basic blocks.

With basic block attributes compiler can assert validity of an instruction move by comparing attributes of source and destination BBs. An instruction should keep pointer to BB attributes even if it is detached from BB, to support common technique of moving instructions. Similarly compiler can verify validity of BB movement.

Such approach allows to develop implementation in which constrained FP operations are 'jailed' in their basic blocks. Other part of the function can still use usual FP operations and get profit of optimizations. Depending on the target hardware some FP operations may be allowed to cross the 'jail' boundary, for instance, it they correspond to instructions which directly encode rounding mode and FP environment change rounding mode only.

Is this solution feasible? What are obstacles, difficulties or drawbacks for it? Are there any improvements for it? Any feedback is welcome.

Thanks,
--Serge
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20191003/f787fc4e/attachment.html>