[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment

Serge Pavlov via llvm-dev llvm-dev at lists.llvm.org
Tue Oct 8 00:38:59 PDT 2019


I see that this approach has no support, so I am working out another
solution.
Nevertheless, I'd like to address some comments, just for reference.


On Fri, Oct 4, 2019 at 1:43 AM Kaylor, Andrew <andrew.kaylor at intel.com>
wrote:

> I’d like to emphasize that the constrained intrinsics prevent
> optimizations **by default**. We have a plan to go back and teach
> individual optimizations how to handle these intrinsics.
>
> The idea is that if an optimization knows nothing about the constrained
> intrinsics then it won’t try to transform them, but if an optimization has
> been taught to handle the intrinsics correctly then it isn’t limited by
> anything other than the semantics of the constraints. Once we’ve updated an
> optimization pass, it will be able to do everything with a constrained
> intrinsic that has the “relaxed” settings (“fpexcept.ignore” and
> “fpround.tonearest”) that it would be able to do with the regular operation.
>

This work is necessary for any approach, but for the current one it is vital.
Because constrained intrinsics are used in the entire function body, the
amount of code where the solution must work correctly and fast is larger. The
performance drop makes this solution unacceptable for many users; they will
not use it until performance comes close to that of code without constrained
intrinsics. In contrast, basic block attributes confine the constrained
intrinsics to only a part of the function code, so it would be easier to make
the solution suitable for use in production code.

Of course, when reasoning about performance, it would be nice to have
numbers.


>
> This philosophy is key to the way that we’re approaching FPENV support.
> One of the primary goals is that any optimization that isn’t specifically
> aware of the mechanisms we’re using will automatically get conservatively
> correct behavior. The problem with relying on basic block attributes is
> that it requires teaching all current optimizations to look for the
> attribute.
>

All these optimizations must eventually be modified in the current approach
as well. If a transformation makes a dangerous instruction or basic block
move, it must be taught to process constrained intrinsics correctly;
otherwise it becomes a source of performance drop.

But you are right: implementing basic block attributes requires a mechanism
that checks the validity of instruction and basic block moves. Once it is
implemented, finding the places where transformations require modification
becomes simpler.

On Fri, Oct 4, 2019 at 1:54 AM Doerfert, Johannes <jdoerfert at anl.gov> wrote:
>
> The way I understood it, the constraint intrinsics are not the only
> problem but the regular ones can be. That is, optimizations will move
> around/combine/replace/... regular floating operations in the presence
> of constraint intrinsics because they do not impact each other (other
> than def-use). If that understanding is correct, and this is a problem,
> then I doubt that we want basic block attributes.
>
>
Basic block attributes make it possible to partition function code into
realms, where each FP operation is represented either by a constrained
intrinsic or by a regular node. Code that moves instructions checks whether a
particular instruction is allowed to cross a realm boundary. This mechanism
prevents mixing constrained intrinsics with regular FP nodes, but still allows
optimizations like inlining.
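
To make the realm check concrete, here is a minimal sketch written as a toy
C++ model, not as LLVM code. All names in it (Realm, Block, Inst, canMoveTo)
are hypothetical and exist only for illustration, and the rule that non-FP
instructions may cross freely is a simplifying assumption of the model:

```cpp
#include <cassert>
#include <string>

// Toy model only: none of these types or functions exist in LLVM.
enum class Realm { Default, Constrained };

struct Block {
  std::string Name;
  Realm FPRealm;  // stands in for the proposed basic block attribute
};

struct Inst {
  std::string Opcode;
  bool IsFP;  // true for a regular FP node or a constrained intrinsic
};

// An FP operation may move only between blocks of the same realm.
// Assumption of this sketch: non-FP instructions do not depend on the
// FP environment, so they may cross a realm boundary.
bool canMoveTo(const Inst &I, const Block &From, const Block &To) {
  if (!I.IsFP)
    return true;
  return From.FPRealm == To.FPRealm;
}
```

A transformation that hoists or sinks an instruction would consult such a
predicate before the move. In this model each block carries its realm with it,
so copying whole blocks into a caller, as inlining does, needs no check of
this kind.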

On Thu, Oct 3, 2019 at 10:45 PM Doerfert, Johannes <jdoerfert at anl.gov>
wrote:

> On 10/03, Serge Pavlov wrote:
> >
> > Outlining is an interesting solution but unfortunately it is not an option
> > for processors for machine learning. Branching is expensive on them and
> > some processors do not have call instruction, all function calls must be
> > eventually inlined.
>
> Would "really late" inlining be an option?


Late inlining means fewer optimization opportunities. If the resulting code
is a single function (as in the case of kernels), early inlining is usually
more profitable.

Thanks,
--Serge
