[llvm-dev] [RFC] Support of non-default floating point environment on RISC-V

Fri Mar 12 05:02:02 PST 2021

Hi all,

I am interested in supporting non-default FP environments on RISC-V. This
requires significant changes to the way FP instructions are currently
described, so it is important to collect opinions and concerns on this topic.
Although the discussion is about RISC-V, much of the material here is
relevant to any target that needs to support a non-default FP environment.

What is wrong with FP support now?

Most floating-point instructions can set accrued exception bits in the
`fflags` register to signal exceptional events such as overflow or invalid
operation. Instructions with dynamic rounding mode also depend on the
content of the `frm` register. Currently, RISC-V FP instructions are
described in a way that completely ignores these dependencies.

Such an implementation is suitable for the default FP environment only
(https://llvm.org/docs/LangRef.html#floating-point-environment). When it is
used in a non-default FP environment, incorrect code may be produced. For
example, in the following code:

```
    csrw   frm, a1
    fadd.d ft2, ft2, ft3
```

the compiler may change the order of the instructions, which results in
incorrect behavior. Although `fadd.d` depends on the value of `frm`, this
fact is not reflected in the properties of the FP instructions. Similarly,
the code:

```
    fadd.d ft2, ft2, ft3
    csrrs  t0, fcsr, zero
```

does not allow the order of the instructions to be changed, because `csrrs`
reads the content of `fflags` (as part of `fcsr`), which is set by the first
instruction. But the compiler does not know about this dependency.

How to solve this problem

The descriptions of the FP instructions should be modified so that the
dependencies on `fflags` and `frm` are present in them. Neither register
appears as an explicit operand, so these are implicit dependencies. Usually
such registers are added to the `Uses` and `Defs` properties of an
`Instruction`.

RISC-V allows static rounding mode, which is taken from instruction bits
rather than from `frm`. It means that any instruction that can depend on
rounding mode exists in two variants:

   1. sets `fflags`, depends on `frm` (dynamic rounding mode),
   2. sets `fflags`, does not depend on `frm` (static rounding mode).

Such a set of instructions precisely represents the hardware, but is not
suitable for the default FP environment. Changes of `fflags` are ignored in
this mode, so dependencies on `fflags` create useless output dependencies
that prevent optimal scheduling. As the default FP environment is the most
important use case, these variants should also be considered:

   3. changes of `fflags` are ignored, does not depend on `frm` (default FP
   environment),
   4. changes of `fflags` are ignored, depends on `frm`.

So there can be 4 variants of each FP instruction, which is probably too
many. Variant 1 must be supported, as it is the most general case in terms
of restrictions. Variant 3 is also mandatory, as it represents the default
FP environment. Variants 2 and 4 may be omitted, but some optimization
opportunities would be lost.
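
To make the four variants easier to compare, here is a small C model of their implicit dependencies. It is only a sketch for discussion, not a proposal for the actual instruction descriptions; the struct, the table and the two predicates are hypothetical names invented for this mail.

```c
#include <stdbool.h>

/* Hypothetical model of one instruction variant. */
struct fp_variant {
    const char *description;
    bool uses_frm;      /* implicit read of frm (dynamic rounding)  */
    bool defs_fflags;   /* implicit write of fflags is modeled      */
};

/* Index i holds variant i + 1 from the lists above. */
static const struct fp_variant variants[4] = {
    {"dynamic rounding, fflags tracked",               true,  true},
    {"static rounding, fflags tracked",                false, true},
    {"static rounding, fflags ignored (default env)",  false, false},
    {"dynamic rounding, fflags ignored",               true,  false},
};

/* An instruction may be moved across a write to frm only if it does
   not read frm. */
bool can_move_across_frm_write(const struct fp_variant *v) {
    return !v->uses_frm;
}

/* An instruction may be moved across a read of fflags only if its
   effect on fflags is not modeled. */
bool can_move_across_fflags_read(const struct fp_variant *v) {
    return !v->defs_fflags;
}
```

In this model variant 1 is blocked by both kinds of CSR access and variant 3 by neither, which matches the reasoning above about which variants are mandatory.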

Lowering of instructions in the default FP environment

Instructions like `fadd`, which are used in the default FP environment, may
be lowered in one of two ways:

   - to an instruction that uses static rounding mode RNE, or
   - to an instruction that uses dynamic rounding mode. In this case `frm`
   must contain RNE.

The case of static rounding mode has some advantages:

   - It does not require synchronization of `frm` when the FP environment
   is changed to default.
   - Code that uses only static rounding mode may be safely called from
   any code that uses a different rounding mode.
   - Instructions with static rounding may be moved freely, just like any
   other instructions.
   - It simplifies implementation of things like `#pragma STDC FENV_ROUND`.

One issue is possible in this case. Code can set a non-default rounding
mode by calling `fesetround` and expect the subsequent instructions to be
executed with the new rounding mode. As `fesetround` is usually an external
function, the call instruction serves as a barrier that prevents undesired
code motion. Where `#pragma STDC FENV_ACCESS` is unsupported, this is an
acceptable solution. If such code were ported to RISC-V, it would fail if
the instructions used static rounding.

As a temporary solution, the compiler should lower instructions in the
default FP environment to the variants with dynamic rounding mode, which
should decrease the risk of failure. Once constrained intrinsics are
implemented for RISC-V, the lowering can be changed to use static rounding.

Are there any other things that should be considered? How many instruction
variants should be supported (2, 3, 4)?

Any feedback is appreciated.

Thanks,
--Serge