<div dir="ltr"><p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">Hi all,</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">I am interested in supporting a non-default FP environment
on RISC-V. This requires substantial changes to the way FP instructions are
currently described, so it is important to collect opinions and concerns on this topic.
Although the discussion is about RISC-V, much of the material here is relevant
to any target that needs to support a non-default FP environment.</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt"><span style="font-weight:bold">What is wrong with FP support now?</span></p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">Most floating-point
instructions can set the accrued exception bits in the `fflags` register to signal
exceptional events such as overflow or invalid operation.
Instructions that use the dynamic rounding mode also depend on the contents of the `frm`
register. Currently, the RISC-V FP instructions are described so that they completely
ignore these dependencies.</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">Such an implementation
is suitable only for the default FP environment (<a href="https://llvm.org/docs/LangRef.html#floating-point-environment">https://llvm.org/docs/LangRef.html#floating-point-environment</a>).
In a non-default FP environment, it may produce incorrect code.
For example, in the following code:</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">```</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">&nbsp;&nbsp;&nbsp; csrw frm, a1</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">&nbsp;&nbsp;&nbsp; fadd.d ft2, ft2, ft3</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">```</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">the compiler may
swap the two instructions, which results in incorrect behavior: although `fadd.d`
depends on the value of `frm`, this dependency is not represented in the properties of
the FP instructions. Similarly, in the code:</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">```</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">&nbsp;&nbsp;&nbsp; fadd.d ft2, ft2, ft3</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">&nbsp;&nbsp;&nbsp; csrrs t0, fcsr, zero</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">```</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">the instructions must not be reordered, because `csrrs` reads `fcsr`, which contains the `fflags` bits
set by `fadd.d`. However, the compiler is not aware of this
dependency.</p>
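<p style="margin:0in;font-family:Calibri;font-size:11pt">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">The effect of both reorderings can be reproduced with a toy model. The sketch below is written in Python purely for illustration and rests on explicit assumptions: 3-digit `decimal` arithmetic stands in for binary64, only the inexact (NX) bit of `fflags` is tracked, and `ToyFPU`, `fadd_result` and `flag_seen` are hypothetical names, not any real API. It only shows that swapping either pair of instructions changes an observable result.</p>
```python
from decimal import Context, Decimal, Inexact, ROUND_CEILING, ROUND_HALF_EVEN

# Toy model only: 3-digit decimal arithmetic stands in for binary64,
# and only the inexact (NX) bit of fflags is tracked.
ROUNDING = {"rne": ROUND_HALF_EVEN, "rup": ROUND_CEILING}


class ToyFPU:
    """Hypothetical FPU state: the frm register and the accrued NX flag."""

    def __init__(self):
        self.frm = "rne"
        self.nx = False

    def write_frm(self, mode):        # models 'csrw frm, a1'
        self.frm = mode

    def read_nx(self):                # models 'csrrs t0, fcsr, zero'
        return self.nx

    def fadd(self, a, b):             # models 'fadd.d' with dynamic rounding
        ctx = Context(prec=3, rounding=ROUNDING[self.frm])
        result = ctx.add(Decimal(a), Decimal(b))
        self.nx = self.nx or bool(ctx.flags[Inexact])  # accrued flags are sticky
        return result


def fadd_result(order):
    """Run the frm write and the fadd in the given order; return the sum."""
    fpu = ToyFPU()
    ops = {"csrw": lambda: fpu.write_frm("rup"),
           "fadd": lambda: fpu.fadd("1.00", "0.001")}
    results = {name: ops[name]() for name in order}
    return results["fadd"]


def flag_seen(order):
    """Run the fadd and the fcsr read in the given order; return what was read."""
    fpu = ToyFPU()
    ops = {"fadd": lambda: fpu.fadd("1.00", "0.001"),
           "csrrs": fpu.read_nx}
    results = {name: ops[name]() for name in order}
    return results["csrrs"]
```
<p style="margin:0in;font-family:Calibri;font-size:11pt">Here `fadd_result(["csrw", "fadd"])` returns `Decimal('1.01')` while `fadd_result(["fadd", "csrw"])` returns `Decimal('1.00')`; likewise `flag_seen(["fadd", "csrrs"])` is `True` but `flag_seen(["csrrs", "fadd"])` is `False`.</p>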
<p style="margin:0in;font-family:Calibri;font-size:11pt">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt"><span style="font-weight:bold">How to solve this problem</span></p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">The descriptions of the FP
instructions should be modified to include their dependencies on `fflags` and `frm`.
Neither register appears as an operand of these instructions, so these are implicit
dependencies. Such dependencies are usually expressed through the `Uses` and `Defs`
properties of an `Instruction`.</p>
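<p style="margin:0in;font-family:Calibri;font-size:11pt">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">A scheduler may swap two adjacent instructions only if there is no read-after-write, write-after-read or write-after-write hazard on the registers they use and define. The Python sketch below is a simplified, hypothetical model of that check, not LLVM's scheduler, and the register names `FRM` and `FFLAGS` are illustrative. It shows that adding `FRM` to the implicit uses and `FFLAGS` to the implicit defs of `fadd.d` is enough to forbid both problematic reorderings described earlier.</p>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction model: all registers read and written,
    whether they are explicit operands or implicit Uses/Defs."""
    text: str
    defs: frozenset
    uses: frozenset


def instr(text, defs, uses):
    return Instr(text, frozenset(defs), frozenset(uses))


def can_swap(first, second):
    """True if no register hazard forces 'first' to stay before 'second'."""
    return (first.defs.isdisjoint(second.uses)       # read after write
            and first.uses.isdisjoint(second.defs)   # write after read
            and first.defs.isdisjoint(second.defs))  # write after write


write_frm = instr("csrw frm, a1", defs={"FRM"}, uses={"a1"})
read_fcsr = instr("csrrs t0, fcsr, zero", defs={"t0"}, uses={"FRM", "FFLAGS"})

# Current description: explicit operands only.
fadd_now = instr("fadd.d ft2, ft2, ft3", defs={"ft2"}, uses={"ft2", "ft3"})
# Proposed description: implicit Uses = [FRM], Defs = [FFLAGS].
fadd_new = instr("fadd.d ft2, ft2, ft3",
                 defs={"ft2", "FFLAGS"}, uses={"ft2", "ft3", "FRM"})
```
<p style="margin:0in;font-family:Calibri;font-size:11pt">With the current description, `can_swap(write_frm, fadd_now)` and `can_swap(fadd_now, read_fcsr)` are both `True`, so nothing prevents the reordering; with the implicit operands added, `can_swap(write_frm, fadd_new)` and `can_swap(fadd_new, read_fcsr)` are both `False`.</p>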
<p style="margin:0in;font-family:Calibri;font-size:11pt">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">RISC-V also supports a static
rounding mode, which is taken from the instruction bits rather than from `frm`. This
means that every instruction that can depend on the rounding mode exists in two
variants:</p>
<ol type="1" style="margin-left:0.375in;direction:ltr;unicode-bidi:embed;margin-top:0in;margin-bottom:0in;font-family:Calibri;font-size:11pt">
<li value="1" style="margin-top:0px;margin-bottom:0px;vertical-align:middle"><span style="font-size:11pt">sets `fflags`, depends on
`frm` (dynamic rounding mode),</span></li>
<li style="margin-top:0px;margin-bottom:0px;vertical-align:middle"><span style="font-size:11pt">sets `fflags`, does not
depend on `frm` (static rounding mode).</span></li>
</ol>
<p style="margin:0in;font-family:Calibri;font-size:11pt">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt">This set of
instructions represents the hardware precisely, but it is not suitable for the default FP
environment. In that environment changes to `fflags` are ignored, so dependencies on
`fflags` create useless output dependencies that prevent optimal
scheduling. As the default FP environment is the most important use case, the following
variants should also be considered:</p>
<ol type="1" style="margin-left:0.375in;direction:ltr;unicode-bidi:embed;margin-top:0in;margin-bottom:0in;font-family:Calibri;font-size:11pt">
<li value="3" style="margin-top:0px;margin-bottom:0px;vertical-align:middle"><span style="font-size:11pt">changes to `fflags` are
ignored, does not depend on `frm` (default FP environment).</span></li>
<li style="margin-top:0px;margin-bottom:0px;vertical-align:middle"><span style="font-size:11pt">changes to `fflags` are
ignored, depends on `frm`.</span></li>
</ol>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">So
there can be four variants of each FP instruction, which is probably too many.
Variant 1 must be supported, as it is the most general case in terms of
restrictions. Variant 3 is also mandatory, as it represents the default FP
environment. Variants 2 and 4 may be omitted, but some optimization
opportunities would then be lost.</p>
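<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">The practical difference between the four variants lies in the implicit operands each one carries. The Python sketch below tabulates them using the variant numbers from the lists above and illustrative register names `FRM` and `FFLAGS`; the helper functions are hypothetical and look only at implicit operands, assuming the instructions share no explicit registers.</p>
```python
# Variant number -> (defines FFLAGS?, uses FRM?), numbered as in the lists above.
VARIANTS = {
    1: (True, True),    # sets fflags, dynamic rounding
    2: (True, False),   # sets fflags, static rounding
    3: (False, False),  # fflags ignored, static rounding (default FP env)
    4: (False, True),   # fflags ignored, dynamic rounding
}


def implicit_operands(variant):
    """Return the implicit (Defs, Uses) sets of a variant."""
    sets_fflags, reads_frm = VARIANTS[variant]
    defs = {"FFLAGS"} if sets_fflags else set()
    uses = {"FRM"} if reads_frm else set()
    return defs, uses


def may_reorder(v1, v2):
    """True if two FP instructions of these variants have no implicit hazard."""
    d1, u1 = implicit_operands(v1)
    d2, u2 = implicit_operands(v2)
    return d1.isdisjoint(u2) and u1.isdisjoint(d2) and d1.isdisjoint(d2)


def may_cross_frm_write(variant):
    """True if an instruction of this variant may move across a write to frm."""
    return "FRM" not in implicit_operands(variant)[1]
```
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">For example, `may_reorder(1, 1)` is `False` because both instructions define `FFLAGS`, while `may_reorder(3, 3)` is `True`, and `[v for v in VARIANTS if may_cross_frm_write(v)]` evaluates to `[2, 3]`. Only variant 3 is unconstrained in both respects, which is why it suits the default FP environment.</p>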
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none"><span style="font-weight:bold">Lowering of instructions in the default FP environment</span></p>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">Instructions
such as `fadd` that are used in the default FP environment may be lowered
in two ways:</p>
<ul type="disc" style="margin-left:0.375in;direction:ltr;unicode-bidi:embed;margin-top:0in;margin-bottom:0in">
<li style="margin-top:0px;margin-bottom:0px;vertical-align:middle" lang="x-none"><span style="font-family:Calibri;font-size:11pt">to the instruction variant that uses
the static rounding mode RNE, or</span></li>
<li style="margin-top:0px;margin-bottom:0px;vertical-align:middle" lang="x-none"><span style="font-family:Calibri;font-size:11pt">to the instruction variant that uses
the dynamic rounding mode; in this case `frm` must contain RNE.</span></li>
</ul>
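<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">At the encoding level the two lowerings differ only in the 3-bit `rm` field (bits 14:12), where `000` selects RNE and `111` selects the dynamic mode taken from `frm`. The Python sketch below encodes `fadd.d` both ways to make this concrete; `encode_fadd_d` is a hypothetical helper, and the field values follow the RISC-V F/D instruction encoding.</p>
```python
# rm (funct3) values defined by the RISC-V F/D extensions.
RM = {"rne": 0b000, "rtz": 0b001, "rdn": 0b010, "rup": 0b011,
      "rmm": 0b100, "dyn": 0b111}

OP_FP = 0b1010011   # major opcode of F/D arithmetic instructions
FADD_D = 0b0000001  # funct7 of fadd.d


def encode_fadd_d(rd, rs1, rs2, rm):
    """Encode fadd.d f[rd], f[rs1], f[rs2] with rounding mode rm (R-type)."""
    return ((FADD_D << 25) | (rs2 << 20) | (rs1 << 15)
            | (RM[rm] << 12) | (rd << 7) | OP_FP)


# 'fadd.d ft2, ft2, ft3': ft2 is f2 and ft3 is f3.
static_rne = encode_fadd_d(2, 2, 3, "rne")
dynamic = encode_fadd_d(2, 2, 3, "dyn")
```
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">Here `hex(static_rne)` is `'0x2310153'` and `hex(dynamic)` is `'0x2317153'`; the two words differ only in the `rm` bits (`static_rne ^ dynamic == 0x7000`).</p>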
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">Using
the static rounding mode has several advantages:</p>
<ul type="disc" style="margin-left:0.375in;direction:ltr;unicode-bidi:embed;margin-top:0in;margin-bottom:0in">
<li style="margin-top:0px;margin-bottom:0px;vertical-align:middle" lang="x-none"><span style="font-family:Calibri;font-size:11pt">It does not require
synchronizing `frm` when the FP environment is switched back to the default,</span></li>
<li style="margin-top:0px;margin-bottom:0px;vertical-align:middle" lang="x-none"><span style="font-family:Calibri;font-size:11pt">Code that uses only the
static rounding mode may be safely called from code that uses a
different rounding mode,</span></li>
<li style="margin-top:0px;margin-bottom:0px;vertical-align:middle" lang="x-none"><span style="font-family:Calibri;font-size:11pt">Instructions with static
rounding may be moved freely, just like any other instructions,</span></li>
<li style="margin-top:0px;margin-bottom:0px;vertical-align:middle" lang="x-none"><span style="font-family:Calibri;font-size:11pt">It simplifies the implementation
of features like `#pragma STDC FENV_ROUND`.</span></li>
</ul>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">This
approach has one potential problem. Code may set a non-default rounding mode by calling
`fesetround` and expect the subsequent instructions to execute with the new rounding
mode. Because `fesetround` is usually an external function, the call instruction
serves as a barrier that prevents undesired instruction motion. Where `#pragma
STDC FENV_ACCESS` is not supported, this is an acceptable workaround. If such code were
ported to RISC-V and the FP instructions used static rounding, it would fail, because
static-rounding instructions ignore the mode written to `frm`.</p>
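<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">The porting hazard can be illustrated with a toy model. The Python sketch below assumes 3-digit `decimal` arithmetic in place of binary64; `ToyFPU`, its `fesetround` method and `legacy_code` are hypothetical stand-ins, not the C library function. Code relying on `fesetround` gets the requested rounding only when `fadd` is lowered with the dynamic rounding mode.</p>
```python
from decimal import Context, Decimal, ROUND_CEILING, ROUND_HALF_EVEN

# Toy model only: 3-digit decimal arithmetic stands in for binary64.
ROUNDING = {"rne": ROUND_HALF_EVEN, "rup": ROUND_CEILING}


class ToyFPU:
    def __init__(self):
        self.frm = "rne"                 # default FP environment

    def fesetround(self, mode):          # hypothetical stand-in for the libc call
        self.frm = mode

    def fadd(self, a, b, rm):
        # rm == "dyn" reads frm; any other value is a static rounding mode.
        mode = self.frm if rm == "dyn" else rm
        ctx = Context(prec=3, rounding=ROUNDING[mode])
        return ctx.add(Decimal(a), Decimal(b))


def legacy_code(rm):
    """Selects rounding toward +inf via fesetround (no FENV_ACCESS),
    then adds using the given lowering of fadd."""
    fpu = ToyFPU()
    fpu.fesetround("rup")
    return fpu.fadd("1.00", "0.001", rm)
```
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">`legacy_code("dyn")` returns `Decimal('1.01')`, as the programmer intended, whereas `legacy_code("rne")` returns `Decimal('1.00')`: the static rounding mode silently ignores the mode set by `fesetround`.</p>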
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">As a
temporary solution, the compiler should lower instructions in the default FP environment
to the variants with dynamic rounding mode, which decreases the risk of such failures.
Once constrained intrinsics are implemented for RISC-V, the lowering can be
changed to use static rounding.</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">&nbsp;</p>
<p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">Is
there anything else that should be considered? How many instruction variants
should be supported (2, 3, or 4)?</p><p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none">Any feedback is appreciated.</p><p style="margin:0in;font-family:Calibri;font-size:11pt" lang="x-none"><br></p><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Thanks,<br>--Serge<br></div></div></div>