<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p><br>
</p>
<div class="moz-cite-prefix">On 05/23/2018 11:06 AM, Hubert Tong via
llvm-dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CACvkUqY+w4Yf5Zyt81DGww4FYD7qLL9WUd_Nx-=r5Lo_z0K30A@mail.gmail.com">
<div dir="ltr">
<div>Hi Ulrich,<br>
<br>
</div>
<div>I am interested in knowing if the current proposals also
take into account the FP_CONTRACT pragma</div>
</div>
</blockquote>
<br>
We should already do this: during IR generation, we turn the relevant
operations into @llvm.fmuladd.* intrinsics when FP_CONTRACT is set to ON.<br>
<br>
<blockquote type="cite"
cite="mid:CACvkUqY+w4Yf5Zyt81DGww4FYD7qLL9WUd_Nx-=r5Lo_z0K30A@mail.gmail.com">
<div dir="ltr">
<div> and the ability to implement options that imply a specific
value for the FLT_EVAL_METHOD macro.<br>
</div>
</div>
</blockquote>
<br>
What do you mean by this?<br>
<br>
-Hal<br>
<br>
<blockquote type="cite"
cite="mid:CACvkUqY+w4Yf5Zyt81DGww4FYD7qLL9WUd_Nx-=r5Lo_z0K30A@mail.gmail.com">
<div dir="ltr">
<div><br>
</div>
<div>Additionally, I am not aware of the IR being able to
represent the potentially deferred loss of precision that the
C language semantics provide; in particular, applying such
semantics to the existing IR would hit an issue that the
limits of such deferment would need an agreed representation.<br>
<br>
</div>
<div>As for the mixing of strict and non-strict modes, I would
be interested in where LLVM is in its handling of non-SSA
(pseudo-memory?) dependencies. I have a vague impression that
it is very coarse-grained in that respect, but I admit to not
being particularly informed in that space. If there is a good
model for such dependencies, then I think it could be used to
handle the strict/non-strict mixing.<br>
</div>
<div><br>
</div>
-- Hubert Tong, IBM<br>
<br>
<div>PS A nitpick on wording: The idea of being inside or
outside of FENV_ACCESS regions is instead expressed in
terms of the state of the FENV_ACCESS pragma within the C
Standard.<br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, May 23, 2018 at 10:48 AM,
Ulrich Weigand via llvm-dev <span dir="ltr">&lt;<a
href="mailto:llvm-dev@lists.llvm.org" target="_blank"
moz-do-not-send="true">llvm-dev@lists.llvm.org</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<p><font size="2">Hello,</font><br>
<br>
<font size="2">at the recent EuroLLVM developer meeting
in Bristol I held a BoF<br>
session on the topic "Towards implementing #pragma
STDC FENV_ACCESS".<br>
I've also had a number of follow-on discussions both
on-site in<br>
Bristol and online since. This post is intended as a
summary of<br>
my current understanding of the set of requirements and
implementation<br>
details covering the overall topic.</font><br>
<br>
<font size="2">I'm posting this here in the hope this
can serve as a basis for<br>
the various more detailed discussions that are still
ongoing<br>
(e.g. in various Phabricator proposals right now). Any
comments<br>
are welcome!</font><br>
<br>
<br>
<font size="2">Semantics of #pragma STDC FENV_ACCESS<br>
==============================<wbr>=======</font><br>
<br>
<font size="2">To provide a baseline for the
implementation discussion, first an<br>
overview of the features required to handle the strict
floating-point<br>
mode defined by the C and IEEE standards:</font><br>
<br>
<font size="2">1. Floating-point rounding modes<br>
2. Default floating-point exception handling<br>
3. Trapping floating-point exception handling</font><br>
<br>
<font size="2">Each of these separate features imposes
different constraints on the<br>
optimizations that LLVM may perform involving FP
expressions:</font><br>
<br>
<font size="2">1. Floating-point rounding modes</font><br>
<br>
<font size="2">Outside of FENV_ACCESS regions, all FP
operations are supposed to be<br>
performed in the "default" rounding mode.</font><br>
<br>
<font size="2">But inside FENV_ACCESS regions, FP
operations implicitly depend on<br>
a "current" rounding mode setting, which may be
changed by certain<br>
C library calls (plus some platform-specific
intrinsics). In addition,<br>
those calls may be performed within subroutines (as
long as those are<br>
also within FENV_ACCESS regions), so *any* function call
within a FENV_ACCESS<br>
region must be considered as potentially changing the
rounding mode.</font><br>
<br>
<font size="2">In effect, this means the compiler may
not move or combine FP<br>
operations across function call sites.</font><br>
<br>
<font size="2">2. Default floating-point exception
handling</font><br>
<br>
<font size="2">Inside FENV_ACCESS regions, every
floating-point operation that<br>
causes an exception must be considered to set a
"status flag"<br>
associated with this exception type. Those flags can
be queried<br>
using C library calls (plus some platform-specific
intrinsics),<br>
and there are other such calls to explicitly set or
clear those<br>
flags as well. As with the rounding modes, those calls
may be<br>
performed in subroutines as well, so any function call
within a<br>
FENV_ACCESS region must be considered as potentially
*using* and<br>
changing the floating-point exception status flags.</font><br>
<br>
<font size="2">The values of the status flags on entry
to a FENV_ACCESS region are to<br>
be considered undefined according to the C standard.</font><br>
<br>
<font size="2">Compiler optimizations are supposed to
preserve the values of<br>
all exception status bits at any point where they can
be<br>
(potentially) inspected by the program, i.e. at all
call sites<br>
within FENV_ACCESS regions. This still allows a number
of<br>
optimizations, e.g. to reorder FP operations or
combine two<br>
identical operations within a region uninterrupted by
calls.<br>
But other optimizations should be avoided, e.g.
optimizing<br>
away an unused FP operation may result in an exception
flag<br>
now being unset that would otherwise have been set.
The same<br>
applies to floating-point constant folding.</font><br>
<br>
<font size="2">3. Trapping floating-point exception
handling</font><br>
<br>
<font size="2">Within a FENV_ACCESS region, library
calls may be used to switch<br>
exception handling semantics to a "trapping" mode by
setting<br>
corresponding mask bits. Any subsequent FP instruction
that<br>
raises an exception with the associated mask bit set
will cause<br>
a trap. Usually, this will be a hardware trap that is
translated<br>
by the operating system into some form of software
exception that<br>
can be handled by the application; on Linux systems
this takes the<br>
form of a SIGFPE signal.</font><br>
<br>
<font size="2">As above, those mask bits can be set and
reset via (operating-<br>
system specific) library calls and/or
platform-specific intrinsics,<br>
all of which may also be done within subroutine calls.</font><br>
<br>
<font size="2">In effect, this requires the compiler to
treat any floating-point<br>
operation within a FENV_ACCESS region as potentially
trapping,<br>
which means the same restrictions apply as with e.g.
memory accesses<br>
(cannot be speculated, etc.). However, according to the
C standard,<br>
the implementation is not required to preserve the
*number* of<br>
different traps, so identical operations may still be
combined<br>
(unless there is an intervening function call).</font><br>
<br>
<font size="2">The C standard requires all user code to
explicitly switch back<br>
to non-trapping mode for all exceptions whenever
leaving a<br>
FENV_ACCESS region (both by "falling off the end" of
the region<br>
and by calling a subroutine defined outside of
FENV_ACCESS).</font><br>
<br>
<br>
<font size="2">Implementation requirements on parts of
the compiler<br>
==============================<wbr>======================</font><br>
<br>
<font size="2">A. clang front end</font><br>
<br>
<font size="2">The front end needs to determine which
instructions are part of<br>
FENV_ACCESS regions and which are not. This takes into
account<br>
both the semantics of the #pragma as defined by the
standard,<br>
and the implementation-defined default rules that
apply to code<br>
outside of any #pragma. GCC currently has the
following two<br>
related command-line options:</font><br>
<br>
<font size="2">-frounding-math: Do not assume default
rounding mode<br>
-ftrapping-math: Assume FP operations may trap</font><br>
<br>
<font size="2">clang accepts but (basically) ignores
those options. As a first<br>
step, it might make sense to have the FENV_ACCESS
default</font><br>
<font size="2">behavior triggered by these options, even
while the front end<br>
does not yet support the actual #pragma.</font><br>
<br>
<font size="2">The front end then needs to transmit the
information about<br>
FENV_ACCESS regions to later passes. However, I
believe that<br>
we do not actually have to implement "regions" as such
at the<br>
IR level. Instead, it would be sufficient to track the
following<br>
information:</font><br>
<br>
<font size="2">- For each FP operation, whether it is
within a FENV_ACCESS region.<br>
- For each call site, whether it is within a
FENV_ACCESS region.</font><br>
<br>
<font size="2">The former requires new IR support; the
approach currently under<br>
investigation uses the experimental "constrained FP"
intrinsics<br>
instead of traditional floating-point operations for
this. The<br>
latter can be done simply by annotating those call
sites with an<br>
attribute.</font><br>
<br>
<font size="2">In addition to that, the front-end itself
needs to disable any<br>
early optimizations that do not preserve strict FP
semantics,<br>
in particular it must not speculate FP operations if
they may<br>
trap. (Currently, the front end transforms "? :" on
floating-<br>
point types into a select IR instruction; for trapping
FP<br>
operations, an explicit branch must be used instead.)</font><br>
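<br>
<font size="2">A minimal sketch of the source pattern in question (the
helper name guarded_div is hypothetical): if both arms were evaluated<br>
and the result then selected, the division would also execute for
b == 0.0.</font><br>
<br>
```c
#pragma STDC FENV_ACCESS ON

/* Hypothetical helper: guarded division written with "? :". */
double guarded_div(double a, double b)
{
    /* Under strict FP semantics the division may only execute when
       b != 0.0. Evaluating it unconditionally and selecting afterwards
       would raise FE_DIVBYZERO (or trap, if that exception is unmasked)
       for b == 0.0. */
    return b != 0.0 ? a / b : 0.0;
}
```
<br>
<font size="2">The returned value is the same either way
(guarded_div(1.0, 0.0) yields 0.0); only the exception behavior differs.</font><br>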
<br>
<br>
<font size="2">B. LLVM IR and LLVM common optimizations</font><br>
<br>
<font size="2">As mentioned in the previous section, we
need some IR to annotate<br>
FP instructions and call sites within FENV_ACCESS
regions. All<br>
common optimizations then need to respect the strict
FP semantics<br>
associated with those regions.</font><br>
<br>
<font size="2">The current approach uses experimental
intrinsics. This has the<br>
advantage that most optimizations never trigger since
they don't<br>
even recognize those new intrinsics. Also, the
intrinsics can<br>
be marked as having side-effects and/or being
non-speculatable.</font><br>
<br>
<font size="2">The overall effect is that more
optimizations are suppressed<br>
than would be strictly necessary. But this may still
be a good<br>
first step, since the result is now safe but maybe not
optimal<br>
-- which can be improved upon over time by teaching
the specific<br>
semantics of those intrinsics to optimization passes.</font><br>
<br>
<font size="2">However, some open questions remain. If
at some point we want<br>
to model the constrained FP semantics more precisely
than just<br>
as "unmodeled side effects", this may have to be
reflected at<br>
the IR level directly. For example, to model rounding
mode<br>
behavior, at some point we might require explicit
tracking of<br>
data dependencies on the rounding mode by representing
the<br>
rounding mode as SSA values defined by function calls
and used<br>
by FP intrinsics. Similarly, to track exception status
flags,<br>
they might be modeled as SSA values set by FP
intrinsics and<br>
used by function calls.</font><br>
<br>
<font size="2">(There is a possibly related question of
how to optimally model<br>
the property of many math library routines that they
may access<br>
the "errno" variable but no other memory ... It might
also be<br>
possible to model e.g. exception status as a
thread-local "memory"<br>
location that is modified by FP operations, just like
errno.)</font><br>
<br>
<font size="2">Another currently unresolved issue is
that at the moment nothing<br>
prevents *standard* floating-point operations from
being moved<br>
*inside* FENV_ACCESS regions. This may also be
invalid, since<br>
those operations now may cause unexpected traps etc.
(More<br>
specifically, what is invalid is moving any standard
FP operation<br>
across a *call site* within a FENV_ACCESS region.)
Note that<br>
this is even an issue if we only support changing the
default<br>
(and no actual #pragma) if multiple object files using
different<br>
default settings are being linked together using LTO.</font><br>
<br>
<font size="2">This last issue could in theory be solved
by having all optimization<br>
passes respect the requirement that floating-point
operations may<br>
not be moved across call sites marked with the strict
FP attribute.<br>
But that does not appear to be straightforward since
it would<br>
introduce a "new" type of dependency that would have to
be added<br>
throughout LLVM code. If this must be avoided, we'd
have to<br>
find a way to explicitly track dependencies at the IR
level. In<br>
the extreme, this could end up equivalent to just
always using<br>
the constrained intrinsics for everything ...</font><br>
<br>
<br>
<font size="2">C. Code generation</font><br>
<br>
<font size="2">In the back end, effects of strict FP
mode have to be passed through<br>
to lower-level representations including SelectionDAG
and MI.</font><br>
<br>
<font size="2">Currently, the "unmodeled side effect"
logic of the constrained<br>
intrinsics is modeled by putting them on the chain
during SelectionDAG.<br>
(If we ever model semantics more precisely at the IR
level, that<br>
would need to be reflected on SelectionDAG
accordingly.)</font><br>
<br>
<font size="2">At the MI level, there is no
representation at all. One option to<br>
fix this would be to model target-specific registers
that implement<br>
the IEEE semantics. Most platforms have registers (or
parts of<br>
registers) that hold:<br>
- the current rounding mode<br>
- the exception status flags<br>
- the exception masks (which enable traps)<br>
Marking FP instructions as using and/or defining these
registers<br>
would enforce ordering requirements. It may be too
strict in some<br>
cases (e.g. two instructions setting exception status
flags may<br>
still be reordered). On the other hand, I believe if
instructions<br>
may actually *trap*, we actually need the
hasSideEffects flag even<br>
if register dependencies are modeled.</font><br>
<br>
<font size="2">If we do need hasSideEffects, there is a
separate discussion on<br>
whether this can be implemented without each back end
having to<br>
duplicate all FP instruction patterns (one with
hasSideEffects<br>
and one without), e.g. by having a new feature that
allows<br>
describing the side-effect status using an MI operand.</font><br>
<br>
<br>
<font size="2">Next steps<br>
==========</font><br>
<br>
<font size="2">I believe it is important to break up the
full amount of work<br>
into incremental steps that provide some useful
benefits on their<br>
own. At first, we should be able to get to a state
where clang<br>
can be used to build programs that use some (maybe not
all) strict<br>
FP features, where the generated code is always
correct but may<br>
not always be optimal. To get there, I think we need
at a <br>
minimum:</font><br>
<br>
<font size="2">- Implement clang support for the default
flags, e.g. GCC's<br>
-frounding-math and -ftrapping-math, and always
generate<br>
the constrained intrinsics. clang should also mark all<br>
call sites then (as mentioned above).</font><br>
<br>
<font size="2">- For now, add the requirement that LTO
is not supported if<br>
this would cause mixing of strict and non-strict FP
code.<br>
In the alternative, have the LTO pass automatically
transform<br>
any floating-point operation into a constrained
intrinsic<br>
if *any* (other) module already uses the latter.</font><br>
<br>
<font size="2">- At the IR level, complete the set of
supported constrained<br>
FP intrinsics (there are still some missing, see e.g.<br>
</font><font size="2"><a
href="https://reviews.llvm.org/D43515"
target="_blank" moz-do-not-send="true">https://reviews.llvm.org/<wbr>D43515</a></font><font
size="2">).<br>
Also, it seems not all variants (e.g. for vector
types) are<br>
supported correctly through codegen (see e.g.<br>
</font><font size="2"><a
href="https://reviews.llvm.org/D46967"
target="_blank" moz-do-not-send="true">https://reviews.llvm.org/<wbr>D46967</a></font><font
size="2">).</font><br>
<br>
<font size="2">- Allow targets to correctly reflect
constrained intrinsics<br>
semantics at the MI level and final machine code
generation<br>
(see e.g. </font><font size="2"><a
href="https://reviews.llvm.org/D45576"
target="_blank" moz-do-not-send="true">https://reviews.llvm.org/<wbr>D45576</a></font><font
size="2">).</font><br>
<br>
<font size="2">- Review all optimization and codegen
passes to verify they<br>
fully respect strict FP semantics.</font><br>
<br>
<font size="2">Once this is done, we can improve on the
solution by:</font><br>
<br>
<font size="2">- Supporting mixing strict and non-strict
FP operations<br>
(would lift the LTO restriction). (Note: there still seems<br>
to be some "invention required" here, see
above.)</font><br>
<br>
<font size="2">- Actually implementing the #pragma
supporting different<br>
regions within a compilation unit (prereq: support for<br>
mixing strict and non-strict FP operations).</font><br>
<br>
<font size="2">- Add more optimization of constrained FP
intrinsics in<br>
common optimizers and/or target back ends.</font><br>
<br>
<font size="2">Does this look reasonable? Please let me
know if there's<br>
anything I overlooked, or you have any additional
comments<br>
or questions.</font><br>
<br>
<br>
<font size="2"><br>
Mit freundlichen Gruessen / Best Regards<span
class="HOEnZb"><font color="#888888"><br>
<br>
Ulrich Weigand<br>
<br>
-- <br>
Dr. Ulrich Weigand | Phone: +49-7031/16-3727<br>
STSM, GNU/Linux compilers and toolchain<br>
IBM Deutschland Research &amp; Development GmbH<br>
Vorsitzende des Aufsichtsrats: Martina Koederitz |
Geschäftsführung: Dirk Wittkopp<br>
Sitz der Gesellschaft: Böblingen |
Registergericht: Amtsgericht Stuttgart, HRB 243294</font></span></font><br>
</p>
</div>
<br>
______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org"
moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
<a
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
<br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</body>
</html>