<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, May 23, 2018 at 12:19 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-">
<p><br>
</p>
<div class="gmail-m_-1433965244057454815moz-cite-prefix">On 05/23/2018 11:06 AM, Hubert Tong via
llvm-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi Ulrich,<br>
<br>
</div>
<div>I am interested in knowing if the current proposals also
take into account the FP_CONTRACT pragma</div>
</div>
</blockquote>
<br></span>
We should already do this (we turn relevant operations into the
@llvm.fmuladd. when FP_CONTRACT is set to on during IR generation).<span class="gmail-"><br></span></div></blockquote><div>I am not sure we have the same interpretation of what the FP_CONTRACT pragma does. Subclause 6.5 paragraph 8 of C11 implies (for example) that, even where the FENV_ACCESS pragma is "on", folding a constant subexpression with an exactly representable result on an implementation where FLT_EVAL_METHOD is 0 is within the range of acceptable implementation-defined behaviour, despite the intermediate overflow that would occur under non-contracted evaluation. That is to say, the current proposal reads as a description of what needs to be done when FP_CONTRACT is "off" and FENV_ACCESS is "on". The note from Ulrich implies that the requirements are imposed by the Standard, but the range of implementation-defined behaviour where FP_CONTRACT and FENV_ACCESS are both "on" is possibly a discussion to be had.<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><span class="gmail-">
<br>
<blockquote type="cite">
<div dir="ltr">
<div> and the ability to implement options that imply a specific
value for the FLT_EVAL_METHOD macro.<br>
</div>
</div>
</blockquote>
<br></span>
What do you mean by this?<br></div></blockquote><div>I admit that the modes where FLT_EVAL_METHOD is 0 (no extra range and precision), 1 (float evaluated in double range and precision), or 2 (float and double evaluated in long double range and precision) are all straightforward for the IR producer to implement by fixing the types used in the IR emitted (implying the value of FLT_EVAL_METHOD is not constant within a program).<br><br>So, this is more about implementing meaningful cases of FLT_EVAL_METHOD being -1. My point below (in my previous note) is that allowing IR passes or the back-end to choose the range and precision in a manner conforming to Standard C (for a FLT_EVAL_METHOD of -1)--perhaps for speed where multiple sets of floating-point operations/registers are available with differing "preferred types"--appears to be a use case that the IR does not support well. As for why a FLT_EVAL_METHOD of -1 is on-topic for this thread: the language semantics allow the constant subexpression folding I mentioned above even when FP_CONTRACT is "off" and FENV_ACCESS is "on", because the evaluation format used for that subexpression can be said to have infinite range and precision.<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
<br>
-Hal<div><div class="gmail-h5"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Additionally, I am not aware of the IR being able to
represent the potentially deferred loss of precision that the
C language semantics provide; in particular, applying such
semantics to the existing IR would hit an issue that the
limits of such deferment would need an agreed representation.<br>
<br>
</div>
<div>As for the mixing of strict and non-strict modes, I would
be interested in where LLVM is in its handling of non-SSA
(pseudo-memory?) dependencies. I have a vague impression that
it is very coarse-grained in that respect, but I admit to not
being particularly informed in that space. If there is a good
model for such dependencies, then I think it could be used to
handle the strict/non-strict mixing.<br>
</div>
<div><br>
</div>
-- Hubert Tong, IBM<br>
<br>
<div>PS A nitpick on wording: The idea of being inside or
outside of FENV_ACCESS regions is instead expressed in
terms of the state of the FENV_ACCESS pragma within the C
Standard.<br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, May 23, 2018 at 10:48 AM,
Ulrich Weigand via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><font size="2">Hello,</font><br>
<br>
<font size="2">at the recent EuroLLVM developer meeting
in Bristol I held a BoF<br>
session on the topic "Towards implementing #pragma
STDC FENV_ACCESS".<br>
I've also had a number of follow-on discussions both
on-site in<br>
Bristol and online since. This post is intended as a
summary of<br>
my current understanding of the set of requirements and
implementation<br>
details covering the overall topic.</font><br>
<br>
<font size="2">I'm posting this here in the hope this
can serve as a basis for<br>
the various more detailed discussions that are still
ongoing<br>
(e.g. in various Phabricator proposals right now). Any
comments<br>
are welcome!</font><br>
<br>
<br>
<font size="2">Semantics of #pragma STDC FENV_ACCESS<br>
==============================<wbr>=======</font><br>
<br>
<font size="2">To provide a baseline for the
implementation discussion, first an<br>
overview of the features required to handle the strict
floating-point<br>
mode defined by the C and IEEE standards:</font><br>
<br>
<font size="2">1. Floating-point rounding modes<br>
2. Default floating-point exception handling<br>
3. Trapping floating-point exception handling</font><br>
<br>
<font size="2">Each of these separate features imposes
different constraints on the<br>
optimizations that LLVM may perform involving FP
expressions:</font><br>
<br>
<font size="2">1. Floating-point rounding modes</font><br>
<br>
<font size="2">Outside of FENV_ACCESS regions, all FP
operations are supposed to be<br>
performed in the "default" rounding mode.</font><br>
<br>
<font size="2">But inside FENV_ACCESS regions, FP
operations implicitly depend on<br>
a "current" rounding mode setting, which may be
changed by certain<br>
C library calls (plus some platform-specific
intrinsics). In addition,<br>
those calls may be performed within subroutines (as
long as those are<br>
also within FENV_ACCESS), so *any* function call
within a FENV_ACCESS region<br>
must be considered as potentially changing the
rounding mode.</font><br>
<br>
<font size="2">In effect, this means the compiler may
not move or combine FP<br>
operations across function call sites.</font><br>
<br>
<font size="2">2. Default floating-point exception
handling</font><br>
<br>
<font size="2">Inside FENV_ACCESS regions, every
floating-point operation that<br>
causes an exception must be considered to set a
"status flag"<br>
associated with this exception type. Those flags can
be queried<br>
using C library calls (plus some platform-specific
intrinsics),<br>
and there are other such calls to explicitly set or
clear those<br>
flags as well. As with the rounding modes, those calls
may be<br>
performed in subroutines as well, so any function call
within a<br>
FENV_ACCESS region must be considered as potentially
*using* and<br>
changing the floating-point exception status flags.</font><br>
<br>
<font size="2">The values of the status flags on entry
to a FENV_ACCESS region are to<br>
be considered undefined according to the C standard.</font><br>
<br>
<font size="2">Compiler optimizations are supposed to
preserve the values of<br>
all exception status bits at any point where they can
be<br>
(potentially) inspected by the program, i.e. at all
call sites<br>
within FENV_ACCESS regions. This still allows a number
of<br>
optimizations, e.g. to reorder FP operations or
combine two<br>
identical operations within a region uninterrupted by
calls.<br>
But other optimizations should be avoided, e.g.
optimizing<br>
away an unused FP operation may result in an exception
flag<br>
now being unset that would otherwise have been set.
The same<br>
applies to floating-point constant folding.</font><br>
<br>
<font size="2">3. Trapping floating-point exception
handling</font><br>
<br>
<font size="2">Within a FENV_ACCESS region, library
calls may be used to switch<br>
exception handling semantics to a "trapping" mode by
setting<br>
corresponding mask bits. Any subsequent FP instruction
that<br>
raises an exception with the associated mask bit set
will cause<br>
a trap. Usually, this will be a hardware trap that is
translated<br>
by the operating system into some form of software
exception that<br>
can be handled by the application; on Linux systems
this takes the<br>
form of a SIGFPE signal.</font><br>
<br>
<font size="2">As above, those mask bits can be set and
reset via (operating-<br>
system specific) library calls and/or
platform-specific intrinsics,<br>
all of which may also be done within subroutine calls.</font><br>
<br>
<font size="2">In effect, this requires the compiler to
treat any floating-point<br>
operation within a FENV_ACCESS region as potentially
trapping,<br>
which means the same restrictions apply as with e.g.
memory accesses<br>
(cannot be speculated, etc.). However, according to the
C standard,<br>
the implementation is not required to preserve the
*number* of<br>
different traps, so identical operations may still be
combined<br>
(unless there is an intervening function call).</font><br>
<br>
<font size="2">The C standard requires all user code to
explicitly switch back<br>
to non-trapping mode for all exceptions whenever
leaving a<br>
FENV_ACCESS region (both by "falling off the end" of
the region<br>
and by calling a subroutine defined outside of
FENV_ACCESS).</font><br>
<br>
<br>
<font size="2">Implementation requirements on parts of
the compiler<br>
==============================<wbr>======================</font><br>
<br>
<font size="2">A. clang front end</font><br>
<br>
<font size="2">The front end needs to determine which
instructions are part of<br>
FENV_ACCESS regions and which are not. This takes into
account<br>
both the semantics of the #pragma as defined by the
standard,<br>
and the implementation-defined default rules that
apply to code<br>
outside of any #pragma. GCC currently has the
following two<br>
related command-line options:</font><br>
<br>
<font size="2">-frounding-math: Do not assume default
rounding mode<br>
-ftrapping-math: Assume FP operations may trap</font><br>
<br>
<font size="2">clang accepts but (basically) ignores
those options. As a first<br>
step, it might make sense to have the FENV_ACCESS
default</font><br>
<font size="2">behavior triggered by these options, even
while the front end<br>
does not yet support the actual #pragma.</font><br>
<br>
<font size="2">The front end then needs to transmit the
information about<br>
FENV_ACCESS regions to later passes. However, I
believe that<br>
we do not actually have to implement "regions" as such
at the<br>
IR level. Instead, it would be sufficient to track the
following<br>
information:</font><br>
<br>
<font size="2">- For each FP operation, whether it is
within a FENV_ACCESS region.<br>
- For each call site, whether it is within a
FENV_ACCESS region.</font><br>
<br>
<font size="2">The former requires new IR support; the
approach currently under<br>
investigation uses the experimental "constrained FP"
intrinsics<br>
instead of traditional floating-point operations for
this. The<br>
latter can be done simply by annotating those call
sites with an<br>
attribute.</font><br>
<br>
<font size="2">In addition to that, the front-end itself
needs to disable any<br>
early optimizations that do not preserve strict FP
semantics,<br>
in particular it must not speculate FP operations if
they may<br>
trap. (Currently, the front end transforms "? :" on
floating-<br>
point types into a select IR statement; for trapping
FP<br>
operations, an explicit branch must be used instead.)</font><br>
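<br>
<font size="2">A short C sketch of the kind of source this affects (the helper name guarded_divide is hypothetical). If both arms are evaluated and the result is picked with a select, the division executes even when den is zero, which raises divide-by-zero and would trap if that exception is unmasked:</font><br>
<br>
```c
/* Hypothetical helper: the division is only supposed to execute when
 * den is nonzero. Lowering this to "compute num / den unconditionally,
 * then select" would speculate the division past its guard. */
double guarded_divide(double num, double den) {
    return den != 0.0 ? num / den : 0.0;
}
```
<font size="2">With an explicit branch, guarded_divide(1.0, 0.0) never performs the division at all, so no exception flag is set and no trap can occur.</font><br>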
<br>
<br>
<font size="2">B. LLVM IR and LLVM common optimizations</font><br>
<br>
<font size="2">As mentioned in the previous section, we
need some IR to annotate<br>
FP instructions and call sites within FENV_ACCESS
regions. All<br>
common optimizations then need to respect the strict
FP semantics<br>
associated with those regions.</font><br>
<br>
<font size="2">The current approach uses experimental
intrinsics. This has the<br>
advantage that most optimizations never trigger since
they don't<br>
even recognize those new intrinsics. Also, the
intrinsics can<br>
be marked as having side-effects and/or being
non-speculatable.</font><br>
<br>
<font size="2">The overall effect is that more
optimizations are suppressed<br>
than would be strictly necessary. But this may still
be a good<br>
first step, since the result is now safe but maybe not
optimal<br>
-- which can be improved upon over time by teaching
the specific<br>
semantics of those intrinsics to optimization passes.</font><br>
<br>
<font size="2">However, some open questions remain. If
at some point we want<br>
to model the constrained FP semantics more precisely
than just<br>
as "unmodeled side effects", this may have to be
reflected at<br>
the IR level directly. For example, to model rounding
mode<br>
behavior, at some point we might require explicit
tracking of<br>
data dependencies on the rounding mode by representing
the<br>
rounding mode as SSA values defined by function calls
and used<br>
by FP intrinsics. Similarly, to track exception status
flags,<br>
they might be modeled as SSA values set by FP
intrinsics and<br>
used by function calls.</font><br>
<br>
<font size="2">(There is a possibly related question of
how to optimally model<br>
the property of many math library routines that they
may access<br>
the "errno" variable but no other memory ... It might
also be<br>
possible to model e.g. exception status as a
thread-local "memory"<br>
location that is modified by FP operations, just like
errno.)</font><br>
<br>
<font size="2">Another currently unresolved issue is
that at the moment nothing<br>
prevents *standard* floating-point operations from
being moved<br>
*inside* FENV_ACCESS regions. This may also be
invalid, since<br>
those operations now may cause unexpected traps etc.
(More<br>
specifically, what is invalid is moving any standard
FP operation<br>
across a *call site* within a FENV_ACCESS region.)
Note that<br>
this is even an issue if we only support changing the
default<br>
(and no actual #pragma) if multiple object files using
different<br>
default settings are being linked together using LTO.</font><br>
<br>
<font size="2">This last issue could in theory be solved
by having all optimization<br>
passes respect the requirement that floating-point
operations may<br>
not be moved across call sites marked with the strict
FP attribute.<br>
But that does not appear to be straightforward since
it would<br>
introduce a "new" type of dependency that would have to
be added<br>
throughout LLVM code. If this must be avoided, we'd
have to<br>
find a way to explicitly track dependencies at the IR
level. In<br>
the extreme, this could end up equivalent to just
always using<br>
the constrained intrinsics for everything ...</font><br>
<br>
<br>
<font size="2">C. Code generation</font><br>
<br>
<font size="2">In the back end, effects of strict FP
mode have to be passed through<br>
to lower-level representations including SelectionDAG
and MI.</font><br>
<br>
<font size="2">Currently, the "unmodeled side effect"
logic of the constrained<br>
intrinsics is modeled by putting them on the chain
during SelectionDAG.<br>
(If we ever model semantics more precisely at the IR
level, that<br>
would need to be reflected on SelectionDAG
accordingly.)</font><br>
<br>
<font size="2">At the MI level, there is no
representation at all. One option to<br>
fix this would be to model target-specific registers
that implement<br>
the IEEE semantics. Most platforms have registers (or
parts of<br>
registers) that hold:<br>
- the current rounding mode<br>
- the exception status flags<br>
- the exception masks (which enable traps)<br>
Marking FP instructions as using and/or defining these
registers<br>
would enforce ordering requirements. It may be too
strict in some<br>
cases (e.g. two instructions setting exception status
flags may<br>
still be reordered). On the other hand, I believe if
instructions<br>
may actually *trap*, we actually need the
hasSideEffects flag even<br>
if register dependencies are modeled.</font><br>
<br>
<font size="2">If we do need hasSideEffects, there is a
separate discussion on<br>
whether this can be implemented without each back end
having to<br>
duplicate all FP instruction patterns (one with
hasSideEffects<br>
and one without), e.g. by having a new feature that
allows<br>
describing the side-effect status using an MI operand.</font><br>
<br>
<br>
<font size="2">Next steps<br>
==========</font><br>
<br>
<font size="2">I believe it is important to break up the
full amount of work<br>
into incremental steps that provide some useful
benefits on their<br>
own. At first, we should be able to get to a state
where clang<br>
can be used to build programs that use some (maybe not
all) strict<br>
FP features, where the generated code is always
correct but may<br>
not always be optimal. To get there, I think we need
at a <br>
minimum:</font><br>
<br>
<font size="2">- Implement clang support for the default
flags, e.g. GCC's<br>
-frounding-math and -ftrapping-math, and always
generate<br>
the constrained intrinsics. clang should also mark all<br>
call sites then (as mentioned above).</font><br>
<br>
<font size="2">- For now, add the requirement that LTO
is not supported if<br>
this would cause mixing of strict and non-strict FP
code.<br>
In the alternative, have the LTO pass automatically
transform<br>
any floating-point operation into a constrained
intrinsic<br>
if *any* (other) module already uses the latter.</font><br>
<br>
<font size="2">- At the IR level, complete the set of
supported constrained<br>
FP intrinsics (there are still some missing, see e.g <br>
</font><font size="2"><a href="https://reviews.llvm.org/D43515" target="_blank">https://reviews.llvm.org/D4351<wbr>5</a></font><font size="2">).<br>
Also, it seems not all variants (e.g. for vector
types) are<br>
supported correctly through codegen (see e.g.<br>
</font><font size="2"><a href="https://reviews.llvm.org/D46967" target="_blank">https://reviews.llvm.org/D4696<wbr>7</a></font><font size="2">).</font><br>
<br>
<font size="2">- Allow targets to correctly reflect
constrained intrinsics<br>
semantics at the MI level and final machine code
generation<br>
(see e.g. </font><font size="2"><a href="https://reviews.llvm.org/D45576" target="_blank">https://reviews.llvm.org/D4557<wbr>6</a></font><font size="2">).</font><br>
<br>
<font size="2">- Review all optimization and codegen
passes to verify they<br>
fully respect strict FP semantics.</font><br>
<br>
<font size="2">Once this is done, we can improve on the
solution by:</font><br>
<br>
<font size="2">- Supporting mixing strict and non-strict
FP operations<br>
(would lift the LTO restriction). (Note: there still seems<br>
to be some "invention required" here, see
above.)</font><br>
<br>
<font size="2">- Actually implementing the #pragma
supporting different<br>
regions within a compilation unit (prereq: support for<br>
mixing strict and non-strict FP operations).</font><br>
<br>
<font size="2">- Add more optimization of constrained FP
intrinsics in<br>
common optimizers and/or target back ends.</font><br>
<br>
<font size="2">Does this look reasonable? Please let me
know if there's<br>
anything I overlooked, or you have any additional
comments<br>
or questions.</font><br>
<br>
<br>
<font size="2"><br>
Mit freundlichen Gruessen / Best Regards<span class="gmail-m_-1433965244057454815HOEnZb"><font color="#888888"><br>
<br>
Ulrich Weigand<br>
<br>
-- <br>
Dr. Ulrich Weigand | Phone: +49-7031/16-3727<br>
STSM, GNU/Linux compilers and toolchain<br>
IBM Deutschland Research & Development GmbH<br>
Vorsitzende des Aufsichtsrats: Martina Koederitz |
Geschäftsführung: Dirk Wittkopp<br>
Sitz der Gesellschaft: Böblingen |
Registergericht: Amtsgericht Stuttgart, HRB 243294</font></span></font><br>
</p>
</div>
<br>
______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
<br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</div></div><span class="gmail-HOEnZb"><font color="#888888"><pre class="gmail-m_-1433965244057454815moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</font></span></div>
</blockquote></div><br></div></div>