<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p><br>
</p>
<div class="moz-cite-prefix">On 05/23/2018 04:04 PM, Hubert Tong
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CACvkUqYmOjHVhJXMYYZ9A7CYUCJ_Qok+9GjtD8w+WeiJTA77ag@mail.gmail.com">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Wed, May 23, 2018 at 12:19 PM, Hal
Finkel <span dir="ltr">&lt;<a href="mailto:hfinkel@anl.gov"
target="_blank" moz-do-not-send="true">hfinkel@anl.gov</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-">
<p><br>
</p>
<div
class="gmail-m_-1433965244057454815moz-cite-prefix">On
05/23/2018 11:06 AM, Hubert Tong via llvm-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi Ulrich,<br>
<br>
</div>
<div>I am interested in knowing if the current
proposals also take into account the FP_CONTRACT
pragma</div>
</div>
</blockquote>
<br>
</span> We should already do this: when FP_CONTRACT is
on, we turn the relevant operations into calls to the
@llvm.fmuladd.* intrinsics during IR generation.<span class="gmail-"><br>
</span></div>
</blockquote>
<div>I am not sure we have the same interpretation of what
the FP_CONTRACT pragma does. Subclause 6.5 paragraph 8 of
C11 implies (for example) that even where the FENV_ACCESS
pragma is "on", folding a constant subexpression with an
exactly representable result on an implementation where
FLT_EVAL_METHOD is 0 is within the range of acceptable
implementation-defined behaviour despite intermediate
overflow under non-contracted evaluation. In other words,
the current proposal reads as a description of what needs
to be done when FP_CONTRACT is "off" and FENV_ACCESS is
"on". The note from Ulrich implies that the requirements
are imposed by the Standard, but the range of
implementation-defined behaviour when FP_CONTRACT and
FENV_ACCESS are both "on" is possibly a discussion to be
had.<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Thanks for explaining. Yes, I agree, this is certainly worth
discussing. Do you have thoughts on what we should do? I think it
makes sense to fold where possible, as the user has requested the
extra intermediate precision available from FMA formation.<br>
<br>
Also, to what extent can we change our minds later? For example,
with C++/constexpr, etc. does this have ABI implications?<br>
<br>
<blockquote type="cite"
cite="mid:CACvkUqYmOjHVhJXMYYZ9A7CYUCJ_Qok+9GjtD8w+WeiJTA77ag@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-"> <br>
<blockquote type="cite">
<div dir="ltr">
<div> and the ability to implement options that
imply a specific value for the FLT_EVAL_METHOD
macro.<br>
</div>
</div>
</blockquote>
<br>
</span> What do you mean by this?<br>
</div>
</blockquote>
<div>I admit that modes where FLT_EVAL_METHOD, respectively,
is 0 (no extra range and precision), 1 (float in double
range and precision), and 2 (float and double in long
double range and precision) are all straightforward for
the IR producer to implement by fixing the types used in
the IR emitted (implying the value of FLT_EVAL_METHOD is not
constant within a program).<br>
<br>
So, this is more about implementing meaningful cases of
FLT_EVAL_METHOD being -1. My point below (in my previous
note) is that allowing IR passes or the back-end to choose
the range and precision in a manner conforming to Standard
C (for a FLT_EVAL_METHOD of -1)--perhaps for speed where
multiple sets of floating-point operations/registers are
available with differing "preferred types"--appears to be
a use case that the IR does not seem to support well.</div>
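<div><br>
For reference, the values listed above can be written down as a small C sketch. The enum and function names are hypothetical; only FLT_EVAL_METHOD itself comes from &lt;float.h&gt;. The sketch lumps every value other than 0, 1 and 2 together with -1, which is a simplification, since other negative values are implementation-defined.<br>
<br>

```c
#include <float.h>

/* Format in which an operation on float operands is evaluated. */
enum eval_format {
    EVAL_OWN_TYPE,        /* 0: no extra range and precision */
    EVAL_DOUBLE,          /* 1: float evaluated as double */
    EVAL_LONG_DOUBLE,     /* 2: float and double evaluated as long double */
    EVAL_INDETERMINABLE   /* -1 (other values are treated the same here) */
};

/* Map a FLT_EVAL_METHOD value to the evaluation format used for float. */
enum eval_format float_eval_format(int flt_eval_method)
{
    switch (flt_eval_method) {
    case 0:  return EVAL_OWN_TYPE;
    case 1:  return EVAL_DOUBLE;
    case 2:  return EVAL_LONG_DOUBLE;
    default: return EVAL_INDETERMINABLE;
    }
}

/* What the implementation compiling this translation unit reports. */
enum eval_format float_eval_format_here(void)
{
    return float_eval_format(FLT_EVAL_METHOD);
}
```

<br>
The first three cases can be handled by fixing the types in the emitted IR; the last one is the case where a later pass or the back end would have to be allowed to make the choice.<br>
</div>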
</div>
</div>
</div>
</blockquote>
<br>
Yes. In the LangRef we do have fpmath metadata
(<a class="moz-txt-link-freetext" href="http://llvm.org/docs/LangRef.html#fpmath-metadata">http://llvm.org/docs/LangRef.html#fpmath-metadata</a>), which might be
useful in this space, but I don't think we actually use it for
anything.<br>
<br>
<blockquote type="cite"
cite="mid:CACvkUqYmOjHVhJXMYYZ9A7CYUCJ_Qok+9GjtD8w+WeiJTA77ag@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> As for why a FLT_EVAL_METHOD of -1 is on-topic for
this thread: The language semantics allow the case of the
constant subexpression folding I mentioned above even when
FP_CONTRACT is "off" and FENV_ACCESS is "on", because the
evaluation format used for the evaluation of that
subexpression can be said to have infinite range and
precision.<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Ah, interesting. FLT_EVAL_METHOD is a constant chosen (globally) by
the implementation, correct? Do you know of platforms that set
FLT_EVAL_METHOD to -1?<br>
<br>
-Hal<br>
<br>
<blockquote type="cite"
cite="mid:CACvkUqYmOjHVhJXMYYZ9A7CYUCJ_Qok+9GjtD8w+WeiJTA77ag@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> <br>
-Hal
<div>
<div class="gmail-h5"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Additionally, I am not aware of the IR
being able to represent the potentially
deferred loss of precision that the C language
semantics provide; in particular, applying
such semantics to the existing IR would hit an
issue that the limits of such deferment would
need an agreed representation.<br>
<br>
</div>
<div>As for the mixing of strict and non-strict
modes, I would be interested in where LLVM is
in its handling of non-SSA (pseudo-memory?)
dependencies. I have a vague impression that
it is very coarse-grained in that respect, but
I admit to not being particularly informed in
that space. If there is a good model for such
dependencies, then I think it could be used to
handle the strict/non-strict mixing.<br>
</div>
<div><br>
</div>
-- Hubert Tong, IBM<br>
<br>
<div>PS A nitpick on wording: The idea of being
inside or outside of FENV_ACCESS regions is
instead expressed in terms of the state of
the FENV_ACCESS pragma within the C Standard.<br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, May 23, 2018 at
10:48 AM, Ulrich Weigand via llvm-dev <span
dir="ltr">&lt;<a
href="mailto:llvm-dev@lists.llvm.org"
target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p><font size="2">Hello,</font><br>
<br>
<font size="2">at the recent EuroLLVM
developer meeting in Bristol I held a
BoF<br>
session on the topic "Towards
implementing #pragma STDC
FENV_ACCESS".<br>
I've also had a number of follow-on
discussions both on-site in<br>
Bristol and online since. This post is
intended as a summary of<br>
my current understanding of the set of
requirements and implementation<br>
details covering the overall topic.</font><br>
<br>
<font size="2">I'm posting this here in
the hope this can serve as a basis for<br>
the various more detailed discussions
that are still ongoing<br>
(e.g. in various Phabricator proposals
right now). Any comments<br>
are welcome!</font><br>
<br>
<br>
<font size="2">Semantics of #pragma STDC
FENV_ACCESS<br>
==============================<wbr>=======</font><br>
<br>
<font size="2">To provide a baseline for
the implementation discussion, first
an<br>
overview of the features required to
handle the strict floating-point<br>
mode defined by the C and IEEE
standard:</font><br>
<br>
<font size="2">1. Floating-point
rounding modes<br>
2. Default floating-point exception
handling<br>
3. Trapping floating-point exception
handling</font><br>
<br>
<font size="2">Each of these separate
features imposes different constraints
on the<br>
optimizations that LLVM may perform
involving FP expressions:</font><br>
<br>
<font size="2">1. Floating-point
rounding modes</font><br>
<br>
<font size="2">Outside of FENV_ACCESS
regions, all FP operations are
supposed to be<br>
performed in the "default" rounding
mode.</font><br>
<br>
<font size="2">But inside FENV_ACCESS
regions, FP operations implicitly
depend on<br>
a "current" rounding mode setting,
which may be changed by certain<br>
C library calls (plus some
platform-specific intrinsics). In
addition,<br>
those calls may be performed within
subroutines (as long as those are<br>
also within FENV_ACCESS), so *any*
function call within a FENV_ACCESS region<br>
must be considered as potentially
changing the rounding mode.</font><br>
<br>
<font size="2">In effect, this means the
compiler may not move or combine FP<br>
operations across function call
sites.</font><br>
<br>
<font size="2">2. Default floating-point
exception handling</font><br>
<br>
<font size="2">Inside FENV_ACCESS
regions, every floating-point
operation that<br>
causes an exception must be considered
to set a "status flag"<br>
associated with this exception type.
Those flags can be queried<br>
using C library calls (plus some
platform-specific intrinsics),<br>
and there are other such calls to
explicitly set or clear those<br>
flags as well. As with the rounding
modes, those calls may be<br>
performed in subroutines as well, so
any function call within a<br>
FENV_ACCESS region must be considered
as potentially *using* and<br>
changing the floating-point exception
status flags.</font><br>
<br>
<font size="2">The values of the status
flags on entry to a FENV_ACCESS region are to<br>
be considered undefined according to
the C standard.</font><br>
<br>
<font size="2">Compiler optimizations
are supposed to preserve the values of<br>
all exception status bits at any point
where they can be<br>
(potentially) inspected by the
program, i.e. at all call sites<br>
within FENV_ACCESS regions. This still
allows a number of<br>
optimizations, e.g. to reorder FP
operations or combine two<br>
identical operations within a region
uninterrupted by calls.<br>
But other optimizations should be
avoided, e.g. optimizing<br>
away an unused FP operation may result
in an exception flag<br>
now being unset that would otherwise
have been set. The same<br>
applies to floating-point constant
folding.</font><br>
<br>
<font size="2">3. Trapping
floating-point exception handling</font><br>
<br>
<font size="2">Within a FENV_ACCESS
region, library calls may be used to
switch<br>
exception handling semantics to a
"trapping" mode by setting<br>
corresponding mask bits. Any
subsequent FP instruction that<br>
raises an exception with the
associated mask bit set will cause<br>
a trap. Usually, this will be a
hardware trap that is translated<br>
by the operating system into some form
of software exception that<br>
can be handled by the application; on
Linux systems this takes the<br>
form of a SIGFPE signal.</font><br>
<br>
<font size="2">As above, those mask bits
can be set and reset via (operating-<br>
system specific) library calls and/or
platform-specific intrinsics,<br>
all of which may also be done within
subroutine calls.</font><br>
<br>
<font size="2">In effect, this requires
the compiler to treat any
floating-point<br>
operation within a FENV_ACCESS region
as potentially trapping,<br>
which means the same restrictions
apply as with e.g. memory accesses<br>
(cannot be speculated etc.) However,
according to the C standard,<br>
the implementation is not required to
preserve the *number* of<br>
different traps, so identical
operations may still be combined<br>
(unless there is an intervening
function call).</font><br>
<br>
<font size="2">The C standard requires
all user code to explicitly switch
back<br>
to non-trapping mode for all
exceptions whenever leaving a<br>
FENV_ACCESS region (both by "falling
off the end" of the region<br>
and by calling a subroutine defined
outside of FENV_ACCESS).</font><br>
<br>
<br>
<font size="2">Implementation
requirements on parts of the compiler<br>
==============================<wbr>======================</font><br>
<br>
<font size="2">A. clang front end</font><br>
<br>
<font size="2">The front end needs to
determine which instructions are part
of<br>
FENV_ACCESS regions and which are not.
This takes into account<br>
both the semantics of the #pragma as
defined by the standard,<br>
and the implementation-defined default
rules that apply to code<br>
outside of any #pragma. GCC currently
has the following two<br>
related command-line options:</font><br>
<br>
<font size="2">-frounding-math: Do not
assume default rounding mode<br>
-ftrapping-math: Assume FP operations
may trap</font><br>
<br>
<font size="2">clang accepts but
(basically) ignores those options. As
a first<br>
step, it might make sense to have the
FENV_ACCESS default</font><br>
<font size="2">behavior triggered by
these options, even while the front
end<br>
does not yet support the actual
#pragma.</font><br>
<br>
<font size="2">The front end then needs
to transmit the information about<br>
FENV_ACCESS regions to later passes.
However, I believe that<br>
we do not actually have to implement
"regions" as such at the<br>
IR level. Instead, it would be
sufficient to track the following<br>
information:</font><br>
<br>
<font size="2">- For each FP operation,
whether it is within a FENV_ACCESS
region.<br>
- For each call site, whether it is
within a FENV_ACCESS region.</font><br>
<br>
<font size="2">The former requires new
IR support; the approach currently
under<br>
investigation uses the experimental
"constrained FP" intrinsics<br>
instead of traditional floating-point
operations for this. The<br>
latter can be done simply by
annotating those call sites with an<br>
attribute.</font><br>
<br>
<font size="2">In addition to that, the
front-end itself needs to disable any<br>
early optimizations that do not
preserve strict FP semantics,<br>
in particular it must not speculate FP
operations if they may<br>
trap. (Currently, the front end
transforms "? :" on floating-<br>
point types into a select IR
statement; for trapping FP<br>
operations, an explicit branch must be
used instead.)</font><br>
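<br>
<font size="2">For illustration, a hypothetical C function of the kind affected (safe_ratio is not taken from any existing code): in the source, the division is only evaluated when the divisor is nonzero. Lowering the conditional to a select would evaluate the division unconditionally, which could trap or set FE_DIVBYZERO for d == 0.</font><br>
<br>

```c
/* Ratio n / d, or 0.0 when d is zero. Under strict FP semantics the
   conditional must be lowered as a branch, so that the division is
   not executed speculatively on the d == 0 path. */
double safe_ratio(double n, double d)
{
    return d != 0.0 ? n / d : 0.0;
}
```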
<br>
<br>
<font size="2">B. LLVM IR and LLVM
common optimizations</font><br>
<br>
<font size="2">As mentioned in the
previous section, we need some IR to
annotate<br>
FP instructions and call sites within
FENV_ACCESS regions. All<br>
common optimizations then need to
respect the strict FP semantics<br>
associated with those regions.</font><br>
<br>
<font size="2">The current approach uses
experimental intrinsics. This has the<br>
advantage that most optimizations
never trigger since they don't<br>
even recognize those new intrinsics.
Also, the intrinsics can<br>
be marked as having side-effects
and/or being non-speculatable.</font><br>
<br>
<font size="2">The overall effect is
that more optimizations are suppressed<br>
than would be strictly necessary. But
this may still be a good<br>
first step, since the result is now
safe but maybe not optimal<br>
-- which can be improved upon over
time by teaching the specific<br>
semantics of those intrinsics to
optimization passes.</font><br>
<br>
<font size="2">However, some open
questions remain. If at some point we
want<br>
to model the constrained FP semantics
more precisely than just<br>
as "unmodeled side effects", this may
have to be reflected at<br>
the IR level directly. For example, to
model rounding mode<br>
behavior, at some point we might
require explicit tracking of<br>
data dependencies on the rounding mode
by representing the<br>
rounding mode as SSA values defined by
function calls and used<br>
by FP intrinsics. Similarly, to track
exception status flags,<br>
they might be modeled as SSA values
set by FP intrinsics and<br>
used by function calls.</font><br>
<br>
<font size="2">(There is a possibly
related question of how to optimally
model<br>
the property of many math library
routines that they may access<br>
the "errno" variable but no other
memory ... It might also be<br>
possible to model e.g. exception
status as a thread-local "memory"<br>
location that is modified by FP
operations, just like errno.)</font><br>
<br>
<font size="2">Another currently
unresolved issue is that at the moment
nothing<br>
prevents *standard* floating-point
operations from being moved<br>
*inside* FENV_ACCESS regions. This may
also be invalid, since<br>
those operations now may cause
unexpected traps etc. (More<br>
specifically, what is invalid is
moving any standard FP operation<br>
across a *call site* within a
FENV_ACCESS region.) Note that<br>
this is even an issue if we only
support changing the default<br>
(and no actual #pragma) if multiple
object files using different<br>
default settings are being linked
together using LTO.</font><br>
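<br>
<font size="2">The rule can be stated as a toy predicate in C. This is only a sketch of the constraint as described here, using hypothetical names, not code from LLVM; it also assumes that constrained operations are conservatively never moved across any call, matching their current "unmodeled side effects" treatment.</font><br>
<br>

```c
#include <stdbool.h>

/* May an optimizer move a floating-point operation across a call site?
   op_is_constrained:  the operation is a constrained FP intrinsic.
   call_is_strict_fp:  the call site lies within a FENV_ACCESS region. */
bool may_move_across(bool op_is_constrained, bool call_is_strict_fp)
{
    if (call_is_strict_fp)
        return false;   /* callee may change modes or inspect flags */
    if (op_is_constrained)
        return false;   /* treated as having unmodeled side effects */
    return true;        /* standard op, non-strict call: no constraint */
}
```

<br>
<font size="2">The problematic case is the first one with a standard operation: nothing in the IR currently tells a pass that such a move is invalid.</font><br>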
<br>
<font size="2">This last issue could in
theory be solved by having all
optimization<br>
passes respect the requirement that
floating-point operations may<br>
not be moved across call sites marked
with the strict FP attribute.<br>
But that does not appear to be
straightforward since it would<br>
introduce a "new" type of dependency
that would have to be added<br>
throughout LLVM code. If this must be
avoided, we'd have to<br>
find a way to explicitly track
dependencies at the IR level. In<br>
the extreme, this could end up
equivalent to just always using<br>
the constrained intrinsics for
everything ...</font><br>
<br>
<br>
<font size="2">C. Code generation</font><br>
<br>
<font size="2">In the back end, effects
of strict FP mode have to be passed
through<br>
to lower-level representations
including SelectionDAG and MI.</font><br>
<br>
<font size="2">Currently, the "unmodeled
side effect" logic of the constrained<br>
intrinsics is modeled by putting them
on the chain during SelectionDAG.<br>
(If we ever model semantics more
precisely at the IR level, that<br>
would need to be reflected on
SelectionDAG accordingly.)</font><br>
<br>
<font size="2">At the MI level, there is
no representation at all. One option
to<br>
fix this would be to model
target-specific registers that
implement<br>
the IEEE semantics. Most platforms
have registers (or parts of<br>
registers) that hold:<br>
- the current rounding mode<br>
- the exception status flags<br>
- the exception masks (which enable
traps)<br>
Marking FP instructions as using
and/or defining these registers<br>
would enforce ordering requirements.
It may be too strict in some<br>
cases (e.g. two instructions setting
exception status flags may<br>
still be reordered). On the other
hand, I believe if instructions<br>
may actually *trap*, we actually need
the hasSideEffects flag even<br>
if register dependencies are modeled.</font><br>
<br>
<font size="2">If we do need
hasSideEffects, there is a separate
discussion on<br>
whether this can be implemented
without each back end having to<br>
duplicate all FP instruction patterns
(one with hasSideEffects<br>
and one without), e.g. by having a new
feature that allows describing<br>
the side-effect status using
an MI operand.</font><br>
<br>
<br>
<font size="2">Next steps<br>
==========</font><br>
<br>
<font size="2">I believe it is important
to break up the full amount of work<br>
into incremental steps that provide
some useful benefits on their<br>
own. At first, we should be able to
get to a state where clang<br>
can be used to build programs that use
some (maybe not all) strict<br>
FP features, where the generated code
is always correct but may<br>
not always be optimal. To get there, I
think we need at a <br>
minimum:</font><br>
<br>
<font size="2">- Implement clang support
for the default flags, e.g. GCC's<br>
-frounding-math and -ftrapping-math,
and always generate<br>
the constrained intrinsics. clang
should also mark all<br>
call sites then (as mentioned above).</font><br>
<br>
<font size="2">- For now, add the
requirement that LTO is not supported
if<br>
this would cause mixing of strict and
non-strict FP code.<br>
Alternatively, have the LTO pass
automatically transform<br>
any floating-point operation into a
constrained intrinsic<br>
if *any* (other) module already uses
the latter.</font><br>
<br>
<font size="2">- At the IR level,
complete the set of supported
constrained<br>
FP intrinsics (there are still some
missing, see e.g.<br>
</font><font size="2"><a
href="https://reviews.llvm.org/D43515"
target="_blank"
moz-do-not-send="true">https://reviews.llvm.org/D4351<wbr>5</a></font><font
size="2">).<br>
Also, it seems not all variants (e.g.
for vector types) are<br>
supported correctly through codegen
(see e.g.<br>
</font><font size="2"><a
href="https://reviews.llvm.org/D46967"
target="_blank"
moz-do-not-send="true">https://reviews.llvm.org/D4696<wbr>7</a></font><font
size="2">).</font><br>
<br>
<font size="2">- Allow targets to
correctly reflect constrained
intrinsics<br>
semantics at the MI level and final
machine code generation<br>
(see e.g. </font><font size="2"><a
href="https://reviews.llvm.org/D45576"
target="_blank"
moz-do-not-send="true">https://reviews.llvm.org/D4557<wbr>6</a></font><font
size="2">).</font><br>
<br>
<font size="2">- Review all optimization
and codegen passes to verify they<br>
fully respect strict FP semantics.</font><br>
<br>
<font size="2">Once this is done, we can
improve on the solution by:</font><br>
<br>
<font size="2">- Supporting mixing
strict and non-strict FP operations<br>
(would lift the LTO restriction).
(Note: there seems<br>
to be still some "invention required"
here, see above.)</font><br>
<br>
<font size="2">- Actually implementing
the #pragma supporting different<br>
regions within a compilation unit
(prereq: support for<br>
mixing strict and non-strict FP
operations).</font><br>
<br>
<font size="2">- Add more optimization
of constrained FP intrinsics in<br>
common optimizers and/or target back
ends.</font><br>
<br>
<font size="2">Does this look
reasonable? Please let me know if
there's<br>
anything I overlooked, or you have any
additional comments<br>
or questions.</font><br>
<br>
<br>
<font size="2"><br>
Mit freundlichen Gruessen / Best
Regards<span
class="gmail-m_-1433965244057454815HOEnZb"><font
color="#888888"><br>
<br>
Ulrich Weigand<br>
<br>
-- <br>
Dr. Ulrich Weigand | Phone:
+49-7031/16-3727<br>
STSM, GNU/Linux compilers and
toolchain<br>
IBM Deutschland Research &amp;
Development GmbH<br>
Vorsitzende des Aufsichtsrats:
Martina Koederitz |
Geschäftsführung: Dirk Wittkopp<br>
Sitz der Gesellschaft: Böblingen |
Registergericht: Amtsgericht
Stuttgart, HRB 243294</font></span></font><br>
</p>
</div>
<br>
______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org"
target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
<a
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
rel="noreferrer" target="_blank"
moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
<br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</div>
</div>
<span class="gmail-HOEnZb"><font color="#888888">
<pre class="gmail-m_-1433965244057454815moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</font></span></div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</body>
</html>