<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 05/23/2018 04:04 PM, Hubert Tong

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CACvkUqYmOjHVhJXMYYZ9A7CYUCJ_Qok+9GjtD8w+WeiJTA77ag@mail.gmail.com">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">On Wed, May 23, 2018 at 12:19 PM, Hal

            Finkel <span dir="ltr">&lt;<a href="mailto:hfinkel@anl.gov"

                target="_blank" moz-do-not-send="true">hfinkel@anl.gov</a>&gt;</span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"><span class="gmail-">

                  <p><br>

                  </p>

                  <div

                    class="gmail-m_-1433965244057454815moz-cite-prefix">On

                    05/23/2018 11:06 AM, Hubert Tong via llvm-dev wrote:<br>

                  </div>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div>Hi Ulrich,<br>

                        <br>

                      </div>

                      <div>I am interested in knowing if the current

                        proposals also take into account the FP_CONTRACT

                        pragma</div>

                    </div>

                  </blockquote>

                  <br>

                </span> We should already do this: during IR generation, we turn

                the relevant operations into the @llvm.fmuladd intrinsic when

                FP_CONTRACT is set to on.<span class="gmail-"><br>

                </span></div>

            </blockquote>

            <div>I am not sure we have the same interpretation of what

              the FP_CONTRACT pragma does. Subclause 6.5 paragraph 8 of

              C11 implies (for example) that even where the FENV_ACCESS

              pragma is "on", folding a constant subexpression with an

              exactly representable result on an implementation where

              FLT_EVAL_METHOD is 0 is within the range of acceptable

              implementation-defined behaviour despite intermediate

              overflow under non-contracted evaluation. Which is to say

              that the current proposal reads as what needs to be done

              when FP_CONTRACT is "off" and FENV_ACCESS is "on". The

              note from Ulrich implies that the requirements are imposed

              by the Standard, but the range of implementation-defined

              behaviour where FP_CONTRACT is "on" while FENV_ACCESS is

              also "on" is possibly a discussion to be had.<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Thanks for explaining. Yes, I agree, this is certainly worth

    discussing. Do you have thoughts on what we should do? I think it

    makes sense to fold where possible, as the user has requested the

    extra intermediate precision available from FMA formation.<br>
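    <br>
    As a minimal sketch of the kind of case described above (the function names and the FLT_MAX inputs are illustrative assumptions, not taken from the proposal, and double evaluation stands in here for the single rounding that a fused operation would perform, which is exact for these inputs): rounding the product to float first overflows to infinity, while the wider evaluation yields exactly FLT_MAX.<br>
    <span style="white-space: pre; font-family: monospace">
```c
#include "float.h"

/* a*b + c with the product rounded to float before the addition.
   volatile keeps the compiler from folding or contracting the expression. */
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

/* The same expression evaluated in double, which holds the product of two
   floats exactly; only the final result is rounded back to float. */
float wide_mul_add(float a, float b, float c) {
    volatile double product = (double)a * (double)b;
    return (float)(product + (double)c);
}

/* Illustrative inputs: the exact result FLT_MAX is representable, but the
   intermediate product 2 * FLT_MAX is not representable as a float. */
float overflow_demo_separate(void) {
    return mul_then_add(FLT_MAX, 2.0f, -FLT_MAX);
}

float overflow_demo_wide(void) {
    return wide_mul_add(FLT_MAX, 2.0f, -FLT_MAX);
}
```
</span><br>
    Whether a compiler may replace the first form with the second while FENV_ACCESS is "on" is exactly the question being raised; the sketch only shows that the two forms are observably different.<br>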

    <br>

    Also, to what extent can we change our minds later? For example,

    with C++/constexpr, etc. does this have ABI implications?<br>

    <br>

    <blockquote type="cite"

cite="mid:CACvkUqYmOjHVhJXMYYZ9A7CYUCJ_Qok+9GjtD8w+WeiJTA77ag@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"><span class="gmail-"> <br>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div> and the ability to implement options that

                        imply a specific value for the FLT_EVAL_METHOD

                        macro.<br>

                      </div>

                    </div>

                  </blockquote>

                  <br>

                </span> What do you mean by this?<br>

              </div>

            </blockquote>

            <div>I admit that modes where FLT_EVAL_METHOD, respectively,

              is 0 (no extra range and precision), 1 (float in double

              range and precision), and 2 (float and double in long

              double range and precision) are all straightforward for

              the IR producer to implement by fixing the types used in

              the IR emitted (implying the value of FLT_EVAL_METHOD is not

              constant within a program).<br>

              <br>

              So, this is more about implementing meaningful cases of

              FLT_EVAL_METHOD being -1. My point below (in my previous

              note) is that allowing IR passes or the back-end to choose

              the range and precision in a manner conforming to Standard

              C (for a FLT_EVAL_METHOD of -1)--perhaps for speed where

              multiple sets of floating-point operations/registers are

              available with differing "preferred types"--appears to be

              a use case that the IR does not seem to support well.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Yes. In the LangRef we do have fpmath metadata

    (<a class="moz-txt-link-freetext" href="http://llvm.org/docs/LangRef.html#fpmath-metadata">http://llvm.org/docs/LangRef.html#fpmath-metadata</a>), which might be

    useful in this space, but I don't think we actually use it for

    anything.<br>

    <br>

    <blockquote type="cite"

cite="mid:CACvkUqYmOjHVhJXMYYZ9A7CYUCJ_Qok+9GjtD8w+WeiJTA77ag@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> As for why a FLT_EVAL_METHOD of -1 is on-topic for

              this thread: The language semantics allow the case of the

              constant subexpression folding I mentioned above even when

              FP_CONTRACT is "off" and FENV_ACCESS is "on", because the

              evaluation format used for the evaluation of that

              subexpression can be said to have infinite range and

              precision.<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Ah, interesting. FLT_EVAL_METHOD is a constant chosen (globally) by

    the implementation, correct? Do you know of platforms that set

    FLT_EVAL_METHOD to -1?<br>
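    <br>
    For reference, a minimal sketch of how a program can inspect the value (the helper names are hypothetical): FLT_EVAL_METHOD is a macro from float.h, so each translation unit sees whatever value its compiler and options selected.<br>
    <span style="white-space: pre; font-family: monospace">
```c
#include "float.h"

/* FLT_EVAL_METHOD is a macro from float.h, so its value is fixed when
   the translation unit is compiled. */
int eval_method(void) {
    return FLT_EVAL_METHOD;
}

/* Nonzero when the implementation reports the evaluation format as
   indeterminable (-1). */
int eval_method_is_indeterminable(void) {
    return FLT_EVAL_METHOD == -1;
}
```
</span><br>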

    <br>

     -Hal<br>

    <br>

    <blockquote type="cite"

cite="mid:CACvkUqYmOjHVhJXMYYZ9A7CYUCJ_Qok+9GjtD8w+WeiJTA77ag@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div bgcolor="#FFFFFF"> <br>

                 -Hal

                <div>

                  <div class="gmail-h5"><br>

                    <br>

                    <blockquote type="cite">

                      <div dir="ltr">

                        <div><br>

                        </div>

                        <div>Additionally, I am not aware of the IR

                          being able to represent the potentially

                          deferred loss of precision that the C language

                          semantics provide; in particular, applying

                          such semantics to the existing IR would hit an

                          issue that the limits of such deferment would

                          need an agreed representation.<br>

                          <br>

                        </div>

                        <div>As for the mixing of strict and non-strict

                          modes, I would be interested in where LLVM is

                          in its handling of non-SSA (pseudo-memory?)

                          dependencies. I have a vague impression that

                          it is very coarse-grained in that respect, but

                          I admit to not being particularly informed in

                          that space. If there is a good model for such

                          dependencies, then I think it could be used to

                          handle the strict/non-strict mixing.<br>

                        </div>

                        <div><br>

                        </div>

                        -- Hubert Tong, IBM<br>

                        <br>

                        <div>PS A nitpick on wording: The idea of being

                          inside or outside of FENV_ACCESS regions is

                          instead expressed in terms of the state of

                          the FENV_ACCESS pragma within the C Standard.<br>

                        </div>

                      </div>

                      <div class="gmail_extra"><br>

                        <div class="gmail_quote">On Wed, May 23, 2018 at

                          10:48 AM, Ulrich Weigand via llvm-dev <span

                            dir="ltr">&lt;<a

                              href="mailto:llvm-dev@lists.llvm.org"

                              target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>&gt;</span>

                          wrote:<br>

                          <blockquote class="gmail_quote"

                            style="margin:0px 0px 0px

                            0.8ex;border-left:1px solid

                            rgb(204,204,204);padding-left:1ex">

                            <div>

                              <p><font size="2">Hello,</font><br>

                                <br>

                                <font size="2">at the recent EuroLLVM

                                  developer meeting in Bristol I held a

                                  BoF<br>

                                  session on the topic "Towards

                                  implementing #pragma STDC

                                  FENV_ACCESS".<br>

                                  I've also had a number of follow-on

                                  discussions both on-site in<br>

                                  Bristol and online since. This post is

                                  intended as a summary of<br>

                                  my current understanding of the set of

                                  requirements and implementation<br>

                                  details covering the overall topic.</font><br>

                                <br>

                                <font size="2">I'm posting this here in

                                  the hope this can serve as a basis for<br>

                                  the various more detailed discussions

                                  that are still ongoing<br>

                                  (e.g. in various Phabricator proposals

                                  right now). Any comments<br>

                                  are welcome!</font><br>

                                <br>

                                <br>

                                <font size="2">Semantics of #pragma STDC

                                  FENV_ACCESS<br>

                                  ==============================<wbr>=======</font><br>

                                <br>

                                <font size="2">To provide a baseline for

                                  the implementation discussion, first

                                  an<br>

                                  overview of the features required to

                                  handle the strict floating-point<br>

                                  mode defined by the C and IEEE

                                  standards:</font><br>

                                <br>

                                <font size="2">1. Floating-point

                                  rounding modes<br>

                                  2. Default floating-point exception

                                  handling<br>

                                  3. Trapping floating-point exception

                                  handling</font><br>

                                <br>

                                <font size="2">Each of these separate

                                  features imposes different constraints

                                  on the<br>

                                  optimizations that LLVM may perform

                                  involving FP expressions:</font><br>

                                <br>

                                <font size="2">1. Floating-point

                                  rounding modes</font><br>

                                <br>

                                <font size="2">Outside of FENV_ACCESS

                                  regions, all FP operations are

                                  supposed to be<br>

                                  performed in the "default" rounding

                                  mode.</font><br>

                                <br>

                                <font size="2">But inside FENV_ACCESS

                                  regions, FP operations implicitly

                                  depend on<br>

                                  a "current" rounding mode setting,

                                  which may be changed by certain<br>

                                  C library calls (plus some

                                  platform-specific intrinsics). In

                                  addition,<br>

                                  those calls may be performed within

                                  subroutines (as long as those are<br>

                                  also within FENV_ACCESS), so *any*

                                  function call within a FENV_ACCESS region<br>

                                  must be considered as potentially

                                  changing the rounding mode.</font><br>

                                <br>

                                <font size="2">In effect, this means the

                                  compiler may not move or combine FP<br>

                                  operations across function call

                                  sites.</font><br>

                                <br>

                                <font size="2">2. Default floating-point

                                  exception handling</font><br>

                                <br>

                                <font size="2">Inside FENV_ACCESS

                                  regions, every floating-point

                                  operation that<br>

                                  causes an exception must be considered

                                  to set a "status flag"<br>

                                  associated with this exception type.

                                  Those flags can be queried<br>

                                  using C library calls (plus some

                                  platform-specific intrinsics),<br>

                                  and there are other such calls to

                                  explicitly set or clear those<br>

                                  flags as well. As with the rounding

                                  modes, those calls may be<br>

                                  performed in subroutines as well, so

                                  any function call within a<br>

                                  FENV_ACCESS region must be considered

                                  as potentially *using* and<br>

                                  changing the floating-point exception

                                  status flags.</font><br>

                                <br>

                                <font size="2">The values of the status

                                  flags on entry to a FENV_ACCESS region are to<br>

                                  be considered undefined according to

                                  the C standard.</font><br>

                                <br>

                                <font size="2">Compiler optimizations

                                  are supposed to preserve the values of<br>

                                  all exception status bits at any point

                                  where they can be<br>

                                  (potentially) inspected by the

                                  program, i.e. at all call sites<br>

                                  within FENV_ACCESS regions. This still

                                  allows a number of<br>

                                  optimizations, e.g. to reorder FP

                                  operations or combine two<br>

                                  identical operations within a region

                                  uninterrupted by calls.<br>

                                  But other optimizations should be

                                  avoided, e.g. optimizing<br>

                                  away an unused FP operation may result

                                  in an exception flag<br>

                                  now being unset that would otherwise

                                  have been set. The same<br>

                                  applies to floating-point constant

                                  folding.</font><br>

                                <br>

                                <font size="2">3. Trapping

                                  floating-point exception handling</font><br>

                                <br>

                                <font size="2">Within a FENV_ACCESS

                                  region, library calls may be used to

                                  switch<br>

                                  exception handling semantics to a

                                  "trapping" mode by setting<br>

                                  corresponding mask bits. Any

                                  subsequent FP instruction that<br>

                                  raises an exception with the

                                  associated mask bit set will cause<br>

                                  a trap. Usually, this will be a

                                  hardware trap that is translated<br>

                                  by the operating system into some form

                                  of software exception that<br>

                                  can be handled by the application; on

                                  Linux systems this takes the<br>

                                  form of a SIGFPE signal.</font><br>

                                <br>

                                <font size="2">As above, those mask bits

                                  can be set and reset via (operating-<br>

                                  system specific) library calls and/or

                                  platform-specific intrinsics,<br>

                                  all of which may also be done within

                                  subroutine calls.</font><br>

                                <br>

                                <font size="2">In effect, this requires

                                  the compiler to treat any

                                  floating-point<br>

                                  operation within a FENV_ACCESS region

                                  as potentially trapping,<br>

                                  which means the same restrictions

                                  apply as with e.g. memory accesses<br>

                                  (cannot be speculated, etc.). However,

                                  according to the C standard,<br>

                                  the implementation is not required to

                                  preserve the *number* of<br>

                                  different traps, so identical

                                  operations may still be combined<br>

                                  (unless there is an intervening

                                  function call).</font><br>

                                <br>

                                <font size="2">The C standard requires

                                  all user code to explicitly switch

                                  back<br>

                                  to non-trapping mode for all

                                  exceptions whenever leaving a<br>

                                  FENV_ACCESS region (both by "falling

                                  off the end" of the region<br>

                                  and by calling a subroutine defined

                                  outside of FENV_ACCESS).</font><br>

                                <br>

                                <br>

                                <font size="2">Implementation

                                  requirements on parts of the compiler<br>

                                  ==============================<wbr>======================</font><br>

                                <br>

                                <font size="2">A. clang front end</font><br>

                                <br>

                                <font size="2">The front end needs to

                                  determine which instructions are part

                                  of<br>

                                  FENV_ACCESS regions and which are not.

                                  This takes into account<br>

                                  both the semantics of the #pragma as

                                  defined by the standard,<br>

                                  and the implementation-defined default

                                  rules that apply to code<br>

                                  outside of any #pragma. GCC currently

                                  has the following two<br>

                                  related command-line options:</font><br>

                                <br>

                                <font size="2">-frounding-math: Do not

                                  assume default rounding mode<br>

                                  -ftrapping-math: Assume FP operations

                                  may trap</font><br>

                                <br>

                                <font size="2">clang accepts but

                                  (basically) ignores those options. As

                                  a first<br>

                                  step, it might make sense to have the

                                  FENV_ACCESS default</font><br>

                                <font size="2">behavior triggered by

                                  these options, even while the front

                                  end<br>

                                  does not yet support the actual

                                  #pragma.</font><br>

                                <br>

                                <font size="2">The front end then needs

                                  to transmit the information about<br>

                                  FENV_ACCESS regions to later passes.

                                  However, I believe that<br>

                                  we do not actually have to implement

                                  "regions" as such at the<br>

                                  IR level. Instead, it would be

                                  sufficient to track the following<br>

                                  information:</font><br>

                                <br>

                                <font size="2">- For each FP operation,

                                  whether it is within a FENV_ACCESS

                                  region.<br>

                                  - For each call site, whether it is

                                  within a FENV_ACCESS region.</font><br>

                                <br>

                                <font size="2">The former requires new

                                  IR support; the approach currently

                                  under<br>

                                  investigation uses the experimental

                                  "constrained FP" intrinsics<br>

                                  instead of traditional floating-point

                                  operations for this. The<br>

                                  latter can be done simply by

                                  annotating those call sites with an<br>

                                  attribute.</font><br>

                                <br>

                                <font size="2">In addition to that, the

                                  front-end itself needs to disable any<br>

                                  early optimizations that do not

                                  preserve strict FP semantics,<br>

                                  in particular it must not speculate FP

                                  operations if they may<br>

                                  trap. (Currently, the front end

                                  transforms "? :" on floating-<br>

                                  point types into a select IR

                                  statement; for trapping FP<br>

                                  operations, an explicit branch must be

                                  used instead.)</font><br>
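                                <br>
                                <font size="2">A minimal sketch of the source pattern in question (function names are hypothetical): both functions below mean the same thing in C, but lowering the first one to a select would evaluate the division unconditionally, which could raise a divide-by-zero exception, or trap, that the source never asked for.</font><br>
                                <font size="2"><span style="white-space: pre; font-family: monospace">
```c
/* Conditional-operator form: the division is only meant to execute when
   den is nonzero. */
double guarded_div(double num, double den) {
    return den != 0.0 ? num / den : 0.0;
}

/* Explicit-branch form: the shape the lowered code has to keep when the
   division may trap, so that it never runs with den equal to zero. */
double guarded_div_branch(double num, double den) {
    if (den != 0.0)
        return num / den;
    return 0.0;
}
```
</span></font><br>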

                                <br>

                                <br>

                                <font size="2">B. LLVM IR and LLVM

                                  common optimizations</font><br>

                                <br>

                                <font size="2">As mentioned in the

                                  previous section, we need some IR to

                                  annotate<br>

                                  FP instructions and call sites within

                                  FENV_ACCESS regions. All<br>

                                  common optimizations then need to

                                  respect the strict FP semantics<br>

                                  associated with those regions.</font><br>

                                <br>

                                <font size="2">The current approach uses

                                  experimental intrinsics. This has the<br>

                                  advantage that most optimizations

                                  never trigger since they don't<br>

                                  even recognize those new intrinsics.

                                  Also, the intrinsics can<br>

                                  be marked as having side-effects

                                  and/or being non-speculatable.</font><br>

                                <br>

                                <font size="2">The overall effect is

                                  that more optimizations are suppressed<br>

                                  than would be strictly necessary. But

                                  this may still be a good<br>

                                  first step, since the result is now

                                  safe but maybe not optimal<br>

                                  -- which can be improved upon over

                                  time by teaching the specific<br>

                                  semantics of those intrinsics to

                                  optimization passes.</font><br>

                                <br>

                                <font size="2">However, some open

                                  questions remain. If at some point we

                                  want<br>

                                  to model the constrained FP semantics

                                  more precisely than just<br>

                                  as "unmodeled side effects", this may

                                  have to be reflected at<br>

                                  the IR level directly. For example, to

                                  model rounding mode<br>

                                  behavior, at some point we might

                                  require explicit tracking of<br>

                                  data dependencies on the rounding mode

                                  by representing the<br>

                                  rounding mode as SSA values defined by

                                  function calls and used<br>

                                  by FP intrinsics. Similarly, to track

                                  exception status flags,<br>

                                  they might be modeled as SSA values

                                  set by FP intrinsics and<br>

                                  used by function calls.</font><br>

                                <br>

                                <font size="2">(There is a possibly

                                  related question of how to optimally

                                  model<br>

                                  the property of many math library

                                  routines that they may access<br>

                                  the "errno" variable but no other

                                  memory ... It might also be<br>

                                  possible to model e.g. exception

                                  status as a thread-local "memory"<br>

                                  location that is modified by FP

                                  operations, just like errno.)</font><br>

                                <br>

                                <font size="2">Another currently

                                  unresolved issue is that at the moment

                                  nothing<br>

                                  prevents *standard* floating-point

                                  operations from being moved<br>

                                  *inside* FENV_ACCESS regions. This may

                                  also be invalid, since<br>

                                  those operations now may cause

                                  unexpected traps etc. (More<br>

                                  specifically, what is invalid is

                                  moving any standard FP operation<br>

                                  across a *call site* within a

                                  FENV_ACCESS region.) Note that<br>

                                  this is an issue even if we only

                                  support changing the default<br>

                                  (and no actual #pragma), whenever multiple

                                  object files using different<br>

                                  default settings are linked

                                  together using LTO.</font><br>

                                <br>

                                <font size="2">This last issue could in

                                  theory be solved by having all

                                  optimization<br>

                                  passes respect the requirement that

                                  floating-point operations may<br>

                                  not be moved across call sites marked

                                  with the strict FP attribute.<br>

                                  But that does not appear to be

                                  straightforward since it would<br>

                                  introduce a "new" type of dependency

                                  that would have to be added<br>

                                  throughout LLVM code. If this must be

                                  avoided, we'd have to<br>

                                  find a way to explicitly track

                                  dependencies at the IR level. In<br>

                                  the extreme, this could end up

                                  equivalent to just always using<br>

                                  the constrained intrinsics for

                                  everything ...</font><br>

                                <br>

                                <br>

                                <font size="2">C. Code generation</font><br>

                                <br>

                                <font size="2">In the back end, effects

                                  of strict FP mode have to be passed

                                  through<br>

                                  to lower-level representations

                                  including SelectionDAG and MI.</font><br>

                                <br>

                                <font size="2">Currently, the "unmodeled

                                  side effect" logic of the constrained<br>

                                  intrinsics is modeled by putting them

                                  on the chain during SelectionDAG.<br>

                                  (If we ever model semantics more

                                  precisely at the IR level, that<br>

                                  would need to be reflected on

                                  SelectionDAG accordingly.)</font><br>

                                <br>

                                <font size="2">At the MI level, there is

                                  no representation at all. One option

                                  to<br>

                                  fix this would be to model

                                  target-specific registers that

                                  implement<br>

                                  the IEEE semantics. Most platforms

                                  have registers (or parts of<br>

                                  registers) that hold:<br>

                                  - the current rounding mode<br>

                                  - the exception status flags<br>

                                  - the exception masks (which enable

                                  traps)<br>

                                  Marking FP instructions as using

                                  and/or defining these registers<br>

                                  would enforce ordering requirements.

                                  It may be too strict in some<br>

                                  cases (e.g. two instructions setting

                                  exception status flags may<br>

                                  still be reordered). On the other

                                  hand, I believe if instructions<br>

                                  may actually *trap*, we need

                                  the hasSideEffects flag even<br>

                                  if register dependencies are modeled.</font><br>
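                                <br>
                                <font size="2">The register-based idea can be sketched with a toy dependency check. Everything below (ToyInstr, ordered_by_registers, must_stay_ordered and the sample instructions) is hypothetical and is not LLVM's MachineInstr API; it only shows how use/def information on a rounding-mode register and a status-flag register constrains reordering, where that is stricter than needed, and where it is not enough.</font><br>

```cpp
#include <cassert>

// Hypothetical toy model; the field names are illustrative only.
struct ToyInstr {
    bool uses_rounding_mode;  // reads the rounding-mode register
    bool defs_rounding_mode;  // writes the rounding-mode register
    bool uses_status_flags;   // reads the exception status flags
    bool defs_status_flags;   // writes the exception status flags
    bool has_side_effects;    // stand-in for a hasSideEffects-style flag
};

// Two accesses to the same register conflict unless both are pure reads.
static bool register_conflict(bool a_use, bool a_def, bool b_use, bool b_def) {
    return (a_def && (b_use || b_def)) || (b_def && a_use);
}

// True if the modeled register dependencies alone pin the relative order.
bool ordered_by_registers(const ToyInstr& a, const ToyInstr& b) {
    return register_conflict(a.uses_rounding_mode, a.defs_rounding_mode,
                             b.uses_rounding_mode, b.defs_rounding_mode) ||
           register_conflict(a.uses_status_flags, a.defs_status_flags,
                             b.uses_status_flags, b.defs_status_flags);
}

// True if the pair may not be swapped once side effects are considered too.
bool must_stay_ordered(const ToyInstr& a, const ToyInstr& b) {
    return ordered_by_registers(a, b) ||
           (a.has_side_effects && b.has_side_effects);
}

// Sample instructions: an FP add reads the rounding mode and sets flags.
const ToyInstr kFAdd         = {true,  false, false, true,  false};
const ToyInstr kTrappingFAdd = {true,  false, false, true,  true};
const ToyInstr kSetRounding  = {false, true,  false, false, false};
const ToyInstr kStore        = {false, false, false, false, true};
```

                                <font size="2">In this model two plain FP adds are ordered only because both define the status flags, which is the "too strict" case. A trapping add and an unrelated store share no register at all, so only the side-effect flag keeps them in order.</font><br>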

                                <br>

                                <font size="2">If we do need

                                  hasSideEffects, there is a separate

                                  discussion on<br>

                                  whether this can be implemented

                                  without each back end having to<br>

                                  duplicate all FP instruction patterns

                                  (one with hasSideEffects<br>

                                  and one without), e.g. by having a new

                                  feature that allows<br>

                                  describing the side-effect status using

                                  an MI operand.</font><br>

                                <br>

                                <br>

                                <font size="2">Next steps<br>

                                  ==========</font><br>

                                <br>

                                <font size="2">I believe it is important

                                  to break up the full amount of work<br>

                                  into incremental steps that provide

                                  some useful benefits on their<br>

                                  own. At first, we should be able to

                                  get to a state where clang<br>

                                  can be used to build programs that use

                                  some (maybe not all) strict<br>

                                  FP features, where the generated code

                                  is always correct but may<br>

                                  not always be optimal. To get there, I

                                  think we need at a <br>

                                  minimum:</font><br>

                                <br>

                                <font size="2">- Implement clang support

                                  for the default flags, e.g. GCC's<br>

                                  -frounding-math and -ftrapping-math,

                                  and always generate<br>

                                  the constrained intrinsics. clang

                                  should also mark all<br>

                                  call sites then (as mentioned above).</font><br>

                                <br>

                                <font size="2">- For now, add the

                                  requirement that LTO is not supported

                                  if<br>

                                  this would cause mixing of strict and

                                  non-strict FP code.<br>

                                  Alternatively, have the LTO pass

                                  automatically transform<br>

                                  any floating-point operation into a

                                  constrained intrinsic<br>

                                  if *any* (other) module already uses

                                  the latter.</font><br>

                                <br>

                                <font size="2">- At the IR level,

                                  complete the set of supported

                                  constrained<br>

                                  FP intrinsics (there are still some

                                  missing, see e.g.<br>

                                </font><font size="2"><a

                                    href="https://reviews.llvm.org/D43515"

                                    target="_blank"

                                    moz-do-not-send="true">https://reviews.llvm.org/D4351<wbr>5</a></font><font

                                  size="2">).<br>

                                  Also, it seems not all variants (e.g.

                                  for vector types) are<br>

                                  supported correctly through codegen

                                  (see e.g.<br>

                                </font><font size="2"><a

                                    href="https://reviews.llvm.org/D46967"

                                    target="_blank"

                                    moz-do-not-send="true">https://reviews.llvm.org/D4696<wbr>7</a></font><font

                                  size="2">).</font><br>

                                <br>

                                <font size="2">- Allow targets to

                                  correctly reflect constrained

                                  intrinsics<br>

                                  semantics at the MI level and final

                                  machine code generation<br>

                                  (see e.g. </font><font size="2"><a

                                    href="https://reviews.llvm.org/D45576"

                                    target="_blank"

                                    moz-do-not-send="true">https://reviews.llvm.org/D4557<wbr>6</a></font><font

                                  size="2">).</font><br>

                                <br>

                                <font size="2">- Review all optimization

                                  and codegen passes to verify they<br>

                                  fully respect strict FP semantics.</font><br>

                                <br>

                                <font size="2">Once this is done, we can

                                  improve on the solution by:</font><br>

                                <br>

                                <font size="2">- Supporting mixing

                                  strict and non-strict FP operations<br>

                                  (would lift the LTO restriction).

                                  (Note: there still seems<br>

                                  to be some "invention required"

                                  here, see above.)</font><br>

                                <br>

                                <font size="2">- Actually implementing

                                  the #pragma supporting different<br>

                                  regions within a compilation unit

                                  (prereq: support for<br>

                                  mixing strict and non-strict FP

                                  operations).</font><br>

                                <br>

                                <font size="2">- Add more optimization

                                  of constrained FP intrinsics in<br>

                                  common optimizers and/or target back

                                  ends.</font><br>

                                <br>

                                <font size="2">Does this look

                                  reasonable? Please let me know if

                                  there's<br>

                                  anything I overlooked, or if you have any

                                  additional comments<br>

                                  or questions.</font><br>

                                <br>

                                <br>

                                <font size="2"><br>

                                  Best

                                  Regards<span

                                    class="gmail-m_-1433965244057454815HOEnZb"><font

                                      color="#888888"><br>

                                      <br>

                                      Ulrich Weigand<br>

                                      <br>

                                      -- <br>

                                      Dr. Ulrich Weigand | Phone:

                                      +49-7031/16-3727<br>

                                      STSM, GNU/Linux compilers and

                                      toolchain<br>

                                      IBM Deutschland Research &

                                      Development GmbH<br>

                                      Chair of the Supervisory Board:

                                      Martina Koederitz |

                                      Management: Dirk Wittkopp<br>

                                      Registered office: Böblingen |

                                      Commercial register: Amtsgericht

                                      Stuttgart, HRB 243294</font></span></font><br>

                              </p>

                            </div>

                            <br>

                            ______________________________<wbr>_________________<br>

                            LLVM Developers mailing list<br>

                            <a href="mailto:llvm-dev@lists.llvm.org"

                              target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>

                            <a

                              href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

                              rel="noreferrer" target="_blank"

                              moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

                            <br>

                          </blockquote>

                        </div>

                        <br>

                      </div>

                      <br>


                    </blockquote>

                    <br>

                  </div>

                </div>

                <span class="gmail-HOEnZb"><font color="#888888">

                    <pre class="gmail-m_-1433965244057454815moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

                  </font></span></div>

            </blockquote>

          </div>

          <br>

        </div>

      </div>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

  </body>

</html>