<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 05/23/2018 04:04 PM, Hubert Tong
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CACvkUqYmOjHVhJXMYYZ9A7CYUCJ_Qok+9GjtD8w+WeiJTA77ag@mail.gmail.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">On Wed, May 23, 2018 at 12:19 PM, Hal
            Finkel <span dir="ltr">&lt;<a href="mailto:hfinkel@anl.gov"
                target="_blank" moz-do-not-send="true">hfinkel@anl.gov</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"><span class="gmail-">
                  <p><br>
                  </p>
                  <div
                    class="gmail-m_-1433965244057454815moz-cite-prefix">On
                    05/23/2018 11:06 AM, Hubert Tong via llvm-dev wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div>Hi Ulrich,<br>
                        <br>
                      </div>
                      <div>I am interested in knowing if the current
                        proposals also take into account the FP_CONTRACT
                        pragma</div>
                    </div>
                  </blockquote>
                  <br>
                </span> We should already do this (we turn relevant
                operations into @llvm.fmuladd.* intrinsic calls when FP_CONTRACT is
                set to on during IR generation).<span class="gmail-"><br>
                </span></div>
            </blockquote>
            <div>I am not sure we have the same interpretation of what
              the FP_CONTRACT pragma does. Subclause 6.5 paragraph 8 of
              C11 implies (for example) that even where the FENV_ACCESS
              pragma is "on", folding a constant subexpression with an
              exactly representable result on an implementation where
              FLT_EVAL_METHOD is 0 is within the range of acceptable
              implementation-defined behaviour despite intermediate
              overflow under non-contracted evaluation. Which is to say
              that the current proposal reads as what needs to be done
              when FP_CONTRACT is "off" and FENV_ACCESS is "on". The
              note from Ulrich implies that the requirements are imposed
              by the Standard, but the range of implementation defined
              behaviour where FP_CONTRACT is "on" where FENV_ACCESS is
              also "on" is possibly a discussion to be had.<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Thanks for explaining. Yes, I agree, this is certainly worth
    discussing. Do you have thoughts on what we should do? I think it
    makes sense to fold where possible, as the user has requested the
    extra intermediate precision available from FMA formation.<br>
    <br>
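    To make the case concrete, here is a minimal sketch of the
    intermediate-overflow situation. It is only a model: Python is used
    as a neutral language, the function names are made up for
    illustration, and exact rational arithmetic stands in for an
    intermediate with unbounded range and precision. It is not a
    description of what clang or LLVM actually does.<br>
    <pre>
```python
import sys
from fractions import Fraction

# Largest finite double, i.e. DBL_MAX in C terms.
LARGEST_DOUBLE = sys.float_info.max


def unfused_multiply_subtract(x, y, z):
    """x * y - z with the product rounded to double before the subtraction."""
    return x * y - z


def fused_multiply_subtract(x, y, z):
    """x * y - z evaluated exactly, then rounded to double only once.

    Assumption: this models a contracted (or folded) evaluation; it does
    not model the floating-point status flags at all.
    """
    exact = Fraction(x) * Fraction(y) - Fraction(z)
    return float(exact)
```
    </pre>
    With x = z = LARGEST_DOUBLE and y = 2.0, the unfused form overflows
    in the product and yields infinity, while the fused form yields
    LARGEST_DOUBLE, which is exactly representable. The difference in
    the overflow flag is what would be observable when FENV_ACCESS is
    "on".<br>
    <br>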
    Also, to what extent can we change our minds later? For example,
    with C++/constexpr, etc. does this have ABI implications?<br>
    <br>
    <blockquote type="cite"
cite="mid:CACvkUqYmOjHVhJXMYYZ9A7CYUCJ_Qok+9GjtD8w+WeiJTA77ag@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"><span class="gmail-"> <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div> and the ability to implement options that
                        imply a specific value for the FLT_EVAL_METHOD
                        macro.<br>
                      </div>
                    </div>
                  </blockquote>
                  <br>
                </span> What do you mean by this?<br>
              </div>
            </blockquote>
            <div>I admit that modes where FLT_EVAL_METHOD, respectively,
              is 0 (no extra range and precision), 1 (float in double
              range and precision), and 2 (float and double in long
              double range and precision) are all straightforward for
              the IR producer to implement by fixing the types used in
              the IR emitted (implying the value of FLT_EVAL_METHOD is not
              constant within a program).<br>
              <br>
              So, this is more about implementing meaningful cases of
              FLT_EVAL_METHOD being -1. My point below (in my previous
              note) is that allowing IR passes or the back-end to choose
              the range and precision in a manner conforming to Standard
              C (for a FLT_EVAL_METHOD of -1)--perhaps for speed where
              multiple sets of floating-point operations/registers are
              available with differing "preferred types"--appears to be
              a use case that the IR does not seem to support well.</div>
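            <div><br>
              As a rough illustration of how the choice of evaluation
              format changes results, the sketch below contrasts
              FLT_EVAL_METHOD 0 and 1 for a float expression. It is a
              model written in Python with made-up function names;
              float32 rounding is emulated through the struct module,
              and the inputs are assumed to be exactly representable as
              float.<br>
              <pre>
```python
import struct


def to_float32(value):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def add_then_subtract_method0(a, b):
    """(a + b) - a with every operation rounded to float32."""
    total = to_float32(a + b)
    return to_float32(total - a)


def add_then_subtract_method1(a, b):
    """(a + b) - a evaluated in double; only the final result is narrowed."""
    return to_float32((a + b) - a)
```
              </pre>
              For a = 16777216.0 (2 to the power 24) and b = 1.0, the
              first form returns 0.0 because the sum is not representable
              as a float, while the second returns 1.0.</div>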
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Yes. In the LangRef we do have fpmath metadata
    (<a class="moz-txt-link-freetext" href="http://llvm.org/docs/LangRef.html#fpmath-metadata">http://llvm.org/docs/LangRef.html#fpmath-metadata</a>), which might be
    useful in this space, but I don't think we actually use it for
    anything.<br>
    <br>
    <blockquote type="cite"
cite="mid:CACvkUqYmOjHVhJXMYYZ9A7CYUCJ_Qok+9GjtD8w+WeiJTA77ag@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> As for why a FLT_EVAL_METHOD of -1 is on-topic for
              this thread: The language semantics allow the case of the
              constant subexpression folding I mentioned above even when
              FP_CONTRACT is "off" and FENV_ACCESS is "on", because the
              evaluation format used for the evaluation of that
              subexpression can be said to have infinite range and
              precision.<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Ah, interesting. FLT_EVAL_METHOD is a constant chosen (globally) by
    the implementation, correct? Do you know of platforms that set
    FLT_EVAL_METHOD to -1?<br>
    <br>
     -Hal<br>
    <br>
    <blockquote type="cite"
cite="mid:CACvkUqYmOjHVhJXMYYZ9A7CYUCJ_Qok+9GjtD8w+WeiJTA77ag@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div bgcolor="#FFFFFF"> <br>
                 -Hal
                <div>
                  <div class="gmail-h5"><br>
                    <br>
                    <blockquote type="cite">
                      <div dir="ltr">
                        <div><br>
                        </div>
                        <div>Additionally, I am not aware of the IR
                          being able to represent the potentially
                          deferred loss of precision that the C language
                          semantics provide; in particular, applying
                          such semantics to the existing IR would hit an
                          issue that the limits of such deferment would
                          need an agreed representation.<br>
                          <br>
                        </div>
                        <div>As for the mixing of strict and non-strict
                          modes, I would be interested in where LLVM is
                          in its handling of non-SSA (pseudo-memory?)
                          dependencies. I have a vague impression that
                          it is very coarse-grained in that respect, but
                          I admit to not being particularly informed in
                          that space. If there is a good model for such
                          dependencies, then I think it could be used to
                          handle the strict/non-strict mixing.<br>
                        </div>
                        <div><br>
                        </div>
                        -- Hubert Tong, IBM<br>
                        <br>
                        <div>PS A nitpick on wording: The idea of being
                          inside or outside of FENV_ACCESS regions is
                          instead expressed in terms of the state of
                          the FENV_ACCESS pragma within the C Standard.<br>
                        </div>
                      </div>
                      <div class="gmail_extra"><br>
                        <div class="gmail_quote">On Wed, May 23, 2018 at
                          10:48 AM, Ulrich Weigand via llvm-dev <span
                            dir="ltr">&lt;<a
                              href="mailto:llvm-dev@lists.llvm.org"
                              target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>&gt;</span>
                          wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0px 0px 0px
                            0.8ex;border-left:1px solid
                            rgb(204,204,204);padding-left:1ex">
                            <div>
                              <p><font size="2">Hello,</font><br>
                                <br>
                                <font size="2">at the recent EuroLLVM
                                  developer meeting in Bristol I held a
                                  BoF<br>
                                  session on the topic "Towards
                                  implementing #pragma STDC
                                  FENV_ACCESS".<br>
                                  I've also had a number of follow-on
                                  discussions both on-site in<br>
                                  Bristol and online since. This post is
                                  intended as a summary of<br>
                                  my current understanding of the set of
                                  requirements and implementation<br>
                                  details covering the overall topic.</font><br>
                                <br>
                                <font size="2">I'm posting this here in
                                  the hope this can serve as a basis for<br>
                                  the various more detailed discussions
                                  that are still ongoing<br>
                                  (e.g. in various Phabricator proposals
                                  right now). Any comments<br>
                                  are welcome!</font><br>
                                <br>
                                <br>
                                <font size="2">Semantics of #pragma STDC
                                  FENV_ACCESS<br>
                                  ==============================<wbr>=======</font><br>
                                <br>
                                <font size="2">To provide a baseline for
                                  the implementation discussion, first
                                  an<br>
                                  overview of the features required to
                                  handle the strict floating-point<br>
                                  mode defined by the C and IEEE
                                  standards:</font><br>
                                <br>
                                <font size="2">1. Floating-point
                                  rounding modes<br>
                                  2. Default floating-point exception
                                  handling<br>
                                  3. Trapping floating-point exception
                                  handling</font><br>
                                <br>
                                <font size="2">Each of these separate
                                  features imposes different constraints
                                  on the<br>
                                  optimizations that LLVM may perform
                                  involving FP expressions:</font><br>
                                <br>
                                <font size="2">1. Floating-point
                                  rounding modes</font><br>
                                <br>
                                <font size="2">Outside of FENV_ACCESS
                                  regions, all FP operations are
                                  supposed to be<br>
                                  performed in the "default" rounding
                                  mode.</font><br>
                                <br>
                                <font size="2">But inside FENV_ACCESS
                                  regions, FP operations implicitly
                                  depend on<br>
                                  a "current" rounding mode setting,
                                  which may be changed by certain<br>
                                  C library calls (plus some
                                  platform-specific intrinsics). In
                                  addition,<br>
                                  those calls may be performed within
                                  subroutines (as long as those are<br>
                                  also within FENV_ACCESS), so *any*
                                  function call within a FENV_ACCESS region<br>
                                  must be considered as potentially
                                  changing the rounding mode.</font><br>
                                <br>
                                <font size="2">In effect, this means the
                                  compiler may not move or combine FP<br>
                                  operations across function call
                                  sites.</font><br>
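                                <br>
                                <font size="2">As a rough illustration of
                                  why this restriction follows, here is a
                                  minimal sketch that uses Python's decimal
                                  module as a stand-in for the
                                  floating-point environment. This is an
                                  analogy only (decimal contexts are not the
                                  C fenv.h state), and the function names
                                  are hypothetical:</font><br>
                                <code style="white-space:pre">
```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, getcontext, localcontext


def helper_that_changes_rounding():
    """Hypothetical subroutine; plays the role of a callee using fesetround()."""
    getcontext().rounding = ROUND_CEILING


def two_identical_divisions():
    """Perform the same division before and after the helper call."""
    with localcontext() as ctx:  # private copy of the ambient context
        ctx.prec = 5
        ctx.rounding = ROUND_FLOOR
        before_call = Decimal(1) / Decimal(3)
        helper_that_changes_rounding()
        after_call = Decimal(1) / Decimal(3)
    return before_call, after_call
```
                                </code><br>
                                <font size="2">The two divisions are
                                  textually identical, yet they produce
                                  0.33333 and 0.33334. Merging them, or
                                  moving either one to the other side of
                                  the call, would change the result.</font><br>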
                                <br>
                                <font size="2">2. Default floating-point
                                  exception handling</font><br>
                                <br>
                                <font size="2">Inside FENV_ACCESS
                                  regions, every floating-point
                                  operation that<br>
                                  causes an exception must be considered
                                  to set a "status flag"<br>
                                  associated with this exception type.
                                  Those flags can be queried<br>
                                  using C library calls (plus some
                                  platform-specific intrinsics),<br>
                                  and there are other such calls to
                                  explicitly set or clear those<br>
                                  flags as well. As with the rounding
                                  modes, those calls may be<br>
                                  performed in subroutines as well, so
                                  any function call within a<br>
                                  FENV_ACCESS region must be considered
                                  as potentially *using* and<br>
                                  changing the floating-point exception
                                  status flags.</font><br>
                                <br>
                                <font size="2">The values of the status
                                  flags on entry to a FENV_ACCESS region are to<br>
                                  be considered undefined according to
                                  the C standard.</font><br>
                                <br>
                                <font size="2">Compiler optimizations
                                  are supposed to preserve the values of<br>
                                  all exception status bits at any point
                                  where they can be<br>
                                  (potentially) inspected by the
                                  program, i.e. at all call sites<br>
                                  within FENV_ACCESS regions. This still
                                  allows a number of<br>
                                  optimizations, e.g. to reorder FP
                                  operations or combine two<br>
                                  identical operations within a region
                                  uninterrupted by calls.<br>
                                  But other optimizations should be
                                  avoided, e.g. optimizing<br>
                                  away an unused FP operation may result
                                  in an exception flag<br>
                                  now being unset that would otherwise
                                  have been set. The same<br>
                                  applies to floating-point constant
                                  folding.</font><br>
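                                <br>
                                <font size="2">The point about unused
                                  operations can be sketched as follows,
                                  again using Python's decimal module purely
                                  as an analogy for sticky status flags (its
                                  Inexact flag plays the role of FE_INEXACT;
                                  the function name is hypothetical):</font><br>
                                <code style="white-space:pre">
```python
from decimal import Decimal, Inexact, localcontext


def inexact_flag_after(compute_unused_quotient):
    """Report whether the Inexact flag is set at the end of the region.

    The quotient is never used, so dropping the division looks harmless,
    but the flag it raises is still observable afterwards.
    """
    with localcontext() as ctx:
        ctx.clear_flags()
        if compute_unused_quotient:
            Decimal(1) / Decimal(3)  # result discarded, flag still raised
        return bool(ctx.flags[Inexact])
```
                                </code><br>
                                <font size="2">Calling the function with
                                  True models the original program and
                                  calling it with False models the program
                                  after dead-code elimination; the two
                                  disagree on the flag.</font><br>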
                                <br>
                                <font size="2">3. Trapping
                                  floating-point exception handling</font><br>
                                <br>
                                <font size="2">Within a FENV_ACCESS
                                  region, library calls may be used to
                                  switch<br>
                                  exception handling semantics to a
                                  "trapping" mode by setting<br>
                                  corresponding mask bits. Any
                                  subsequent FP instruction that<br>
                                  raises an exception with the
                                  associated mask bit set will cause<br>
                                  a trap. Usually, this will be a
                                  hardware trap that is translated<br>
                                  by the operating system into some form
                                  of software exception that<br>
                                  can be handled by the application; on
                                  Linux systems this takes the<br>
                                  form of a SIGFPE signal.</font><br>
                                <br>
                                <font size="2">As above, those mask bits
                                  can be set and reset via (operating-<br>
                                  system specific) library calls and/or
                                  platform-specific intrinsics,<br>
                                  all of which may also be done within
                                  subroutine calls.</font><br>
                                <br>
                                <font size="2">In effect, this requires
                                  the compiler to treat any
                                  floating-point<br>
                                  operation within a FENV_ACCESS region
                                  as potentially trapping,<br>
                                  which means the same restrictions
                                  apply as with e.g. memory accesses<br>
                                  (cannot be speculated, etc.). However,
                                  according to the C standard,<br>
                                  the implementation is not required to
                                  preserve the *number* of<br>
                                  different traps, so identical
                                  operations may still be combined<br>
                                  (unless there is an intervening
                                  function call).</font><br>
                                <br>
                                <font size="2">The C standard requires
                                  all user code to explicitly switch
                                  back<br>
                                  to non-trapping mode for all
                                  exceptions whenever leaving a<br>
                                  FENV_ACCESS region (both by "falling
                                  off the end" of the region<br>
                                  and by calling a subroutine defined
                                  outside of FENV_ACCESS).</font><br>
                                <br>
                                <br>
                                <font size="2">Implementation
                                  requirements on parts of the compiler<br>
                                  ==============================<wbr>======================</font><br>
                                <br>
                                <font size="2">A. clang front end</font><br>
                                <br>
                                <font size="2">The front end needs to
                                  determine which instructions are part
                                  of<br>
                                  FENV_ACCESS regions and which are not.
                                  This takes into account<br>
                                  both the semantics of the #pragma as
                                  defined by the standard,<br>
                                  and the implementation-defined default
                                  rules that apply to code<br>
                                  outside of any #pragma. GCC currently
                                  has the following two<br>
                                  related command-line options:</font><br>
                                <br>
                                <font size="2">-frounding-math: Do not
                                  assume default rounding mode<br>
                                  -ftrapping-math: Assume FP operations
                                  may trap</font><br>
                                <br>
                                <font size="2">clang accepts but
                                  (basically) ignores those options. As
                                  a first<br>
                                  step, it might make sense to have the
                                  FENV_ACCESS default</font><br>
                                <font size="2">behavior triggered by
                                  these options, even while the front
                                  end<br>
                                  does not yet support the actual
                                  #pragma.</font><br>
                                <br>
                                <font size="2">The front end then needs
                                  to transmit the information about<br>
                                  FENV_ACCESS regions to later passes.
                                  However, I believe that<br>
                                  we do not actually have to implement
                                  "regions" as such at the<br>
                                  IR level. Instead, it would be
                                  sufficient to track the following
                                  information:</font><br>
                                <br>
                                <font size="2">- For each FP operation,
                                  whether it is within a FENV_ACCESS
                                  region.<br>
                                  - For each call site, whether it is
                                  within a FENV_ACCESS region.</font><br>
                                <br>
                                <font size="2">The former requires new
                                  IR support; the approach currently
                                  under<br>
                                  investigation uses the experimental
                                  "constrained FP" intrinsics<br>
                                  instead of traditional floating-point
                                  operations for this. The<br>
                                  latter can be done simply by
                                  annotating those call sites with an<br>
                                  attribute.</font><br>
                                <br>
                                <font size="2">In addition to that, the
                                  front-end itself needs to disable any<br>
                                  early optimizations that do not
                                  preserve strict FP semantics,<br>
                                  in particular it must not speculate FP
                                  operations if they may<br>
                                  trap. (Currently, the front end
                                  transforms "? :" on floating-<br>
                                  point types into a select IR
                                  statement; for trapping FP<br>
                                  operations, an explicit branch must be
                                  used instead.)</font><br>
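                                <br>
                                <font size="2">The difference between the
                                  two lowerings of "? :" can be sketched
                                  like this. It is a model only: Python's
                                  decimal traps stand in for hardware FP
                                  traps, and select() is a hypothetical
                                  helper that mimics an IR select by
                                  receiving both operands already
                                  evaluated:</font><br>
                                <code style="white-space:pre">
```python
from decimal import Decimal, DivisionByZero, localcontext


def select(cond, if_true, if_false):
    """Both operands are already evaluated, like an IR select."""
    return if_true if cond else if_false


def guarded_reciprocal_branch(x):
    """Branch form: the division only runs when x is nonzero."""
    return Decimal(1) / x if x != 0 else Decimal(0)


def guarded_reciprocal_select(x):
    """Select form: the division is speculated before the condition is used."""
    return select(x != 0, Decimal(1) / x, Decimal(0))


def run_with_trap(fn, x):
    """Run fn(x) with division-by-zero trapping enabled."""
    with localcontext() as ctx:
        ctx.traps[DivisionByZero] = True
        try:
            return fn(x)
        except DivisionByZero:
            return "trapped"
```
                                </code><br>
                                <font size="2">For x equal to zero the
                                  branch form returns 0 while the select
                                  form traps, even though both compute the
                                  same value whenever no trap is
                                  enabled.</font><br>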
                                <br>
                                <br>
                                <font size="2">B. LLVM IR and LLVM
                                  common optimizations</font><br>
                                <br>
                                <font size="2">As mentioned in the
                                  previous section, we need some IR to
                                  annotate<br>
                                  FP instructions and call sites within
                                  FENV_ACCESS regions. All<br>
                                  common optimizations then need to
                                  respect the strict FP semantics<br>
                                  associated with those regions.</font><br>
                                <br>
                                <font size="2">The current approach uses
                                  experimental intrinsics. This has the<br>
                                  advantage that most optimizations
                                  never trigger since they don't<br>
                                  even recognize those new intrinsics.
                                  Also, the intrinsics can<br>
                                  be marked as having side-effects
                                  and/or being non-speculatable.</font><br>
                                <br>
                                <font size="2">The overall effect is
                                  that more optimizations are suppressed<br>
                                  than would be strictly necessary. But
                                  this may still be a good<br>
                                  first step, since the result is now
                                  safe but maybe not optimal<br>
                                  -- which can be improved upon over
                                  time by teaching the specific<br>
                                  semantics of those intrinsics to
                                  optimization passes.</font><br>
                                <br>
                                <font size="2">However, some open
                                  questions remain. If at some point we
                                  want<br>
                                  to model the constrained FP semantics
                                  more precisely than just<br>
                                  as "unmodeled side effects", this may
                                  have to be reflected at<br>
                                  the IR level directly. For example, to
                                  model rounding mode<br>
                                  behavior, at some point we might
                                  require explicit tracking of<br>
                                  data dependencies on the rounding mode
                                  by representing the<br>
                                  rounding mode as SSA values defined by
                                  function calls and used<br>
                                  by FP intrinsics. Similarly, to track
                                  exception status flags,<br>
                                  they might be modeled as SSA values
                                  set by FP intrinsics and<br>
                                  used by function calls.</font><br>
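                                <br>
                                <font size="2">One way to picture the
                                  rounding-mode variant of this idea is
                                  sketched below. This is a toy model in
                                  Python, not a proposed IR design: the
                                  function names are hypothetical, and the
                                  decimal module stands in for FP
                                  arithmetic. The rounding mode is an
                                  ordinary value produced by a "call" and
                                  consumed by each operation:</font><br>
                                <code style="white-space:pre">
```python
from decimal import Context, Decimal, ROUND_CEILING, ROUND_FLOOR


def constrained_div(x, y, rounding_mode):
    """Division whose rounding mode is an explicit operand, not hidden state."""
    return Context(prec=5, rounding=rounding_mode).divide(x, y)


def rounding_after_call(previous_mode, call_changes_mode):
    """Stand-in for a call site: it defines the mode seen by later operations."""
    return ROUND_CEILING if call_changes_mode else previous_mode
```
                                </code><br>
                                <font size="2">With the dependency made
                                  explicit, two divisions may be combined
                                  exactly when they use the same
                                  rounding-mode value, and a call that
                                  yields a new value separates them without
                                  any special-case rule.</font><br>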
                                <br>
                                <font size="2">(There is a possibly
                                  related question of how to optimally
                                  model<br>
                                  the property of many math library
                                  routines that they may access<br>
                                  the "errno" variable but no other
                                  memory ... It might also be<br>
                                  possible to model e.g. exception
                                  status as a thread-local "memory"<br>
                                  location that is modified by FP
                                  operations, just like errno.)</font><br>
                                <br>
                                <font size="2">Another currently
                                  unresolved issue is that at the moment
                                  nothing<br>
                                  prevents *standard* floating-point
                                  operations from being moved<br>
                                  *inside* FENV_ACCESS regions. This may
                                  also be invalid, since<br>
                                  those operations now may cause
                                  unexpected traps etc. (More<br>
                                  specifically, what is invalid is
                                  moving any standard FP operation<br>
                                  across a *call site* within a
                                  FENV_ACCESS region.) Note that<br>
                                  this is even an issue if we only
                                  support changing the default<br>
                                  (and no actual #pragma) if multiple
                                  object files using different<br>
                                  default settings are being linked
                                  together using LTO.</font><br>
                                <br>
                                <font size="2">This last issue could in
                                  theory be solved by having all
                                  optimization<br>
                                  passes respect the requirement that
                                  floating-point operations may<br>
                                  not be moved across call sites marked
                                  with the strict FP attribute.<br>
                                  But that does not appear to be
                                  straightforward since it would<br>
                                  introduce a "new" type of dependency
                                  that would have to be added<br>
                                  throughout LLVM code. If this must be
                                  avoided, we'd have to<br>
                                  find a way to explicitly track
                                  dependencies at the IR level. In<br>
                                  the extreme, this could end up
                                  equivalent to just always using<br>
                                  the constrained intrinsics for
                                  everything ...</font><br>
                                <br>
                                <br>
                                <font size="2">C. Code generation</font><br>
                                <br>
                                <font size="2">In the back end, effects
                                  of strict FP mode have to be passed
                                  through<br>
                                  to lower-level representations
                                  including SelectionDAG and MI.</font><br>
                                <br>
                                <font size="2">Currently, the "unmodeled
                                  side effect" logic of the constrained<br>
                                  intrinsics is modeled by putting them
                                  on the chain during SelectionDAG.<br>
                                  (If we ever model semantics more
                                  precisely at the IR level, that<br>
                                  would need to be reflected on
                                  SelectionDAG accordingly.)</font><br>
                                <br>
                                <font size="2">At the MI level, there is
                                  no representation at all. One option
                                  to<br>
                                  fix this would be to model
                                  target-specific registers that
                                  implement<br>
                                  the IEEE semantics. Most platforms
                                  have registers (or parts of<br>
                                  registers) that hold:<br>
                                  - the current rounding mode<br>
                                  - the exception status flags<br>
                                  - the exception masks (which enable
                                  traps)<br>
                                  Marking FP instructions as using
                                  and/or defining these registers<br>
                                  would enforce ordering requirements.
                                  It may be too strict in some<br>
                                  cases (e.g. two instructions setting
                                  exception status flags may<br>
                                  still be reordered). On the other
                                  hand, I believe if instructions<br>
                                  may actually *trap*, we still need
                                  the hasSideEffects flag even<br>
                                  if register dependencies are modeled.</font><br>
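                                <br>
                                <font size="2">The register-based
                                  ordering idea can be sketched with a toy model. Everything below
                                  (FPReg, ToyInst, mayReorder and the three sample instructions) is
                                  hypothetical and is not LLVM's MI API; it only applies the usual
                                  read/write conflict rule to the three kinds of state listed
                                  above.</font><br>
                                <br>
```cpp
// Toy model only: these names are illustrative, not LLVM classes.
enum FPReg : unsigned {
    RoundingMode = 1u << 0,
    StatusFlags  = 1u << 1,
    TrapMasks    = 1u << 2,
};

struct ToyInst {
    unsigned uses; // bitmask of FP state the instruction reads implicitly
    unsigned defs; // bitmask of FP state the instruction writes implicitly
};

// Usual register-dependency rule: two instructions may swap only if
// there is no read-after-write, write-after-read or write-after-write
// conflict on any modeled register.
constexpr bool mayReorder(ToyInst a, ToyInst b) {
    return (a.defs & b.uses) == 0 &&
           (a.uses & b.defs) == 0 &&
           (a.defs & b.defs) == 0;
}

// An FP arithmetic instruction reads the rounding mode and trap masks
// and writes (accumulates into) the status flags.
constexpr ToyInst fpArith{RoundingMode | TrapMasks, StatusFlags};
constexpr ToyInst setRounding{0u, RoundingMode};
constexpr ToyInst readFlags{StatusFlags, 0u};
```
                                <br>
                                <font size="2">In this model
                                  mayReorder(fpArith, setRounding) and mayReorder(fpArith, readFlags)
                                  are both false, as desired. mayReorder(fpArith, fpArith) is also
                                  false, which is the "too strict" case: two instructions that merely
                                  accumulate status flags could safely be swapped. The model says
                                  nothing about traps, which is why hasSideEffects may still be
                                  needed.</font><br>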
                                <br>
                                <font size="2">If we do need
                                  hasSideEffects, there is a separate
                                  discussion on<br>
                                  whether this can be implemented
                                  without each back end having to<br>
                                  duplicate all FP instruction patterns
                                  (one with hasSideEffects<br>
                                  and one without), e.g. by having a new
                                  feature that allows<br>
                                  describing the side-effect status using
                                  an MI operand.</font><br>
                                <br>
                                <br>
                                <font size="2">Next steps<br>
                                  ==========</font><br>
                                <br>
                                <font size="2">I believe it is important
                                  to break up the full amount of work<br>
                                  into incremental steps that provide
                                  some useful benefits on their<br>
                                  own. At first, we should be able to
                                  get to a state where clang<br>
                                  can be used to build programs that use
                                  some (maybe not all) strict<br>
                                  FP features, where the generated code
                                  is always correct but may<br>
                                  not always be optimal. To get there, I
                                  think we need at a <br>
                                  minimum:</font><br>
                                <br>
                                <font size="2">- Implement clang support
                                  for the default flags, e.g. GCC's<br>
                                  -frounding-math and -ftrapping-math,
                                  and always generate<br>
                                  the constrained intrinsics. clang
                                  should also mark all<br>
                                  call sites then (as mentioned above).</font><br>
                                <br>
                                <font size="2">- For now, add the
                                  requirement that LTO is not supported
                                  if<br>
                                  this would cause mixing of strict and
                                  non-strict FP code.<br>
                                  In the alternative, have the LTO pass
                                  automatically transform<br>
                                  any floating-point operation into a
                                  constrained intrinsic<br>
                                  if *any* (other) module already uses
                                  the latter.</font><br>
                                <br>
                                <font size="2">- At the IR level,
                                  complete the set of supported
                                  constrained<br>
                                  FP intrinsics (there are still some
                                  missing, see e.g.<br>
                                </font><font size="2"><a
                                    href="https://reviews.llvm.org/D43515"
                                    target="_blank"
                                    moz-do-not-send="true">https://reviews.llvm.org/D4351<wbr>5</a></font><font
                                  size="2">).<br>
                                  Also, it seems not all variants (e.g.
                                  for vector types) are<br>
                                  supported correctly through codegen
                                  (see e.g.<br>
                                </font><font size="2"><a
                                    href="https://reviews.llvm.org/D46967"
                                    target="_blank"
                                    moz-do-not-send="true">https://reviews.llvm.org/D4696<wbr>7</a></font><font
                                  size="2">).</font><br>
                                <br>
                                <font size="2">- Allow targets to
                                  correctly reflect constrained
                                  intrinsics<br>
                                  semantics at the MI level and final
                                  machine code generation<br>
                                  (see e.g. </font><font size="2"><a
                                    href="https://reviews.llvm.org/D45576"
                                    target="_blank"
                                    moz-do-not-send="true">https://reviews.llvm.org/D4557<wbr>6</a></font><font
                                  size="2">).</font><br>
                                <br>
                                <font size="2">- Review all optimization
                                  and codegen passes to verify they<br>
                                  fully respect strict FP semantics.</font><br>
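                                <br>
                                <font size="2">One way to spot-check
                                  the "always correct" goal of this first stage is a small
                                  rounding-mode test. The sketch below assumes a target that supports
                                  FE_UPWARD and FE_DOWNWARD; the function name is hypothetical, and
                                  the volatile qualifiers again stand in for guarantees that strict FP
                                  mode is meant to provide without them.</font><br>
                                <br>
```cpp
#include <cfenv>

// Some compilers ignore this pragma (possibly with a warning).
#pragma STDC FENV_ACCESS ON

// Hypothetical helper: divides under the requested rounding mode and
// restores the previous mode afterwards. Constant folding the division
// or moving it across either fesetround call would give a result
// rounded in the wrong mode.
double divide_with_rounding(double num, double den, int mode) {
    volatile double n = num, d = den; // volatile: force a run-time divide
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double q = n / d;        // must execute under `mode`
    std::fesetround(saved);
    return q;
}
```
                                <br>
                                <font size="2">Since 1/3 is not
                                  exactly representable, divide_with_rounding(1.0, 3.0, FE_UPWARD)
                                  should compare strictly greater than the FE_DOWNWARD result, while
                                  an exact quotient such as 1.0 / 2.0 is the same in every mode.</font><br>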
                                <br>
                                <font size="2">Once this is done, we can
                                  improve on the solution by:</font><br>
                                <br>
                                <font size="2">- Supporting mixing
                                  strict and non-strict FP operations<br>
                                  (would lift the LTO restriction).
                                  (Note: there still seems<br>
                                  to be some "invention required"
                                  here, see above.)</font><br>
                                <br>
                                <font size="2">- Actually implementing
                                  the #pragma supporting different<br>
                                  regions within a compilation unit
                                  (prereq: support for<br>
                                  mixing strict and non-strict FP
                                  operations).</font><br>
                                <br>
                                <font size="2">- Add more optimization
                                  of constrained FP intrinsics in<br>
                                  common optimizers and/or target back
                                  ends.</font><br>
                                <br>
                                <font size="2">Does this look
                                  reasonable? Please let me know if
                                  there's<br>
                                  anything I overlooked, or you have any
                                  additional comments<br>
                                  or questions.</font><br>
                                <br>
                                <br>
                                <font size="2"><br>
                                  Mit freundlichen Gruessen / Best
                                  Regards<span
                                    class="gmail-m_-1433965244057454815HOEnZb"><font
                                      color="#888888"><br>
                                      <br>
                                      Ulrich Weigand<br>
                                      <br>
                                      -- <br>
                                      Dr. Ulrich Weigand | Phone:
                                      +49-7031/16-3727<br>
                                      STSM, GNU/Linux compilers and
                                      toolchain<br>
                                      IBM Deutschland Research &
                                      Development GmbH<br>
                                      Vorsitzende des Aufsichtsrats:
                                      Martina Koederitz |
                                      Geschäftsführung: Dirk Wittkopp<br>
                                      Sitz der Gesellschaft: Böblingen |
                                      Registergericht: Amtsgericht
                                      Stuttgart, HRB 243294</font></span></font><br>
                              </p>
                            </div>
                            <br>
                            ______________________________<wbr>_________________<br>
                            LLVM Developers mailing list<br>
                            <a href="mailto:llvm-dev@lists.llvm.org"
                              target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
                            <a
                              href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
                              rel="noreferrer" target="_blank"
                              moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
                            <br>
                          </blockquote>
                        </div>
                        <br>
                      </div>
                      <br>
                    </blockquote>
                    <br>
                  </div>
                </div>
                <span class="gmail-HOEnZb"><font color="#888888">
                    <pre class="gmail-m_-1433965244057454815moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
                  </font></span></div>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </body>
</html>