<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">


    <div class="moz-cite-prefix">On 05/23/2018 11:06 AM, Hubert Tong via

      llvm-dev wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CACvkUqY+w4Yf5Zyt81DGww4FYD7qLL9WUd_Nx-=r5Lo_z0K30A@mail.gmail.com">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <div dir="ltr">

        <div>Hi Ulrich,<br>

          <br>

        </div>

        <div>I am interested in knowing if the current proposals also

          take into account the FP_CONTRACT pragma</div>

      </div>

    </blockquote>

    <br>

    We should already do this: during IR generation, we turn the relevant
    operations into the @llvm.fmuladd.* intrinsics when FP_CONTRACT is set to ON.<br>

    <br>

    <blockquote type="cite"

cite="mid:CACvkUqY+w4Yf5Zyt81DGww4FYD7qLL9WUd_Nx-=r5Lo_z0K30A@mail.gmail.com">

      <div dir="ltr">

        <div> and the ability to implement options that imply a specific

          value for the FLT_EVAL_METHOD macro.<br>

        </div>

      </div>

    </blockquote>

    <br>

    What do you mean by this?<br>

    <br>

     -Hal<br>

    <br>

    <blockquote type="cite"

cite="mid:CACvkUqY+w4Yf5Zyt81DGww4FYD7qLL9WUd_Nx-=r5Lo_z0K30A@mail.gmail.com">

      <div dir="ltr">

        <div><br>

        </div>

        <div>Additionally, I am not aware of the IR being able to

          represent the potentially deferred loss of precision that the

          C language semantics provide; in particular, applying such

          semantics to the existing IR would hit an issue that the

          limits of such deferment would need an agreed representation.<br>

          <br>

        </div>

        <div>As for the mixing of strict and non-strict modes, I would

          be interested in where LLVM is in its handling of non-SSA

          (pseudo-memory?) dependencies. I have a vague impression that

          it is very coarse-grained in that respect, but I admit to not

          being particularly informed in that space. If there is a good

          model for such dependencies, then I think it could be used to

          handle the strict/non-strict mixing.<br>

        </div>

        <div><br>

        </div>

        -- Hubert Tong, IBM<br>

        <br>

        <div>PS A nitpick on wording: The idea of being inside or

          outside of FENV_ACCESS regions is instead expressed in

          terms of the state of the FENV_ACCESS pragma within the C

          Standard.<br>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Wed, May 23, 2018 at 10:48 AM,

          Ulrich Weigand via llvm-dev <span dir="ltr">&lt;<a
              href="mailto:llvm-dev@lists.llvm.org" target="_blank"
              moz-do-not-send="true">llvm-dev@lists.llvm.org</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div>

              <p><font size="2">Hello,</font><br>

                <br>

                <font size="2">at the recent EuroLLVM developer meeting

                  in Bristol I held a BoF<br>

                  session on the topic "Towards implementing #pragma

                  STDC FENV_ACCESS".<br>

                  I've also had a number of follow-on discussions both

                  on-site in<br>

                  Bristol and online since. This post is intended as a

                  summary of<br>

                  my current understanding of the set of requirements and

                  implementation<br>

                  details covering the overall topic.</font><br>

                <br>

                <font size="2">I'm posting this here in the hope this

                  can serve as a basis for<br>

                  the various more detailed discussions that are still

                  ongoing<br>

                  (e.g. in various Phabricator proposals right now). Any

                  comments<br>

                  are welcome!</font><br>

                <br>

                <br>

                <font size="2">Semantics of #pragma STDC FENV_ACCESS<br>

                  ==============================<wbr>=======</font><br>

                <br>

                <font size="2">To provide a baseline for the

                  implementation discussion, first an<br>

                  overview of the features required to handle the strict

                  floating-point<br>

                  mode defined by the C and IEEE standards:</font><br>

                <br>

                <font size="2">1. Floating-point rounding modes<br>

                  2. Default floating-point exception handling<br>

                  3. Trapping floating-point exception handling</font><br>

                <br>

                <font size="2">Each of these separate features imposes

                  different constraints on the<br>

                  optimizations that LLVM may perform involving FP

                  expressions:</font><br>

                <br>

                <font size="2">1. Floating-point rounding modes</font><br>

                <br>

                <font size="2">Outside of FENV_ACCESS regions, all FP

                  operations are supposed to be<br>

                  performed in the "default" rounding mode.</font><br>

                <br>

                <font size="2">But inside FENV_ACCESS regions, FP

                  operations implicitly depend on<br>

                  a "current" rounding mode setting, which may be

                  changed by certain<br>

                  C library calls (plus some platform-specific

                  intrinsics). In addition,<br>

                  those calls may be performed within subroutines (as

                  long as those are<br>

                  also within FENV_ACCESS), so *any* function call

                  within a FENV_ACCESS region<br>

                  must be considered as potentially changing the

                  rounding mode.</font><br>

                <br>

                <font size="2">In effect, this means the compiler may

                  not move or combine FP<br>

                  operations across function call sites.</font><br>

                <br>
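                <font size="2">To make the rounding-mode dependency concrete, here is a minimal<br>
                  sketch. It uses Python's decimal module purely as an analogy (it is<br>
                  not how C or LLVM implement this): the decimal context carries a<br>
                  "current" rounding mode, and the helper set_rounding() is a<br>
                  hypothetical stand-in for a subroutine that calls something like<br>
                  fesetround().</font><br>
                <br>
```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, getcontext, localcontext


def set_rounding(mode):
    # Hypothetical stand-in for a subroutine that changes the rounding
    # mode (the role fesetround() plays in C).
    getcontext().rounding = mode


def divide_around_call():
    with localcontext() as ctx:
        ctx.prec = 4                      # 4 significant digits
        set_rounding(ROUND_FLOOR)
        before = Decimal(1) / Decimal(3)  # rounds toward negative infinity
        set_rounding(ROUND_CEILING)       # the "opaque" call
        after = Decimal(1) / Decimal(3)   # same operands, different mode
        return before, after


before, after = divide_around_call()
print(before, after)  # 0.3333 0.3334
```
                <br>
                <font size="2">The two divisions have identical operands but different results,<br>
                  so merging them, or moving either one to the other side of the<br>
                  call, would change what the program computes.</font><br>
                <br>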

                <font size="2">2. Default floating-point exception

                  handling</font><br>

                <br>

                <font size="2">Inside FENV_ACCESS regions, every

                  floating-point operation that<br>

                  causes an exception must be considered to set a

                  "status flag"<br>

                  associated with this exception type. Those flags can

                  be queried<br>

                  using C library calls (plus some platform-specific

                  intrinsics),<br>

                  and there are other such calls to explicitly set or

                  clear those<br>

                  flags as well. As with the rounding modes, those calls

                  may be<br>

                  performed in subroutines as well, so any function call

                  within a<br>

                  FENV_ACCESS region must be considered as potentially

                  *using* and<br>

                  changing the floating-point exception status flags.</font><br>

                <br>

                <font size="2">The values of the status flags on entry

                  to a FENV_ACCESS region are to<br>

                  be considered undefined according to the C standard.</font><br>

                <br>

                <font size="2">Compiler optimizations are supposed to

                  preserve the values of<br>

                  all exception status bits at any point where they can

                  be<br>

                  (potentially) inspected by the program, i.e. at all

                  call sites<br>

                  within FENV_ACCESS regions. This still allows a number

                  of<br>

                  optimizations, e.g. to reorder FP operations or

                  combine two<br>

                  identical operations within a region uninterrupted by

                  calls.<br>

                  But other optimizations should be avoided, e.g.

                  optimizing<br>

                  away an unused FP operation may result in an exception

                  flag<br>

                  now being unset that would otherwise have been set.

                  The same<br>

                  applies to floating-point constant folding.</font><br>

                <br>
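                <font size="2">The dead-code case can be sketched as follows, again using<br>
                  Python's decimal module only as an analogy for the IEEE status<br>
                  flags. The function name flags_after() and its parameter are<br>
                  made up for illustration.</font><br>
                <br>
```python
from decimal import Decimal, Inexact, localcontext


def flags_after(keep_unused_op):
    """Report whether the Inexact status flag is set at the "call site"."""
    with localcontext() as ctx:
        ctx.prec = 4
        ctx.clear_flags()
        if keep_unused_op:
            _unused = Decimal(1) / Decimal(3)  # result is never read
        # This is the point where the program inspects the flags.
        return bool(ctx.flags[Inexact])


with_op = flags_after(True)
without_op = flags_after(False)
print(with_op, without_op)  # True False
```
                <br>
                <font size="2">Deleting the unused division changes the observable flag state,<br>
                  which is why dead-code elimination of such operations is not<br>
                  safe under these semantics.</font><br>
                <br>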

                <font size="2">3. Trapping floating-point exception

                  handling</font><br>

                <br>

                <font size="2">Within a FENV_ACCESS region, library

                  calls may be used to switch<br>

                  exception handling semantics to a "trapping" mode by

                  setting<br>

                  corresponding mask bits. Any subsequent FP instruction

                  that<br>

                  raises an exception with the associated mask bit set

                  will cause<br>

                  a trap. Usually, this will be a hardware trap that is

                  translated<br>

                  by the operating system into some form of software

                  exception that<br>

                  can be handled by the application; on Linux systems

                  this takes the<br>

                  form of a SIGFPE signal.</font><br>

                <br>

                <font size="2">As above, those mask bits can be set and

                  reset via (operating-<br>

                  system specific) library calls and/or

                  platform-specific intrinsics,<br>

                  all of which may also be done within subroutine calls.</font><br>

                <br>

                <font size="2">In effect, this requires the compiler to

                  treat any floating-point<br>

                  operation within a FENV_ACCESS region as potentially

                  trapping,<br>

                  which means the same restrictions apply as with e.g.

                  memory accesses<br>

                  (cannot be speculated, etc.). However, according to the

                  C standard,<br>

                  the implementation is not required to preserve the

                  *number* of<br>

                  different traps, so identical operations may still be

                  combined<br>

                  (unless there is an intervening function call).</font><br>

                <br>
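                <font size="2">As a rough illustration of the masked versus trapping behavior,<br>
                  the sketch below uses the trap-enable table of Python's decimal<br>
                  context as an analogy for the hardware mask bits. A Python<br>
                  exception plays the role of SIGFPE here; divide() is a<br>
                  hypothetical helper.</font><br>
                <br>
```python
from decimal import Decimal, DivisionByZero, localcontext


def divide(trap_enabled):
    with localcontext() as ctx:
        # Analogous to setting or clearing the mask bit for one exception.
        ctx.traps[DivisionByZero] = trap_enabled
        try:
            return str(Decimal(1) / Decimal(0))
        except DivisionByZero:
            return "trap"


result_masked = divide(False)
result_trapping = divide(True)
print(result_masked, result_trapping)  # Infinity trap
```
                <br>
                <font size="2">The same operation either produces a default result or diverts<br>
                  control flow, depending on state set elsewhere, so the compiler<br>
                  cannot execute it on paths where the source program would not.</font><br>
                <br>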

                <font size="2">The C standard requires all user code to

                  explicitly switch back<br>

                  to non-trapping mode for all exceptions whenever

                  leaving a<br>

                  FENV_ACCESS region (both by "falling off the end" of

                  the region<br>

                  and by calling a subroutine defined outside of

                  FENV_ACCESS).</font><br>

                <br>

                <br>

                <font size="2">Implementation requirements on parts of

                  the compiler<br>

                  ==============================<wbr>======================</font><br>

                <br>

                <font size="2">A. clang front end</font><br>

                <br>

                <font size="2">The front end needs to determine which

                  instructions are part of<br>

                  FENV_ACCESS regions and which are not. This takes into

                  account<br>

                  both the semantics of the #pragma as defined by the

                  standard,<br>

                  and the implementation-defined default rules that

                  apply to code<br>

                  outside of any #pragma. GCC currently has the

                  following two<br>

                  related command-line options:</font><br>

                <br>

                <font size="2">-frounding-math: Do not assume default

                  rounding mode<br>

                  -ftrapping-math: Assume FP operations may trap</font><br>

                <br>

                <font size="2">clang accepts but (basically) ignores

                  those options. As a first<br>

                  step, it might make sense to have the FENV_ACCESS

                  default</font><br>

                <font size="2">behavior triggered by these options, even

                  while the front end<br>

                  does not yet support the actual #pragma.</font><br>

                <br>

                <font size="2">The front end then needs to transmit the

                  information about<br>

                  FENV_ACCESS regions to later passes. However, I

                  believe that<br>

                  we do not actually have to implement "regions" as such

                  at the<br>

                  IR level. Instead, it would be sufficient to track the

                  following<br>

                  information:</font><br>

                <br>

                <font size="2">- For each FP operation, whether it is

                  within a FENV_ACCESS region.<br>

                  - For each call site, whether it is within a

                  FENV_ACCESS region.</font><br>

                <br>

                <font size="2">The former requires new IR support; the

                  approach currently under<br>

                  investigation uses the experimental "constrained FP"

                  intrinsics<br>

                  instead of traditional floating-point operations for

                  this. The<br>

                  latter can be done simply by annotating those call

                  sites with an<br>

                  attribute.</font><br>

                <br>

                <font size="2">In addition to that, the front-end itself

                  needs to disable any<br>

                  early optimizations that do not preserve strict FP

                  semantics;<br>

                  in particular it must not speculate FP operations if

                  they may<br>

                  trap. (Currently, the front end transforms "? :" on

                  floating-<br>

                  point types into a select IR statement; for trapping

                  FP<br>

                  operations, an explicit branch must be used instead.)</font><br>

                <br>
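                <font size="2">The "? :" issue can be sketched like this. The function select()<br>
                  below is a hypothetical model of an IR select, where both operands<br>
                  are evaluated before one is chosen; Python's decimal traps again<br>
                  stand in for hardware traps.</font><br>
                <br>
```python
from decimal import Decimal, DivisionByZero, localcontext


def select(cond, if_true, if_false):
    # Models an IR select: both operands were computed by the caller.
    return if_true if cond else if_false


def ratio_with_select(x, y):
    return select(y != 0, x / y, Decimal(0))  # x / y is speculated


def ratio_with_branch(x, y):
    if y != 0:
        return x / y                          # only runs when guarded
    return Decimal(0)


def run(fn):
    with localcontext() as ctx:
        ctx.traps[DivisionByZero] = True      # trapping mode
        try:
            return str(fn(Decimal(1), Decimal(0)))
        except DivisionByZero:
            return "trap"


select_result = run(ratio_with_select)
branch_result = run(ratio_with_branch)
print(select_result, branch_result)  # trap 0
```
                <br>
                <font size="2">Both functions express the same source-level conditional, but<br>
                  only the branch form avoids executing the division when the<br>
                  guard is false.</font><br>
                <br>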

                <br>

                <font size="2">B. LLVM IR and LLVM common optimizations</font><br>

                <br>

                <font size="2">As mentioned in the previous section, we

                  need some IR to annotate<br>

                  FP instructions and call sites within FENV_ACCESS

                  regions. All<br>

                  common optimizations then need to respect the strict

                  FP semantics<br>

                  associated with those regions.</font><br>

                <br>

                <font size="2">The current approach uses experimental

                  intrinsics. This has the<br>

                  advantage that most optimizations never trigger since

                  they don't<br>

                  even recognize those new intrinsics. Also, the

                  intrinsics can<br>

                  be marked as having side-effects and/or being

                  non-speculatable.</font><br>

                <br>

                <font size="2">The overall effect is that more

                  optimizations are suppressed<br>

                  than would be strictly necessary. But this may still

                  be a good<br>

                  first step, since the result is now safe but maybe not

                  optimal<br>

                  -- which can be improved upon over time by teaching

                  the specific<br>

                  semantics of those intrinsics to optimization passes.</font><br>

                <br>

                <font size="2">However, some open questions remain. If

                  at some point we want<br>

                  to model the constrained FP semantics more precisely

                  than just<br>

                  as "unmodeled side effects", this may have to be

                  reflected at<br>

                  the IR level directly. For example, to model rounding

                  mode<br>

                  behavior, at some point we might require explicit

                  tracking of<br>

                  data dependencies on the rounding mode by representing

                  the<br>

                  rounding mode as SSA values defined by function calls

                  and used<br>

                  by FP intrinsics. Similarly, to track exception status

                  flags,<br>

                  they might be modeled as SSA values set by FP

                  intrinsics and<br>

                  used by function calls.</font><br>

                <br>

                <font size="2">(There is a possibly related question of

                  how to optimally model<br>

                  the property of many math library routines that they

                  may access<br>

                  the "errno" variable but no other memory ... It might

                  also be<br>

                  possible to model e.g. exception status as a

                  thread-local "memory"<br>

                  location that is modified by FP operations, just like

                  errno.)</font><br>

                <br>

                <font size="2">Another currently unresolved issue is

                  that at the moment nothing<br>

                  prevents *standard* floating-point operations from

                  being moved<br>

                  *inside* FENV_ACCESS regions. This may also be

                  invalid, since<br>

                  those operations now may cause unexpected traps etc.

                  (More<br>

                  specifically, what is invalid is moving any standard

                  FP operation<br>

                  across a *call site* within a FENV_ACCESS region.)

                  Note that<br>

                  this is even an issue if we only support changing the

                  default<br>

                  (and no actual #pragma) if multiple object files using

                  different<br>

                  default settings are being linked together using LTO.</font><br>

                <br>

                <font size="2">This last issue could in theory be solved

                  by having all optimization<br>

                  passes respect the requirement that floating-point

                  operations may<br>

                  not be moved across call sites marked with the strict

                  FP attribute.<br>

                  But that does not appear to be straightforward since

                  it would<br>

                  introduce a "new" type of dependency that would have to

                  be added<br>

                  throughout LLVM code. If this must be avoided, we'd

                  have to<br>

                  find a way to explicitly track dependencies at the IR

                  level. In<br>

                  the extreme, this could end up equivalent to just

                  always using<br>

                  the constrained intrinsics for everything ...</font><br>

                <br>
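                <font size="2">The extra dependency rule discussed above could be stated as a<br>
                  toy legality check. Everything here is hypothetical (the Instr<br>
                  class, its fields, and may_swap() are not LLVM APIs), and the<br>
                  check deliberately ignores ordinary data and memory dependencies.</font><br>
                <br>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    kind: str             # "fp", "call", or "other"
    strict: bool = False  # hypothetical: marked as inside FENV_ACCESS


def may_swap(first, second):
    """Toy rule: no FP operation may cross a strict call site."""
    kinds = {first.kind, second.kind}
    if kinds == {"fp", "call"}:
        call = first if first.kind == "call" else second
        return not call.strict
    return True  # other dependencies are out of scope for this sketch


plain_fadd = Instr("fadd", "fp")  # a *standard* FP operation
strict_call = Instr("helper", "call", strict=True)
plain_call = Instr("helper", "call")

print(may_swap(plain_fadd, strict_call))  # False
print(may_swap(plain_fadd, plain_call))   # True
```
                <br>
                <font size="2">Note that the FP operation itself is not marked strict; the<br>
                  restriction comes entirely from the call site, which is what<br>
                  makes this a new kind of dependency for existing passes.</font><br>
                <br>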

                <br>

                <font size="2">C. Code generation</font><br>

                <br>

                <font size="2">In the back end, effects of strict FP

                  mode have to be passed through<br>

                  to lower-level representations including SelectionDAG

                  and MI.</font><br>

                <br>

                <font size="2">Currently, the "unmodeled side effect"

                  logic of the constrained<br>

                  intrinsics is modeled by putting them on the chain

                  during SelectionDAG.<br>

                  (If we ever model semantics more precisely at the IR

                  level, that<br>

                  would need to be reflected on SelectionDAG

                  accordingly.)</font><br>

                <br>

                <font size="2">At the MI level, there is no

                  representation at all. One option to<br>

                  fix this would be to model target-specific registers

                  that implement<br>

                  the IEEE semantics. Most platforms have registers (or

                  parts of<br>

                  registers) that hold:<br>

                  - the current rounding mode<br>

                  - the exception status flags<br>

                  - the exception masks (which enable traps)<br>

                  Marking FP instructions as using and/or defining these

                  registers<br>

                  would enforce ordering requirements. It may be too

                  strict in some<br>

                  cases (e.g. two instructions setting exception status

                  flags may<br>

                  still be reordered). On the other hand, I believe if

                  instructions<br>

                  may actually *trap*, we actually need the

                  hasSideEffects flag even<br>

                  if register dependencies are modeled.</font><br>

                <br>
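                <font size="2">A minimal sketch of the register-based modeling, assuming two<br>
                  made-up register names ("RM" for the rounding mode and "FLAGS"<br>
                  for the status flags) and a hypothetical conflicts() helper that<br>
                  applies the usual use/def ordering test:</font><br>
                <br>
```python
def conflicts(a, b):
    """True if a and b must keep their order under use/def modeling."""
    return not (
        a["defs"].isdisjoint(b["uses"])       # b reads what a writes
        and a["uses"].isdisjoint(b["defs"])   # b writes what a reads
        and a["defs"].isdisjoint(b["defs"])   # both write the same register
    )


# Hypothetical instruction descriptions.
fadd = {"uses": {"RM"}, "defs": {"FLAGS"}}
fmul = {"uses": {"RM"}, "defs": {"FLAGS"}}
set_rm = {"uses": set(), "defs": {"RM"}}

print(conflicts(fadd, set_rm))  # True
print(conflicts(fadd, fmul))    # True
```
                <br>
                <font size="2">The first result is the ordering we want. The second is the<br>
                  overly strict case mentioned above: both instructions only set<br>
                  sticky flag bits, so swapping them would be harmless, yet plain<br>
                  def/def modeling forbids it.</font><br>
                <br>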

                <font size="2">If we do need hasSideEffects, there is a

                  separate discussion on<br>

                  whether this can be implemented without each back end

                  having to<br>

                  duplicate all FP instruction patterns (one with

                  hasSideEffects<br>

                  and one without), e.g. by having a new feature that

                  allows<br>
                  describing the side-effect status using an MI operand.</font><br>

                <br>

                <br>

                <font size="2">Next steps<br>

                  ==========</font><br>

                <br>

                <font size="2">I believe it is important to break up the

                  full amount of work<br>

                  into incremental steps that provide some useful

                  benefits on their<br>

                  own. At first, we should be able to get to a state

                  where clang<br>

                  can be used to build programs that use some (maybe not

                  all) strict<br>

                  FP features, where the generated code is always

                  correct but may<br>

                  not always be optimal. To get there, I think we need

                  at a <br>

                  minimum:</font><br>

                <br>

                <font size="2">- Implement clang support for the default

                  flags, e.g. GCC's<br>

                  -frounding-math and -ftrapping-math, and always generate<br>

                  the constrained intrinsics. clang should also mark all<br>

                  call sites then (as mentioned above).</font><br>

                <br>

                <font size="2">- For now, add the requirement that LTO

                  is not supported if<br>

                  this would cause mixing of strict and non-strict FP

                  code.<br>

                  In the alternative, have the LTO pass automatically

                  transform<br>

                  any floating-point operation into a constrained

                  intrinsic<br>

                  if *any* (other) module already uses the latter.</font><br>

                <br>

                <font size="2">- At the IR level, complete the set of

                  supported constrained<br>

                  FP intrinsics (there are still some missing, see e.g. <br>

                </font><font size="2"><a

                    href="https://reviews.llvm.org/D43515"

                    target="_blank" moz-do-not-send="true">https://reviews.llvm.org/<wbr>D43515</a></font><font

                  size="2">).<br>

                  Also, it seems not all variants (e.g. for vector

                  types) are<br>

                  supported correctly through codegen (see e.g.<br>

                </font><font size="2"><a

                    href="https://reviews.llvm.org/D46967"

                    target="_blank" moz-do-not-send="true">https://reviews.llvm.org/<wbr>D46967</a></font><font

                  size="2">).</font><br>

                <br>

                <font size="2">- Allow targets to correctly reflect

                  constrained intrinsics<br>

                  semantics at the MI level and final machine code

                  generation<br>

                  (see e.g. </font><font size="2"><a

                    href="https://reviews.llvm.org/D45576"

                    target="_blank" moz-do-not-send="true">https://reviews.llvm.org/<wbr>D45576</a></font><font

                  size="2">).</font><br>

                <br>

                <font size="2">- Review all optimization and codegen

                  passes to verify they<br>

                  fully respect strict FP semantics.</font><br>

                <br>
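                <font size="2">The LTO alternative from the second item above could look roughly<br>
                  like the following toy pass. It is only a sketch: modules are<br>
                  modeled as lists of operation names, harmonize() is a<br>
                  hypothetical helper, and a real transformation would also have<br>
                  to supply rounding-mode and exception-behavior arguments.</font><br>
                <br>
```python
CONSTRAINED_PREFIX = "llvm.experimental.constrained."


def is_constrained(op):
    return op.startswith(CONSTRAINED_PREFIX)


def harmonize(modules):
    """If any module uses constrained ops, convert every op in every module."""
    any_strict = any(
        is_constrained(op) for ops in modules.values() for op in ops
    )
    if not any_strict:
        return modules
    return {
        name: [op if is_constrained(op) else CONSTRAINED_PREFIX + op
               for op in ops]
        for name, ops in modules.items()
    }


mixed = {
    "a.o": ["fadd", "fmul"],
    "b.o": ["llvm.experimental.constrained.fdiv"],
}
linked = harmonize(mixed)
print(linked["a.o"])
```
                <br>
                <font size="2">When no module uses constrained operations, the input is<br>
                  returned unchanged, so non-strict builds would pay no cost.</font><br>
                <br>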

                <font size="2">Once this is done, we can improve on the

                  solution by:</font><br>

                <br>

                <font size="2">- Supporting mixing strict and non-strict

                  FP operations<br>

                  (would lift the LTO restriction). (Note: there still seems<br>
                  to be some "invention required" here, see

                  above.)</font><br>

                <br>

                <font size="2">- Actually implementing the #pragma

                  supporting different<br>

                  regions within a compilation unit (prereq: support for<br>

                  mixing strict and non-strict FP operations).</font><br>

                <br>

                <font size="2">- Add more optimization of constrained FP

                  intrinsics in<br>

                  common optimizers and/or target back ends.</font><br>

                <br>

                <font size="2">Does this look reasonable? Please let me

                  know if there's<br>

                  anything I overlooked, or you have any additional

                  comments<br>

                  or questions.</font><br>

                <br>

                <br>

                <font size="2"><br>

                  Mit freundlichen Gruessen / Best Regards<span

                    class="HOEnZb"><font color="#888888"><br>

                      <br>

                      Ulrich Weigand<br>

                      <br>

                      -- <br>

                      Dr. Ulrich Weigand | Phone: +49-7031/16-3727<br>

                      STSM, GNU/Linux compilers and toolchain<br>

                      IBM Deutschland Research & Development GmbH<br>

                      Vorsitzende des Aufsichtsrats: Martina Koederitz |

                      Geschäftsführung: Dirk Wittkopp<br>

                      Sitz der Gesellschaft: Böblingen |

                      Registergericht: Amtsgericht Stuttgart, HRB 243294</font></span></font><br>

              </p>

            </div>

            <br>

            ______________________________<wbr>_________________<br>

            LLVM Developers mailing list<br>

            <a href="mailto:llvm-dev@lists.llvm.org"

              moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>

            <a

              href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

              rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

            <br>

          </blockquote>

        </div>

        <br>

      </div>

      <br>


    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

  </body>

</html>