<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, May 23, 2018 at 12:19 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div bgcolor="#FFFFFF"><span class="gmail-">

    <p><br>

    </p>

    <div class="gmail-m_-1433965244057454815moz-cite-prefix">On 05/23/2018 11:06 AM, Hubert Tong via

      llvm-dev wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">

        <div>Hi Ulrich,<br>

          <br>

        </div>

        <div>I am interested in knowing if the current proposals also

          take into account the FP_CONTRACT pragma</div>

      </div>

    </blockquote>

    <br></span>

    We should already do this (during IR generation, we turn relevant
    operations into the @llvm.fmuladd intrinsic when FP_CONTRACT is set to on).<span class="gmail-"><br></span></div></blockquote><div>I am not sure we have the same interpretation of what the FP_CONTRACT pragma does. Subclause 6.5 paragraph 8 of C11 implies (for example) that even where the FENV_ACCESS pragma is "on", folding a constant subexpression with an exactly representable result on an implementation where FLT_EVAL_METHOD is 0 is within the range of acceptable implementation-defined behaviour, despite intermediate overflow under non-contracted evaluation. That is to say, the current proposal reads as what needs to be done when FP_CONTRACT is "off" and FENV_ACCESS is "on". The note from Ulrich implies that these requirements are imposed by the Standard, but the range of implementation-defined behaviour when FP_CONTRACT is "on" and FENV_ACCESS is also "on" is possibly a discussion to be had.<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><span class="gmail-">

    <br>

    <blockquote type="cite">

      <div dir="ltr">

        <div> and the ability to implement options that imply a specific

          value for the FLT_EVAL_METHOD macro.<br>

        </div>

      </div>

    </blockquote>

    <br></span>

    What do you mean by this?<br></div></blockquote><div>I admit that modes where FLT_EVAL_METHOD is 0 (no extra range and precision), 1 (float evaluated with the range and precision of double), or 2 (float and double evaluated with the range and precision of long double) are all straightforward for the IR producer to implement by fixing the types used in the emitted IR (implying that the value of FLT_EVAL_METHOD is not constant within a program).<br><br>So, this is more about implementing meaningful cases of FLT_EVAL_METHOD being -1. My point below (in my previous note) is that allowing IR passes or the back end to choose the range and precision in a manner conforming to Standard C (for a FLT_EVAL_METHOD of -1), perhaps for speed where multiple sets of floating-point operations/registers are available with differing "preferred types", is a use case that the IR does not seem to support well. As for why a FLT_EVAL_METHOD of -1 is on-topic for this thread: the language semantics allow the constant subexpression folding I mentioned above even when FP_CONTRACT is "off" and FENV_ACCESS is "on", because the evaluation format used for that subexpression can be said to have infinite range and precision.<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">

    <br>

     -Hal<div><div class="gmail-h5"><br>

    <br>

    <blockquote type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>Additionally, I am not aware of the IR being able to
          represent the potentially deferred loss of precision that the
          C language semantics provide; in particular, applying such
          semantics to the existing IR would run into the issue that the
          limits of such deferment would need an agreed representation.<br>

          <br>

        </div>
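        <div>As a sketch of what this deferral looks like at the source level (assuming a C11 compiler; the function name below is hypothetical): C11 7.12p2 exposes the evaluation formats selected by FLT_EVAL_METHOD as the float_t and double_t typedefs in math.h. When FLT_EVAL_METHOD is 1 or 2, intermediate results keep the wider range and precision, and the extra range and precision is only removed when a value is assigned or cast. The function checks the mapping the Standard fixes for 0, 1 and 2; for -1 and other values the evaluation format is implementation-defined, which is the case that lacks an obvious IR representation.<br></div>

```c
#include <float.h>  /* FLT_EVAL_METHOD */
#include <math.h>   /* float_t, double_t */

/* Hypothetical check: returns 1 if float_t and double_t match what
   C11 7.12p2 requires for the current FLT_EVAL_METHOD:
     0 -> float_t is float,       double_t is double
     1 -> float_t is double,      double_t is double
     2 -> float_t is long double, double_t is long double
   Any other value (e.g. -1, "indeterminable") fixes nothing. */
int eval_types_match_method(void) {
#if FLT_EVAL_METHOD == 0
    return _Generic((float_t)0, float: 1, default: 0) &&
           _Generic((double_t)0, double: 1, default: 0);
#elif FLT_EVAL_METHOD == 1
    return _Generic((float_t)0, double: 1, default: 0) &&
           _Generic((double_t)0, double: 1, default: 0);
#elif FLT_EVAL_METHOD == 2
    return _Generic((float_t)0, long double: 1, default: 0) &&
           _Generic((double_t)0, long double: 1, default: 0);
#else
    return 1; /* implementation-defined: nothing to check */
#endif
}
```

        <div>On a conforming implementation the function returns 1; on x86-64 with SSE arithmetic FLT_EVAL_METHOD is typically 0, so float_t is plain float.<br></div>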

        <div>As for the mixing of strict and non-strict modes, I would

          be interested in knowing where LLVM stands in its handling of non-SSA

          (pseudo-memory?) dependencies. I have a vague impression that

          it is very coarse-grained in that respect, but I admit to not

          being particularly informed in that space. If there is a good

          model for such dependencies, then I think it could be used to

          handle the strict/non-strict mixing.<br>

        </div>

        <div><br>

        </div>

        -- Hubert Tong, IBM<br>

        <br>

        <div>PS A nitpick on wording: within the C Standard, the idea of
          being inside or outside of FENV_ACCESS regions is instead
          expressed in terms of the state of the FENV_ACCESS pragma.<br>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Wed, May 23, 2018 at 10:48 AM,

          Ulrich Weigand via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div>

              <p><font size="2">Hello,</font><br>

                <br>

                <font size="2">at the recent EuroLLVM developer meeting

                  in Bristol I held a BoF<br>

                  session on the topic "Towards implementing #pragma

                  STDC FENV_ACCESS".<br>

                  I've also had a number of follow-on discussions both

                  on-site in<br>

                  Bristol and online since. This post is intended as a

                  summary of<br>

                  my current understanding of the set of requirements and

                  implementation<br>

                  details covering the overall topic.</font><br>

                <br>

                <font size="2">I'm posting this here in the hope this

                  can serve as a basis for<br>

                  the various more detailed discussions that are still

                  ongoing<br>

                  (e.g. in various Phabricator proposals right now). Any

                  comments<br>

                  are welcome!</font><br>

                <br>

                <br>

                <font size="2">Semantics of #pragma STDC FENV_ACCESS<br>

                  =====================================</font><br>

                <br>

                <font size="2">To provide a baseline for the

                  implementation discussion, first an<br>

                  overview of the features required to handle the strict

                  floating-point<br>

                  mode defined by the C and IEEE standards:</font><br>

                <br>

                <font size="2">1. Floating-point rounding modes<br>

                  2. Default floating-point exception handling<br>

                  3. Trapping floating-point exception handling</font><br>

                <br>

                <font size="2">Each of these separate features imposes

                  different constraints on the<br>

                  optimizations that LLVM may perform involving FP

                  expressions:</font><br>

                <br>

                <font size="2">1. Floating-point rounding modes</font><br>

                <br>

                <font size="2">Outside of FENV_ACCESS regions, all FP

                  operations are supposed to be<br>

                  performed in the "default" rounding mode.</font><br>

                <br>

                <font size="2">But inside FENV_ACCESS regions, FP

                  operations implicitly depend on<br>

                  a "current" rounding mode setting, which may be

                  changed by certain<br>

                  C library calls (plus some platform-specific

                  intrinsics). In addition,<br>

                  those calls may be performed within subroutines (as

                  long as those are<br>

                  also within FENV_ACCESS), so *any* function call

                  within a FENV_ACCESS region<br>

                  must be considered as potentially changing the

                  rounding mode.</font><br>

                <br>

                <font size="2">In effect, this means the compiler may

                  not move or combine FP<br>

                  operations across function call sites.</font><br>

                <br>

                <font size="2">2. Default floating-point exception

                  handling</font><br>

                <br>

                <font size="2">Inside FENV_ACCESS regions, every

                  floating-point operation that<br>

                  causes an exception must be considered to set a

                  "status flag"<br>

                  associated with this exception type. Those flags can

                  be queried<br>

                  using C library calls (plus some platform-specific

                  intrinsics),<br>

                  and there are other such calls to explicitly set or

                  clear those<br>

                  flags as well. As with the rounding modes, those calls

                  may be<br>

                  performed in subroutines as well, so any function call

                  within a<br>

                  FENV_ACCESS region must be considered as potentially

                  *using* and<br>

                  changing the floating-point exception status flags.</font><br>

                <br>

                <font size="2">The values of the status flags on entry

                  to a FENV_ACCESS region are to<br>
                  be considered unspecified according to the C standard.</font><br>

                <br>

                <font size="2">Compiler optimizations are supposed to

                  preserve the values of<br>

                  all exception status bits at any point where they can

                  be<br>

                  (potentially) inspected by the program, i.e. at all

                  call sites<br>

                  within FENV_ACCESS regions. This still allows a number

                  of<br>

                  optimizations, e.g. to reorder FP operations or

                  combine two<br>

                  identical operations within a region uninterrupted by

                  calls.<br>

                  But other optimizations should be avoided, e.g.

                  optimizing<br>

                  away an unused FP operation may result in an exception

                  flag<br>

                  now being unset that would otherwise have been set.

                  The same<br>

                  applies to floating-point constant folding.</font><br>

                <br>

                <font size="2">3. Trapping floating-point exception

                  handling</font><br>

                <br>

                <font size="2">Within a FENV_ACCESS region, library

                  calls may be used to switch<br>

                  exception handling semantics to a "trapping" mode by

                  setting<br>

                  corresponding mask bits. Any subsequent FP instruction

                  that<br>

                  raises an exception with the associated mask bit set

                  will cause<br>

                  a trap. Usually, this will be a hardware trap that is

                  translated<br>

                  by the operating system into some form of software

                  exception that<br>

                  can be handled by the application; on Linux systems

                  this takes the<br>

                  form of a SIGFPE signal.</font><br>

                <br>

                <font size="2">As above, those mask bits can be set and

                  reset via (operating-system-specific)<br>
                  library calls and/or

                  platform-specific intrinsics,<br>

                  all of which may also be done within subroutine calls.</font><br>

                <br>

                <font size="2">In effect, this requires the compiler to

                  treat any floating-point<br>

                  operation within a FENV_ACCESS region as potentially

                  trapping,<br>

                  which means the same restrictions apply as with e.g.

                  memory accesses<br>

                  (cannot be speculated etc.) However, according to the

                  C standard,<br>

                  the implementation is not required to preserve the

                  *number* of<br>

                  different traps, so identical operations may still be

                  combined<br>

                  (unless there is an intervening function call).</font><br>

                <br>

                <font size="2">The C standard requires all user code to

                  explicitly switch back<br>

                  to non-trapping mode for all exceptions whenever

                  leaving a<br>

                  FENV_ACCESS region (both by "falling off the end" of

                  the region<br>

                  and by calling a subroutine defined outside of

                  FENV_ACCESS).</font><br>

                <br>

                <br>

                <font size="2">Implementation requirements on parts of

                  the compiler<br>

                  ====================================================</font><br>

                <br>

                <font size="2">A. clang front end</font><br>

                <br>

                <font size="2">The front end needs to determine which

                  instructions are part of<br>

                  FENV_ACCESS regions and which are not. This takes into

                  account<br>

                  both the semantics of the #pragma as defined by the

                  standard,<br>

                  and the implementation-defined default rules that

                  apply to code<br>

                  outside of any #pragma. GCC currently has the

                  following two<br>

                  related command-line options:</font><br>

                <br>

                <font size="2">-frounding-math: Do not assume default

                  rounding mode<br>

                  -ftrapping-math: Assume FP operations may trap</font><br>

                <br>

                <font size="2">clang accepts but (basically) ignores

                  those options. As a first<br>

                  step, it might make sense to have the FENV_ACCESS
                  default<br>
                  behavior triggered by these options, even
                  while the front end<br>
                  does not yet support the actual #pragma.</font><br>

                <br>

                <font size="2">The front end then needs to transmit the

                  information about<br>

                  FENV_ACCESS regions to later passes. However, I

                  believe that<br>

                  we do not actually have to implement "regions" as such

                  at the<br>

                  IR level. Instead, it would be sufficient to track the

                  following<br>

                  information:</font><br>

                <br>

                <font size="2">- For each FP operation, whether it is

                  within a FENV_ACCESS region.<br>

                  - For each call site, whether it is within a

                  FENV_ACCESS region.</font><br>

                <br>

                <font size="2">The former requires new IR support; the

                  approach currently under<br>

                  investigation uses the experimental "constrained FP"

                  intrinsics<br>

                  instead of traditional floating-point operations for

                  this. The<br>

                  latter can be done simply by annotating those call

                  sites with an<br>

                  attribute.</font><br>

                <br>

                <font size="2">In addition to that, the front-end itself

                  needs to disable any<br>

                  early optimizations that do not preserve strict FP

                  semantics,<br>

                  in particular it must not speculate FP operations if

                  they may<br>

                  trap. (Currently, the front end transforms "? :" on

                  floating-point<br>
                  types into a select instruction; for trapping

                  FP<br>

                  operations, an explicit branch must be used instead.)</font><br>

                <br>

                <br>

                <font size="2">B. LLVM IR and LLVM common optimizations</font><br>

                <br>

                <font size="2">As mentioned in the previous section, we

                  need some IR to annotate<br>

                  FP instructions and call sites within FENV_ACCESS

                  regions. All<br>

                  common optimizations then need to respect the strict

                  FP semantics<br>

                  associated with those regions.</font><br>

                <br>

                <font size="2">The current approach uses experimental

                  intrinsics. This has the<br>

                  advantage that most optimizations never trigger since

                  they don't<br>

                  even recognize those new intrinsics. Also, the

                  intrinsics can<br>

                  be marked as having side-effects and/or being

                  non-speculatable.</font><br>

                <br>

                <font size="2">The overall effect is that more

                  optimizations are suppressed<br>

                  than would be strictly necessary. But this may still

                  be a good<br>

                  first step, since the result is now safe but maybe not

                  optimal<br>

                  -- which can be improved upon over time by teaching

                  the specific<br>

                  semantics of those intrinsics to optimization passes.</font><br>

                <br>

                <font size="2">However, some open questions remain. If

                  at some point we want<br>

                  to model the constrained FP semantics more precisely

                  than just<br>

                  as "unmodeled side effects", this may have to be

                  reflected at<br>

                  the IR level directly. For example, to model rounding

                  mode<br>

                  behavior, at some point we might require explicit

                  tracking of<br>

                  data dependencies on the rounding mode by representing

                  the<br>

                  rounding mode as SSA values defined by function calls

                  and used<br>

                  by FP intrinsics. Similarly, to track exception status

                  flags,<br>

                  they might be modeled as SSA values set by FP

                  intrinsics and<br>

                  used by function calls.</font><br>

                <br>

                <font size="2">(There is a possibly related question of

                  how to optimally model<br>

                  the property of many math library routines that they

                  may access<br>

                  the "errno" variable but no other memory ... It might

                  also be<br>

                  possible to model e.g. exception status as a

                  thread-local "memory"<br>

                  location that is modified by FP operations, just like

                  errno.)</font><br>

                <br>

                <font size="2">Another currently unresolved issue is

                  that at the moment nothing<br>

                  prevents *standard* floating-point operations from

                  being moved<br>

                  *inside* FENV_ACCESS regions. This may also be

                  invalid, since<br>

                  those operations now may cause unexpected traps etc.

                  (More<br>

                  specifically, what is invalid is moving any standard

                  FP operation<br>

                  across a *call site* within a FENV_ACCESS region.)

                  Note that<br>

                  this is even an issue if we only support changing the

                  default<br>

                  (and no actual #pragma) if multiple object files using

                  different<br>

                  default settings are being linked together using LTO.</font><br>

                <br>

                <font size="2">This last issue could in theory be solved

                  by having all optimization<br>

                  passes respect the requirement that floating-point

                  operations may<br>

                  not be moved across call sites marked with the strict

                  FP attribute.<br>

                  But that does not appear to be straightforward since

                  it would<br>

                  introduce a "new" type of dependency that would have to

                  be added<br>

                  throughout LLVM code. If this must be avoided, we'd

                  have to<br>

                  find a way to explicitly track dependencies at the IR

                  level. In<br>

                  the extreme, this could end up equivalent to just

                  always using<br>

                  the constrained intrinsics for everything ...</font><br>

                <br>

                <br>

                <font size="2">C. Code generation</font><br>

                <br>

                <font size="2">In the back end, effects of strict FP

                  mode have to be passed through<br>

                  to lower-level representations including SelectionDAG

                  and MI.</font><br>

                <br>

                <font size="2">Currently, the "unmodeled side effect"

                  logic of the constrained<br>

                  intrinsics is modeled by putting them on the chain

                  during SelectionDAG.<br>

                  (If we ever model semantics more precisely at the IR

                  level, that<br>

                  would need to be reflected on SelectionDAG

                  accordingly.)</font><br>

                <br>

                <font size="2">At the MI level, there is no

                  representation at all. One option to<br>

                  fix this would be to model target-specific registers

                  that implement<br>

                  the IEEE semantics. Most platforms have registers (or

                  parts of<br>

                  registers) that hold:<br>

                  - the current rounding mode<br>

                  - the exception status flags<br>

                  - the exception masks (which enable traps)<br>

                  Marking FP instructions as using and/or defining these

                  registers<br>

                  would enforce ordering requirements. It may be too

                  strict in some<br>

                  cases (e.g. two instructions setting exception status

                  flags may<br>

                  still be reordered). On the other hand, I believe if

                  instructions<br>

                  may actually *trap*, we still need the

                  hasSideEffects flag even<br>

                  if register dependencies are modeled.</font><br>

                <br>

                <font size="2">If we do need hasSideEffects, there is a

                  separate discussion on<br>

                  whether this can be implemented without each back end

                  having to<br>

                  duplicate all FP instruction patterns (one with

                  hasSideEffects<br>

                  and one without), e.g. by having a new feature that

                  allows<br>
                  describing the side-effect status using an MI operand.</font><br>

                <br>

                <br>

                <font size="2">Next steps<br>

                  ==========</font><br>

                <br>

                <font size="2">I believe it is important to break up the

                  full amount of work<br>

                  into incremental steps that provide some useful

                  benefits on their<br>

                  own. At first, we should be able to get to a state

                  where clang<br>

                  can be used to build programs that use some (maybe not

                  all) strict<br>

                  FP features, where the generated code is always

                  correct but may<br>

                  not always be optimal. To get there, I think we need

                  at a <br>

                  minimum:</font><br>

                <br>

                <font size="2">- Implement clang support for the default

                  flags, e.g. GCC's<br>

                  -frounding-math and -ftrapping-math, and always
                  generate<br>
                  the constrained intrinsics. clang should then also mark all<br>
                  call sites (as mentioned above).</font><br>

                <br>

                <font size="2">- For now, add the requirement that LTO

                  is not supported if<br>

                  this would cause mixing of strict and non-strict FP

                  code.<br>

                  In the alternative, have the LTO pass automatically

                  transform<br>

                  any floating-point operation into a constrained

                  intrinsic<br>

                  if *any* (other) module already uses the latter.</font><br>

                <br>

                <font size="2">- At the IR level, complete the set of

                  supported constrained<br>

                  FP intrinsics (there are still some missing, see e.g. <br>

                </font><font size="2"><a href="https://reviews.llvm.org/D43515" target="_blank">https://reviews.llvm.org/D4351<wbr>5</a></font><font size="2">).<br>

                  Also, it seems not all variants (e.g. for vector

                  types) are<br>

                  supported correctly through codegen (see e.g.<br>

                </font><font size="2"><a href="https://reviews.llvm.org/D46967" target="_blank">https://reviews.llvm.org/D4696<wbr>7</a></font><font size="2">).</font><br>

                <br>

                <font size="2">- Allow targets to correctly reflect

                  constrained intrinsics<br>

                  semantics at the MI level and final machine code

                  generation<br>

                  (see e.g. </font><font size="2"><a href="https://reviews.llvm.org/D45576" target="_blank">https://reviews.llvm.org/D4557<wbr>6</a></font><font size="2">).</font><br>

                <br>

                <font size="2">- Review all optimization and codegen

                  passes to verify they<br>

                  fully respect strict FP semantics.</font><br>

                <br>

                <font size="2">Once this is done, we can improve on the

                  solution by:</font><br>

                <br>

                <font size="2">- Supporting mixing strict and non-strict

                  FP operations<br>

                  (would lift the LTO restriction). (Note: there seems<br>

                  to be still some "invention required" here, see

                  above.)</font><br>

                <br>

                <font size="2">- Actually implementing the #pragma

                  supporting different<br>

                  regions within a compilation unit (prereq: support for<br>

                  mixing strict and non-strict FP operations).</font><br>

                <br>

                <font size="2">- Add more optimization of constrained FP

                  intrinsics in<br>

                  common optimizers and/or target back ends.</font><br>

                <br>

                <font size="2">Does this look reasonable? Please let me

                  know if there's<br>

                  anything I overlooked, or you have any additional

                  comments<br>

                  or questions.</font><br>

                <br>

                <br>

                <font size="2"><br>

                  Best Regards<span class="gmail-m_-1433965244057454815HOEnZb"><font color="#888888"><br>

                      <br>

                      Ulrich Weigand<br>

                      <br>

                      -- <br>

                      Dr. Ulrich Weigand | Phone: +49-7031/16-3727<br>

                      STSM, GNU/Linux compilers and toolchain<br>

                      IBM Deutschland Research & Development GmbH<br>

                      Chair of the Supervisory Board: Martina Koederitz |
                      Managing Director: Dirk Wittkopp<br>
                      Registered office: Böblingen |
                      Registry court: Amtsgericht Stuttgart, HRB 243294</font></span></font><br>

              </p>

            </div>

            <br>

            ______________________________<wbr>_________________<br>

            LLVM Developers mailing list<br>

            <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

            <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

            <br>

          </blockquote>

        </div>

        <br>

      </div>

      <br>


    </blockquote>

    <br>

    </div></div><span class="gmail-HOEnZb"><font color="#888888"><pre class="gmail-m_-1433965244057454815moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

  </font></span></div>

</blockquote></div><br></div></div>