<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 10/02/2017 11:10 AM, Bruce Hoult via
      llvm-dev wrote:<br>
    </div>
    <blockquote
cite="mid:CAMU+Ekwm2tUVRPHZbBjBg6OCODTHfc46VDiQwfAU6=CLabUa6Q@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="ltr">Is there anything that means, in particular, "go
        fast, even if it means not all bits are significant"?
        <div><br>
        </div>
        <div>I'm currently working on an llvm-based compiler for a GPU
          that is optimised for OpenGL, where 16 bit FP may not be quite
          accurate enough (or may be in some cases), but 32 bit FP is
          overkill. A lot of the fast, built in, operations end up with
          a few junk bits at the end (not add/sub/mul, but divide is
          available *only* using reciprocal).</div>
        <div><br>
        </div>
        <div>When implementing OpenCL, the specs and conformance tests
          require full IEEE accuracy. In some cases this requires a
          round of Newton-Raphson to clean up the accuracy, which is a
          significant though maybe not crippling performance penalty.
          But in other cases we need to do a lot of range reduction,
          some polynomial, and then generalise the result again. This
          can be an order of magnitude or more slower than using the
          not-quite-accurate-enough built in instruction.</div>
      </div>
    </blockquote>
    <br>
    This is what arcp is for (implying that you can use the reciprocal
    estimate and not worry about getting the exact answer). There is
    a separate question about how many Newton iterations to use, and we
    have a separate flag for that (-mrecip=...). Check out the
    implementation of TargetLoweringBase::getRecipEstimateSqrtEnabled
    to see how it's set up in the backend. This is, however, per function, so
    we don't currently have per-operation control over this.<br>
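    <br>
    To make the trade-off concrete, here is a minimal sketch of division
    through a reciprocal estimate with a configurable number of Newton
    steps. This is not LLVM's lowering code: recipEstimate is a
    hypothetical stand-in for a hardware estimate instruction (it simply
    discards low mantissa bits), and fastDivide, refineRecip and relError
    are names made up for this example.<br>
    <br>
```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a hardware reciprocal-estimate instruction:
// computes 1/x, then zeroes the low 12 mantissa bits so that only about
// 11 mantissa bits of the result can be trusted.
inline float recipEstimate(float x) {
  float r = 1.0f / x;
  std::uint32_t bits;
  std::memcpy(&bits, &r, sizeof bits);
  bits &= 0xFFFFF000u;
  std::memcpy(&r, &bits, sizeof bits);
  return r;
}

// One Newton-Raphson step for the reciprocal of x: r' = r * (2 - x * r).
// Each step roughly doubles the number of correct bits.
inline float refineRecip(float x, float r) { return r * (2.0f - x * r); }

// a / b rewritten as a * (1/b), with a caller-chosen number of
// refinement steps applied to the estimate.
inline float fastDivide(float a, float b, int steps) {
  assert(steps >= 0 && "refinement step count must be non-negative");
  float r = recipEstimate(b);
  for (int i = 0; i < steps; ++i)
    r = refineRecip(b, r);
  return a * r;
}

// Relative error of an approximation against a reference value.
inline float relError(float approx, float exact) {
  return std::fabs(approx - exact) / std::fabs(exact);
}
```
    <br>
    With zero steps the quotient keeps the junk bits of the estimate;
    each additional step buys accuracy at the cost of a few more
    multiplies, which is the kind of choice the step count controls.<br>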
    <br>
    <blockquote
cite="mid:CAMU+Ekwm2tUVRPHZbBjBg6OCODTHfc46VDiQwfAU6=CLabUa6Q@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div>The OpenCL spec defines a number of compile flags
          controlling optimizations. Some seem to map well onto the
          flags already discussed here:</div>
        <div><br>
        </div>
        <div>-cl-mad-enable<br>
        </div>
        <div>-cl-no-signed-zeros<br>
        </div>
        <div>-cl-finite-math-only<br>
        </div>
        <div><br>
        </div>
        <div>However it looks to me that the following ones don't
          presently map well to LLVM:</div>
        <div><br>
        </div>
        <div>
          <div>-cl-unsafe-math-optimizations</div>
          <div>Allow optimizations for floating-point arithmetic that
            (a) assume that arguments and results are valid, (b) may
            violate IEEE 754 standard and (c) may violate the OpenCL
            numerical compliance requirements as defined in the SPIR-V
            OpenCL environment specification for single precision and
            double precision floating-point, and edge case behavior in
            the SPIR-V OpenCL environment specification. This option
            includes the -cl-no-signed-zeros and -cl-mad-enable options.</div>
        </div>
      </div>
    </blockquote>
    <br>
    I think the idea is that this flag, like
    -funsafe-math-optimizations, gets mapped to an appropriate
    collection of finer-grained flags internally.<br>
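    <br>
    A rough sketch of that kind of expansion, assuming a made-up bitmask
    of fine-grained relaxations. The FPRelax names, the OtherUnsafe
    catch-all bit, and the expandOption and hasRelaxation helpers are all
    hypothetical; only the option spellings and the fact that
    -cl-unsafe-math-optimizations includes -cl-no-signed-zeros and
    -cl-mad-enable come from the spec text quoted above.<br>
    <br>
```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical fine-grained relaxation bits; names are illustrative only.
enum FPRelax : std::uint8_t {
  NoSignedZeros = 1u << 0, // -cl-no-signed-zeros
  MadEnable = 1u << 1,     // -cl-mad-enable
  FiniteOnly = 1u << 2,    // -cl-finite-math-only
  OtherUnsafe = 1u << 3,   // assumed catch-all for the remaining unsafe rewrites
};

// Expand one command-line option into the set of fine-grained bits it
// implies. An umbrella option is just a union of the finer-grained ones.
inline std::uint8_t expandOption(const std::string &opt) {
  if (opt == "-cl-no-signed-zeros") return NoSignedZeros;
  if (opt == "-cl-mad-enable") return MadEnable;
  if (opt == "-cl-finite-math-only") return FiniteOnly;
  if (opt == "-cl-unsafe-math-optimizations")
    return static_cast<std::uint8_t>(NoSignedZeros | MadEnable | OtherUnsafe);
  return 0; // unknown options relax nothing
}

// Query a single relaxation bit in an expanded set.
inline bool hasRelaxation(std::uint8_t set, FPRelax bit) {
  assert(bit != 0 && (bit & (bit - 1)) == 0 && "expects exactly one bit");
  return (set & bit) != 0;
}
```
    <br>
    Optimizations would then query only the individual bits and never
    the umbrella option itself.<br>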
    <br>
    <blockquote
cite="mid:CAMU+Ekwm2tUVRPHZbBjBg6OCODTHfc46VDiQwfAU6=CLabUa6Q@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div>
          <div>-cl-fast-relaxed-math</div>
          <div>Sets the optimization options -cl-finite-math-only and
            -cl-unsafe-math-optimizations. This allows optimizations for
            floating-point arithmetic that may violate the IEEE 754
            standard and the OpenCL numerical compliance requirements
            for single precision and double precision floating-point, as
            well as floating point edge case behavior. This option also
            relaxes the precision of commonly used math functions. This
            option causes the preprocessor macro __FAST_RELAXED_MATH__
            to be defined in the OpenCL program. The original and
            modified values are defined in the SPIR-V OpenCL environment
            specification</div>
        </div>
        <div><br>
        </div>
        <div>I'd like to emphasise this part of the latter one: "This option also
          relaxes the precision of commonly used math functions."</div>
      </div>
    </blockquote>
    <br>
    Isn't this the "libm" flag that is proposed in this thread?<br>
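    <br>
    As a sketch of what relaxed precision for a math function could
    mean, assume a per-call switch standing in for a "libm"-style flag.
    MathMode, sinPoly and sinWithMode are hypothetical names, and the
    polynomial is a plain degree-5 Taylor expansion with no range
    reduction, chosen only to keep the example short.<br>
    <br>
```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-call relaxation switch, standing in for a
// 'libm'-style fast-math flag.
struct MathMode {
  bool relaxedLibm;
};

// Degree-5 Taylor polynomial for sin(x). Only meant for |x| <= 1 in
// this sketch, since there is no range reduction.
inline float sinPoly(float x) {
  assert(std::fabs(x) <= 1.0f && "sketch has no range reduction");
  float x2 = x * x;
  return x * (1.0f + x2 * (-1.0f / 6.0f + x2 * (1.0f / 120.0f)));
}

// Pick the cheap approximation only when the relaxation is granted;
// otherwise defer to the library implementation.
inline float sinWithMode(float x, MathMode m) {
  return m.relaxedLibm ? sinPoly(x) : std::sin(x);
}
```
    <br>
    The relaxed path skips the range reduction and cleanup described in
    the quoted message, which is where the cost difference would come
    from.<br>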
    <br>
     -Hal<br>
    <br>
    <blockquote
cite="mid:CAMU+Ekwm2tUVRPHZbBjBg6OCODTHfc46VDiQwfAU6=CLabUa6Q@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div><br>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Mon, Oct 2, 2017 at 4:45 PM, Ristow,
          Warren via llvm-dev <span dir="ltr">&lt;<a
              moz-do-not-send="true"
              href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div link="blue" vlink="purple" lang="EN-US">
              <div class="m_-6162699180708653109WordSection1">
                <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#44546a">I'm
                    not aware of any additional bits needed.  But
                    putting us right at the edge leaves me
                    uncomfortable.  So an implementation that isn't
                    limited by the 7 bits in SubclassOptionalData seems
                    sensible.</span></p>
                <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#44546a"> </span></p>
                <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#44546a">Thanks,</span></p>
                <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#44546a">-Warren</span></p>
                <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#44546a"> </span></p>
                <div style="border:none;border-top:solid #b5c4df
                  1.0pt;padding:3.0pt 0in 0in 0in">
                  <p class="MsoNormal"><b><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">From:</span></b><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">
                      Sanjay Patel [mailto:<a moz-do-not-send="true"
                        href="mailto:spatel@rotateright.com"
                        target="_blank">spatel@rotateright.com</a><wbr>]
                      <br>
                      <b>Sent:</b> Monday, October 2, 2017 12:06 AM<br>
                      <b>To:</b> Ristow, Warren<br>
                      <b>Cc:</b> Hal Finkel; <a moz-do-not-send="true"
                        href="mailto:llvm-dev@lists.llvm.org"
                        target="_blank">llvm-dev@lists.llvm.org</a><br>
                      <b>Subject:</b> Re: [llvm-dev] Trouble when
                      suppressing a portion of fast-math-transformations</span></p>
                </div>
                <div>
                  <div class="h5">
                    <p class="MsoNormal"> </p>
                    <div>
                      <div>
                        <p class="MsoNormal">Are we confident that we
                          just need those 7 bits to represent all of the
                          relaxed FP states that we need/want to
                          support?
                        </p>
                      </div>
                      <div>
                        <p class="MsoNormal"> </p>
                      </div>
                      <div>
                        <p class="MsoNormal"
                          style="margin-bottom:12.0pt">I'm asking
                          because FMF in IR is currently mapped onto the
                          SubclassOptionalData of Value...and we have
                          exactly 7 bits there. :)</p>
                      </div>
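                      <p class="MsoNormal">The bit budget can be
                        sketched as follows, using the seven flag names
                        proposed in this thread. The bit positions, the
                        FastMathBit enum, and the setFlag, testFlag and
                        spareBits helpers are hypothetical and do not
                        reflect LLVM's actual encoding; the 7-bit width
                        is the figure stated above.</p>
```cpp
#include <cassert>
#include <cstdint>

// Flag names follow the set proposed in this thread; the bit positions
// are hypothetical and are not LLVM's real encoding.
enum FastMathBit : unsigned {
  Reassoc = 0,
  Libm,
  NoNaNs,
  NoInfs,
  NoSignedZeros,
  AllowReciprocal,
  AllowContract,
  NumFastMathBits // count of flags, not a flag itself
};

// Width of the optional-data field as stated in the thread.
constexpr unsigned OptionalDataBits = 7;

// Compilation fails as soon as one more flag is added to the enum.
static_assert(NumFastMathBits <= OptionalDataBits,
              "fast-math flags must fit in the optional-data field");

// Number of unused bits left in the field.
constexpr unsigned spareBits() { return OptionalDataBits - NumFastMathBits; }

// Set one flag in the packed field.
inline std::uint8_t setFlag(std::uint8_t data, FastMathBit b) {
  assert(b < NumFastMathBits && "not a real flag");
  return static_cast<std::uint8_t>(data | (1u << b));
}

// Test one flag in the packed field.
inline bool testFlag(std::uint8_t data, FastMathBit b) {
  assert(b < NumFastMathBits && "not a real flag");
  return (data & (1u << b)) != 0;
}
```
                      <p class="MsoNormal">With seven flags and seven
                        bits, spareBits() is zero, so any further flag
                        would need storage elsewhere.</p>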
                      <p class="MsoNormal">If we're redoing the
                        definitions, I'm wondering if we can share the
                        struct with the backend's SDNodeFlags, but that
                        already has one extra bit for vector reduction.
                        Should we give up on SubclassOptionalData for
                        FMF? We have a MD_fpmath enum value for
                        metadata, so we could move things over there?</p>
                      <div>
                        <p class="MsoNormal"> </p>
                      </div>
                    </div>
                    <div>
                      <p class="MsoNormal"> </p>
                      <div>
                        <p class="MsoNormal">On Fri, Sep 29, 2017 at
                          8:16 PM, Ristow, Warren via llvm-dev &lt;<a
                            moz-do-not-send="true"
                            href="mailto:llvm-dev@lists.llvm.org"
                            target="_blank">llvm-dev@lists.llvm.org</a>&gt;
                          wrote:</p>
                        <p class="MsoNormal">Hi Hal,<br>
                          <br>
                          >> 4. To fix this, I think that
                          additional fast-math-flags are likely<br>
                          >> needed in the IR.  Instead of the
                          following set:<br>
                          >><br>
                          >> 'nnan' + 'ninf' + 'nsz' + 'arcp' +
                          'contract'<br>
                          >><br>
                          >> something like this:<br>
                          >><br>
                          >> 'reassoc' + 'libm' + 'nnan' + 'ninf'
                          + 'nsz' + 'arcp' + 'contract'<br>
                          >><br>
                          >> would be more useful.  Related to
                          this, the current 'fast' flag which acts<br>
                          >> as an umbrella (enabling 'nnan' +
                          'ninf' + 'nsz' + 'arcp' + 'contract') may<br>
                          >> not be needed.  A discussion on this
                          point was raised last November on the<br>
                          >> mailing list:<br>
                          >><br>
                          >> <a moz-do-not-send="true"
href="http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html"
                            target="_blank">
                            http://lists.llvm.org/<wbr>pipermail/llvm-dev/2016-<wbr>November/107104.html</a><br>
                          ><br>
                          > I agree. I'm happy to help review the
                          patches. It will be best to have<br>
                          > only the finer-grained flags where
                          there's no "fast" flag that implies<br>
                          > all of the others.<br>
                          <br>
                          Thanks for the quick response, and for the
                          willingness to review.  I won't let<br>
                          this languish so long, like the post from last
                          November.<br>
                          <br>
                          Happy to hear that you feel it's best not to
                          have the umbrella "fast" flag.<br>
                          <br>
                          Thanks again,</p>
                        <div>
                          <div>
                            <p class="MsoNormal">-Warren<br>
                              ______________________________<wbr>_________________<br>
                              LLVM Developers mailing list<br>
                              <a moz-do-not-send="true"
                                href="mailto:llvm-dev@lists.llvm.org"
                                target="_blank">llvm-dev@lists.llvm.org</a><br>
                              <a moz-do-not-send="true"
                                href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
                                target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a></p>
                          </div>
                        </div>
                      </div>
                      <p class="MsoNormal"> </p>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </body>
</html>