<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <font size="-1">I lean toward Sanjay's proposal and Sanjoy's #4, as
      both seem more future-proof and would let LLVM mimic GCC's
      behavior more accurately.<br>
      <br>
      On another note, do y'all have any thoughts about changing the FP
      math semantics to FTZ and DAZ for the whole program, as some, if
      not all, current targets support such FP modes through bits in
      their FP unit control register, or similar?<br>
      <br>
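      To make the two modes concrete, here is a rough Python sketch that
      emulates them in software for doubles. The helper names
      (is_subnormal, flush, mul) are made up for illustration, and real
      targets do this in hardware through control register bits rather
      than anything like this code: FTZ flushes subnormal results to
      zero, while DAZ treats subnormal inputs as zero.<br>
      <pre>
```python
import math
import sys

# Smallest positive normal double; nonzero magnitudes below it are subnormal.
MIN_NORMAL = sys.float_info.min


def is_subnormal(x):
    """True for nonzero values whose magnitude is below the smallest normal."""
    return x != 0.0 and MIN_NORMAL > abs(x)


def flush(x):
    """Replace a subnormal by a zero of the same sign; pass other values through."""
    return math.copysign(0.0, x) if is_subnormal(x) else x


def mul(a, b, ftz=False, daz=False):
    """Multiply under emulated DAZ (applied to inputs) and FTZ (applied to the result)."""
    if daz:
        a, b = flush(a), flush(b)
    r = a * b
    return flush(r) if ftz else r
```
</pre>
      With both modes off, mul(sys.float_info.min, 0.25) returns a
      subnormal; with ftz=True that result is flushed to zero, and with
      daz=True a subnormal input is replaced by zero before the
      multiply.<br>
      <br>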
      As Hal once pointed out to me, the way that GCC works is a bit
      unnerving: any DSO that changes the FP mode to use such
      semantics affects all modules, even those written without this
      change in mind.  Perhaps emission of the initialization code that
      changes the FP mode could be suppressed for DSOs, leaving this
      run-time change in the hands of the program developer rather than
      the library developer, although this raises some questions as
      well.<br>
      <br>
      GCC accomplishes this in libgcc; should LLVM copy the same
      behavior, the code would likely reside in compiler-rt.<br>
      <br>
      Cheers,<br>
    </font>
    <pre class="moz-signature" cols="72">-- 
Evandro Menezes
</pre>
    <div class="moz-cite-prefix">On 03/18/19 11:31, Sanjay Patel via
      llvm-dev wrote:<br>
    </div>
    <blockquote
cite="mid:CA+wODisqqS=Jo+Qg4dyWTu-xeeGPuvQk3DbOmQjVYso_8EJeXA@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">
          <div dir="ltr">
            <div dir="ltr">
              <div dir="ltr">
                <div dir="ltr">
                  <div dir="ltr">
                    <div>We already knew that we would eventually need
                      another FMF bit back in:</div>
                    <div><a moz-do-not-send="true"
                        href="https://reviews.llvm.org/D39304"
                        rel="noreferrer" target="_blank">https://reviews.llvm.org/D39304</a></div>
                    <div>...it was just a question of 'when'. :)</div>
                    <div><br>
                    </div>
                    <div>I'm guessing that an FTZ bit won't be the last
                      new bit needed if we consider permutations between
                      strict FP and fast-math. Even without that,
                      denormals-are-zero (DAZ) might also be useful?<br>
                    </div>
                    <div>So rather than continuing to carve these out
                      bit-by-bit, it's worth considering a more general
                      solution: instruction-level metadata.</div>
                    <div><br>
                    </div>
                    <div>IIUC, the main argument for making FMF part of
                      the instruction was that per-instruction metadata
                      gets expensive if we're applying it to a
                      significant chunk of the instructions.</div>
                    <div>But let's think about that - even the most
                      FP-heavy code tops out around 10% FP math ops out
                      of the total instruction count. Typical FP
                      benchmark code is only 2-5% FP ops. The rest is
                      the same load/store/control-flow/ALU stuff found
                      in integer code.</div>
                    <div><br>
                    </div>
                    <div>I'm not exactly sure yet what it would take to
                      do the experiment, but it seems worth exploring
                      moving the existing FMF to metadata.<br>
                    </div>
                    <div><br>
                    </div>
                    <div>One point in favor of this approach is that we
                      already have an "MD_fpmath" enum. It's currently
                      only used to convey reduced precision requirements
                      to the AMDGPU backend. We could extend that to
                      include arbitrary FMF settings. <br>
                    </div>
                    <div><br>
                    </div>
                    <div>A couple of related points for FMF-as-metadata:</div>
                    <div>1. It might encourage fixing a hack added for
                      reciprocals: we use a function-level attribute for
                      those (grep for "reciprocal-estimates"). IIRC,
                      that was just a quicker fix than using MD_fpmath.
                      The existing squished boolean FMF can't convey the
                      more general settings that we need for reciprocal
                      optimizations.<br>
                    </div>
                    <div>2. These don't require new bits, but FMF isn't
                      applied correctly today as-is:</div>
                    <div><a moz-do-not-send="true"
                        href="https://reviews.llvm.org/D48085"
                        target="_blank">https://reviews.llvm.org/D48085</a></div>
                    <div><a moz-do-not-send="true"
                        href="https://bugs.llvm.org/show_bug.cgi?id=38086"
                        target="_blank">https://bugs.llvm.org/show_bug.cgi?id=38086</a><br>
                    </div>
                    <div> <a moz-do-not-send="true"
                        href="https://bugs.llvm.org/show_bug.cgi?id=39535"
                        target="_blank">https://bugs.llvm.org/show_bug.cgi?id=39535</a><br>
                    </div>
                    <div> <a moz-do-not-send="true"
                        href="https://reviews.llvm.org/D51701"
                        target="_blank">https://reviews.llvm.org/D51701</a></div>
                    <div>...so we need to make FMF changes regardless of
                      FTZ.<br>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Sun, Mar 17, 2019 at 2:47
          PM Craig Topper via llvm-dev &lt;<a moz-do-not-send="true"
            href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;
          wrote:<br>
        </div>
        <blockquote class="gmail_quote">
          <div dir="ltr">
            <div dir="ltr">
              <div dir="ltr">Can we move HasValueHandle out of the byte
                used for SubclassOptionalData and into the flags at the
                bottom of Value, shrinking NumUserOperands to 27
                bits?</div>
              <div dir="ltr"><br>
                <div>
                  <div dir="ltr"
class="gmail-m_-2453401361646942099gmail-m_8394751908904538983gmail-m_3444784440108062877gmail-m_-6021903904812589050gmail-m_2578779147886616355gmail-m_246014101045909661gmail-m_-8731048574723563634gmail-m_2537030772873414874gmail-m_6741912120350380528gmail_signature">~Craig</div>
                </div>
                <br>
              </div>
            </div>
          </div>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Sat, Mar 16, 2019 at
              12:51 PM Sanjoy Das via llvm-dev &lt;<a
                moz-do-not-send="true"
                href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;
              wrote:<br>
            </div>
            <blockquote class="gmail_quote">Hi,<br>
              <br>
              I need to add a flush-denormals-to-zero (FTZ) flag to
              FastMathFlags,<br>
              but  we've already used up the 7 bits available in<br>
              Value::SubclassOptionalData (the "backing storage" for<br>
              FPMathOperator::getFastMathFlags()).  These are the
              possibilities I<br>
              can think of:<br>
              <br>
              1. Increase the size of FPMathOperator.  This gives us
              some additional<br>
              bits for FTZ and other fastmath flags we'd want to add in
              the future.<br>
              Obvious downside is that it increases LLVM's memory
              footprint.<br>
              <br>
              2. Steal some low bits from pointers already present in
              Value and<br>
              expose them as part of SubclassOptionalData.  We can at
              least steal 3<br>
              bits from the first two words in Value which are both
              pointers.  The<br>
              LSB of the first pointer needs to be 0, otherwise we could
              steal 4<br>
              bits.<br>
              <br>
              3. Allow only specific combinations in FastMathFlags.  In
              practice, I<br>
              don't think folks are equally interested in all the 2^N
              combinations<br>
              present in FastMathFlags, so we could compromise and allow
              only the<br>
              most "typical" 2^7 combinations (e.g. we could merge nonan
              and noinf into a<br>
              single bit, under the assumption that users want to
              enable-disable<br>
              them as a unit).  I'm unsure if establishing the most
              typical 2^7<br>
              combinations will be straightforward though.<br>
              <br>
              4. Function level attributes.  Instead of wasting precious<br>
              instruction-level space, we could move all FP math
              attributes onto the<br>
              containing function.  I'm not sure if this will work for
              all frontends<br>
              and it also raises annoying tradeoffs around inlining and
              other<br>
              inter-procedural passes.<br>
              <br>
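              As a rough illustration of option 2 (not LLVM's actual
              layout), the Python sketch below assumes 8-byte-aligned
              addresses, so the low 3 bits of a pointer word are always
              zero and can carry flag bits. The names pack and unpack
              are hypothetical.<br>
              <pre>
```python
ALIGNMENT = 8              # assumed pointer alignment, in bytes
TAG_MASK = ALIGNMENT - 1   # 0b111: the low bits an aligned address leaves zero


def pack(address, tag):
    """Store a small tag in the unused low bits of an aligned address."""
    if address & TAG_MASK:
        raise ValueError("address is not aligned")
    if tag not in range(ALIGNMENT):
        raise ValueError("tag does not fit in the spare bits")
    return address | tag


def unpack(word):
    """Split a packed word back into (address, tag)."""
    return word & ~TAG_MASK, word & TAG_MASK
```
</pre>
              Every read of the pointer then has to mask the tag off
              first, which is the kind of bit manipulation that would
              need to be hidden behind accessors.<br>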
              <br>
              My gut feeling is to go with (2).  It should be
              semantically<br>
              invisible, have no impact on memory usage, and the ugly
              bit<br>
              manipulation can be abstracted away.  What do you think? 
              Any other<br>
              possibilities I missed?<br>
              <br>
              <br>
              Why I need an FTZ flag:  some ARM Neon vector instructions
              have FTZ<br>
              semantics, which means we can't vectorize instructions
              when compiling<br>
              for Neon unless we know the user is okay with FTZ.  Today
              we pretend<br>
              that the "fast" variant of FastMathFlags implies FTZ<br>
              (<a moz-do-not-send="true"
                href="https://reviews.llvm.org/rL266363"
                rel="noreferrer" target="_blank">https://reviews.llvm.org/rL266363</a>),
              which is not ideal.  Moreover<br>
              (this is the immediate reason), for XLA CPU I'm trying to
              generate FP<br>
              instructions without nonan and noinf, which breaks
              vectorization on<br>
              ARM Neon for this reason.  An explicit bit for FTZ will
              let me<br>
              generate FP operations tagged with FTZ and all fast math
              flags except<br>
              nonan and noinf, and still have them vectorize on Neon.<br>
              <br>
              -- Sanjoy<br>
              _______________________________________________<br>
              LLVM Developers mailing list<br>
              <a moz-do-not-send="true"
                href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
              <a moz-do-not-send="true"
                href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
                rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
            </blockquote>
          </div>
        </blockquote>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>