<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <font size="-1">As for me, I lean toward Sanjay's proposal and Sanjoy's

      #4, as both seem to me to be more future-proof and would let us

      mimic the behavior of GCC more accurately.<br>

      <br>

      On another note, do y'all have any thoughts about switching the FP

      math semantics to FTZ and DAZ for the whole program?  Some, if

      not all, current targets support such FP modes through bits in

      their FP unit control register, or similar.<br>

      <br>

      As Hal once pointed out to me, the way that GCC works is a bit

      unnerving, as any DSO that changes the FP mode to use such

      semantics affects all modules, even those which were written

      without this change in mind.  Perhaps emitting the initialization

      code that changes the FP mode could be suppressed for DSOs, thus

      leaving this run-time change in the hands of the program

      developer, not the library developer, although this raises some

      questions as well.<br>

      <br>

      GCC accomplishes this in libgcc, whereas, should the same behavior

      be copied by LLVM, it would likely reside in compiler-rt.<br>
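      <br>
      To make the concern concrete, here is a minimal sketch of what such
      initialization code amounts to on x86-64.  The helper names are made
      up for illustration and this is not the code in libgcc; the MXCSR
      bit positions are the documented FTZ (bit 15) and DAZ (bit 6)
      controls, and other targets are simply reported as unsupported:<br>
<pre>
```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h> // _mm_getcsr, _mm_setcsr
#endif

// MXCSR control bits: flush-to-zero (results) and denormals-are-zero (inputs).
constexpr unsigned MXCSR_FTZ = 1u << 15;
constexpr unsigned MXCSR_DAZ = 1u << 6;

// Hypothetical helper: enables FTZ and DAZ for the calling thread.
// Returns false on targets this sketch does not cover.
inline bool enableFlushDenormals() {
#if defined(__x86_64__) || defined(_M_X64)
  _mm_setcsr(_mm_getcsr() | MXCSR_FTZ | MXCSR_DAZ);
  assert((_mm_getcsr() & (MXCSR_FTZ | MXCSR_DAZ)) == (MXCSR_FTZ | MXCSR_DAZ));
  return true;
#else
  return false;
#endif
}

// Adds a subnormal float to itself and returns the raw bits of the sum.
// volatile keeps the compiler from folding the arithmetic at compile time.
inline std::uint32_t tinySumBits() {
  volatile float Tiny = 1e-40f; // below FLT_MIN, so subnormal
  volatile float Sum = Tiny + Tiny;
  float Result = Sum;
  std::uint32_t Bits;
  std::memcpy(&Bits, &Result, sizeof Bits);
  return Bits;
}
```
</pre>
      If something like enableFlushDenormals() runs from a DSO's
      initializer, tinySumBits() returns a different value in every module
      that executes on that thread afterwards, which is the action at a
      distance described above.<br>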

      <br>

      Cheers,<br>

    </font>

    <pre class="moz-signature" cols="72">-- 

Evandro Menezes

</pre>

    <div class="moz-cite-prefix">On 03/18/19 11:31, Sanjay Patel via

      llvm-dev wrote:<br>

    </div>

    <blockquote

cite="mid:CA+wODisqqS=Jo+Qg4dyWTu-xeeGPuvQk3DbOmQjVYso_8EJeXA@mail.gmail.com"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

      <div dir="ltr">

        <div dir="ltr">

          <div dir="ltr">

            <div dir="ltr">

              <div dir="ltr">

                <div dir="ltr">

                  <div dir="ltr">

                    <div>We knew the day when we needed another FMF bit

                      was coming back in:</div>

                    <div><a moz-do-not-send="true"

                        href="https://reviews.llvm.org/D39304"

                        rel="noreferrer" target="_blank">https://reviews.llvm.org/D39304</a></div>

                    <div>...it was just a question of 'when'. :)</div>

                    <div><br>

                    </div>

                    <div>I'm guessing that an FTZ bit won't be the last

                      new bit needed if we consider permutations between

                      strict FP and fast-math. Even without that,

                      denormals-as-zero (DAZ) might also be useful?<br>

                    </div>

                    <div>So rather than continuing to carve these out

                      bit-by-bit, it's worth considering a more general

                      solution: instruction-level metadata.</div>

                    <div><br>

                    </div>

                    <div>IIUC, the main argument for making FMF part of

                      the instruction was that per-instruction metadata

                      gets expensive if we're applying it to a

                      significant chunk of the instructions.</div>

                    <div>But let's think about that - even the most

                      FP-heavy code tops out around 10% FP math ops out

                      of the total instruction count. Typical FP

                      benchmark code is only 2-5% FP ops. The rest is

                      the same load/store/control-flow/ALU stuff found

                      in integer code.</div>

                    <div><br>

                    </div>

                    <div>I'm not exactly sure yet what it would take to

                      do the experiment, but it seems worth exploring

                      moving the existing FMF to metadata.<br>

                    </div>

                    <div><br>

                    </div>

                    <div>One point in favor of this approach is that we

                      already have an "MD_fpmath" enum. It's currently

                      only used to convey reduced precision requirements

                      to the AMDGPU backend. We could extend that to

                      include arbitrary FMF settings. <br>

                    </div>

                    <div><br>

                    </div>

                    <div>A couple of related points for FMF-as-metadata:</div>

                    <div>1. It might encourage fixing a hack added for

                      reciprocals: we use a function-level attribute for

                      those (grep for "reciprocal-estimates"). IIRC,

                      that was just a quicker fix than using MD_fpmath.

                      The existing squished boolean FMF can't convey the

                      more general settings that we need for reciprocal

                      optimizations.<br>

                    </div>

                    <div>2. These don't require new bits, but FMF isn't

                      applied correctly today as-is:</div>

                    <div><a moz-do-not-send="true"

                        href="https://reviews.llvm.org/D48085"

                        target="_blank">https://reviews.llvm.org/D48085</a></div>

                    <div><a moz-do-not-send="true"

                        href="https://bugs.llvm.org/show_bug.cgi?id=38086"

                        target="_blank">https://bugs.llvm.org/show_bug.cgi?id=38086</a><br>

                    </div>

                    <div> <a moz-do-not-send="true"

                        href="https://bugs.llvm.org/show_bug.cgi?id=39535"

                        target="_blank">https://bugs.llvm.org/show_bug.cgi?id=39535</a><br>

                    </div>

                    <div> <a moz-do-not-send="true"

                        href="https://reviews.llvm.org/D51701"

                        target="_blank">https://reviews.llvm.org/D51701</a></div>

                    <div>...so we need to make FMF changes regardless of

                      FTZ.<br>

                    </div>

                  </div>

                </div>

              </div>

            </div>

          </div>

        </div>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Sun, Mar 17, 2019 at 2:47

          PM Craig Topper via llvm-dev &lt;<a moz-do-not-send="true"

            href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;

          wrote:<br>

        </div>

        <blockquote class="gmail_quote">

          <div dir="ltr">

            <div dir="ltr">

              <div dir="ltr">Can we move HasValueHandle out of the byte

                used for SubclassOptionalData and move it to the flags

                at the bottom of Value by shrinking NumUserOperands to

                27 bits?</div>
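              <div dir="ltr">A rough sketch of that shuffle, for
                illustration only: the field names follow llvm::Value, but
                the exact widths and ordering below are assumptions rather
                than a copy of the real header, and the struct and helper
                names are made up:</div>
<pre>
```cpp
#include <cassert>

// Assumed "before" layout, modeled loosely on llvm::Value's trailing fields.
struct FlagsBefore {
  unsigned char SubclassID;
  unsigned char HasValueHandle : 1;       // shares a byte with...
  unsigned char SubclassOptionalData : 7; // ...the 7 bits that back FMF
  unsigned short SubclassData;
  unsigned NumUserOperands : 28;
  unsigned IsUsedByMD : 1;
  unsigned HasName : 1;
  unsigned HasHungOffUses : 1;
  unsigned HasDescriptor : 1;
};

// Assumed "after" layout: HasValueHandle joins the flags at the bottom.
struct FlagsAfter {
  unsigned char SubclassID;
  unsigned char SubclassOptionalData; // all 8 bits now available
  unsigned short SubclassData;
  unsigned NumUserOperands : 27;      // one bit narrower
  unsigned HasValueHandle : 1;        // moved down here
  unsigned IsUsedByMD : 1;
  unsigned HasName : 1;
  unsigned HasHungOffUses : 1;
  unsigned HasDescriptor : 1;
};

static_assert(sizeof(FlagsBefore) == sizeof(FlagsAfter),
              "the move should not grow the object");

// Largest operand count representable in a bit-field of the given width.
constexpr unsigned maxOperands(unsigned Width) { return (1u << Width) - 1; }

// Hypothetical setter that guards the narrower operand-count field.
inline void setOperandCount(FlagsAfter &V, unsigned N) {
  assert(N <= maxOperands(27) && "operand count no longer fits");
  V.NumUserOperands = N;
}
```
</pre>
              <div dir="ltr">Under these assumed widths, the price of the
                extra optional-data bit is that the maximum operand count
                drops from 2^28 - 1 to 2^27 - 1.</div>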

              <div dir="ltr"><br>

                <div>

                  <div dir="ltr"

class="gmail-m_-2453401361646942099gmail-m_8394751908904538983gmail-m_3444784440108062877gmail-m_-6021903904812589050gmail-m_2578779147886616355gmail-m_246014101045909661gmail-m_-8731048574723563634gmail-m_2537030772873414874gmail-m_6741912120350380528gmail_signature">~Craig</div>

                </div>

                <br>

              </div>

            </div>

          </div>

          <br>

          <div class="gmail_quote">

            <div dir="ltr" class="gmail_attr">On Sat, Mar 16, 2019 at

              12:51 PM Sanjoy Das via llvm-dev &lt;<a

                moz-do-not-send="true"

                href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;

              wrote:<br>

            </div>

            <blockquote class="gmail_quote">Hi,<br>

              <br>

              I need to add a flush-denormals-to-zero (FTZ) flag to

              FastMathFlags,<br>

              but we've already used up the 7 bits available in<br>

              Value::SubclassOptionalData (the "backing storage" for<br>

              FPMathOperator::getFastMathFlags()).  These are the

              possibilities I<br>

              can think of:<br>

              <br>

              1. Increase the size of FPMathOperator.  This gives us

              some additional<br>

              bits for FTZ and other fastmath flags we'd want to add in

              the future.<br>

              Obvious downside is that it increases LLVM's memory

              footprint.<br>

              <br>

              2. Steal some low bits from pointers already present in

              Value and<br>

              expose them as part of SubclassOptionalData.  We can at

              least steal 3<br>

              bits from the first two words in Value which are both

              pointers.  The<br>

              LSB of the first pointer needs to be 0, otherwise we could

              steal 4<br>

              bits.<br>

              <br>

              3. Allow only specific combinations in FastMathFlags.  In

              practice, I<br>

              don't think folks are equally interested in all the 2^N

              combinations<br>

              present in FastMathFlags, so we could compromise and allow

              only the<br>

              most "typical" 2^7 combinations (e.g. we could fold nonan and

              noinf into a<br>

              single bit, under the assumption that users want to

              enable-disable<br>

              them as a unit).  I'm unsure if establishing the most

              typical 2^7<br>

              combinations will be straightforward though.<br>

              <br>

              4. Function level attributes.  Instead of wasting precious<br>

              instruction-level space, we could move all FP math

              attributes onto the<br>

              containing function.  I'm not sure if this will work for

              all frontends<br>

              and it also raises annoying tradeoffs around inlining and

              other<br>

              inter-procedural passes.<br>

              <br>

              <br>

              My gut feeling is to go with (2).  It should be

              semantically<br>

              invisible, have no impact on memory usage, and the ugly

              bit<br>

              manipulation can be abstracted away.  What do you think? 

              Any other<br>

              possibilities I missed?<br>
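              <br>
              As a rough illustration of how the bit manipulation in (2)
              can be hidden behind an interface, here is a minimal sketch.
              The class name PointerWithFlags is made up; LLVM's
              PointerIntPair is an existing utility along the same lines,
              and none of this is a proposed patch to Value:<br>
<pre>
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: keeps NumBits of flags in the low bits of a T*.
// Only sound when T's alignment guarantees those bits are always zero.
template <typename T, unsigned NumBits>
class PointerWithFlags {
  static_assert((std::uintptr_t(1) << NumBits) <= alignof(T),
                "T is not aligned enough to spare NumBits low bits");
  static constexpr std::uintptr_t FlagMask =
      (std::uintptr_t(1) << NumBits) - 1;
  std::uintptr_t Bits = 0; // pointer in the high bits, flags in the low bits

public:
  explicit PointerWithFlags(T *P = nullptr) { setPointer(P); }

  T *getPointer() const { return reinterpret_cast<T *>(Bits & ~FlagMask); }
  unsigned getFlags() const { return static_cast<unsigned>(Bits & FlagMask); }

  // Replaces the pointer and leaves the stored flags untouched.
  void setPointer(T *P) {
    auto Raw = reinterpret_cast<std::uintptr_t>(P);
    assert((Raw & FlagMask) == 0 && "pointer has low bits set");
    Bits = Raw | (Bits & FlagMask);
  }

  // Replaces the flags and leaves the stored pointer untouched.
  void setFlags(unsigned Flags) {
    assert((Flags & ~FlagMask) == 0 && "flags do not fit in NumBits");
    Bits = (Bits & ~FlagMask) | Flags;
  }
};
```
</pre>
              With an 8-byte-aligned pointee, PointerWithFlags&lt;T, 3&gt;
              holds 3 extra bits per pointer while staying the size of a
              plain pointer on common platforms.<br>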

              <br>

              <br>

              Why I need an FTZ flag:  some ARM Neon vector instructions

              have FTZ<br>

              semantics, which means we can't vectorize instructions

              when compiling<br>

              for Neon unless we know the user is okay with FTZ.  Today

              we pretend<br>

              that the "fast" variant of FastMathFlags implies FTZ<br>

              (<a moz-do-not-send="true"

                href="https://reviews.llvm.org/rL266363"

                rel="noreferrer" target="_blank">https://reviews.llvm.org/rL266363</a>),

              which is not ideal.  Moreover<br>

              (this is the immediate reason), for XLA CPU I'm trying to

              generate FP<br>

              instructions without nonan and noinf, which breaks

              vectorization on<br>

              ARM Neon for this reason.  An explicit bit for FTZ will

              let me<br>

              generate FP operations tagged with FTZ and all fast math

              flags except<br>

              nonan and noinf, and still have them vectorize on Neon.<br>

              <br>

              -- Sanjoy<br>

              _______________________________________________<br>

              LLVM Developers mailing list<br>

              <a moz-do-not-send="true"

                href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

              <a moz-do-not-send="true"

                href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

                rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

            </blockquote>

          </div>


        </blockquote>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>