<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Hi, Steve:<br>

    <br>

      Thanks a lot for the clarification and your expertise!<br>

      For the sake of simplicity, I'd like to implement these rules

    separately for safe-mode FP arithmetic. <br>

    <br>

    Best Regards<br>

    Shuxin <br>

    <br>

    <div class="moz-cite-prefix">On 12/11/12 2:00 PM, Stephen Canon

      wrote:<br>

    </div>

    <blockquote

      cite="mid:610C1A97-BE8F-45CB-B176-68C775CDEF0D@apple.com"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html;

        charset=ISO-8859-1">

      <div>(x + x) + (x + x) --> 4 * x is always exact and is safe

        (this follows from simply applying the (x + x) --> 2*x rule

        twice, then using the fact that 2*2*x = 4*x).</div>

      <div><br>

      </div>

      <div>((x + x) + x) + x --> 4 * x is exact <b>assuming that the

          default rounding mode</b> <b>is in effect</b>.  I don't

        believe that we model FENV_ACCESS at present, but fast-math

        should certainly imply assume-default-rounding.  (((x + x) + x)

        + x) + x --> 5*x is also exact assuming default rounding.

         This property breaks down for adding x to itself six times

        (there's no deep theorem here, it just works out that way,

        sorry).</div>
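      <div><br>
      </div>
      <div>These claims can be spot-checked with a short Python sketch.
        It assumes IEEE-754 doubles under round-to-nearest-even (what
        Python's float uses on common platforms). The helper names
        repeated_sum and first_mismatch are made up for illustration,
        and random sampling is only evidence, not a proof.</div>
<pre>
```python
import random


def repeated_sum(x, n):
    """Sum n copies of x strictly left to right: ((x + x) + x) + ..."""
    total = x
    for _ in range(n - 1):
        total = total + x
    return total


def first_mismatch(n, samples):
    """Return the first sample whose n-term sum differs from n * x, else None."""
    for x in samples:
        if repeated_sum(x, n) != n * x:
            return x
    return None


rng = random.Random(0)
samples = [rng.uniform(-1e6, 1e6) for _ in range(20000)]
# Hand-picked value: the smallest double above 1.0.
samples.append(1.0 + 2.0 ** -52)

for n in range(2, 7):
    # None means no sample disagreed with n * x for this n.
    print(n, first_mismatch(n, samples))
```
</pre>
      <div>For x = 1 + 2^-52 the six-term sum comes out as 6 + 2^-50,
        while 6 * x rounds to 6 + 2^-49, so the two differ by one ulp.</div>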

      <br>

      <div>

        <div>On Dec 11, 2012, at 4:54 PM, Shuxin Yang &lt;<a
            moz-do-not-send="true" href="mailto:shuxin.llvm@gmail.com">shuxin.llvm@gmail.com</a>&gt;

          wrote:</div>

        <br class="Apple-interchange-newline">

        <blockquote type="cite">Hi, Steve:<br>

          <br>

           Thank you for your feedback. Is it also true for (x+x) + (x+x)
          => 4.0 * x?<br>

          <br>

            I forgot to mention one thing: my coworkers told me that
          CodeGen is smart enough to<br>
          expand C*X into the right instruction sequence, considering the
          cost of fmul and fadd on the underlying architecture.<br>

          <br>

           The X + ... + X => N*X rewrite is just to make the representation easier
          for the optimizer.<br>

          <br>

          Thanks<br>

          Shuxin<br>

          <br>

          On 12/11/12 1:39 PM, Stephen Canon wrote:<br>

          <blockquote type="cite">(x + x) + x --> x*3 is always exact

            and does not require relaxed / fast-math.<br>

            <br>

            - Steve<br>

            <br>

            On Dec 11, 2012, at 4:32 PM, Shuxin Yang &lt;<a
              moz-do-not-send="true" href="mailto:shuxin.llvm@gmail.com">shuxin.llvm@gmail.com</a>&gt;

            wrote:<br>

            <br>

            <blockquote type="cite">Hi, Dear All:<br>

              <br>

               The attached patch implements the following rules for
              floating-point add/sub in relaxed mode.<br>
              (The n-th rule is not yet implemented. I only thought of it
              while writing this mail.<br>
              It is easy to implement, but I would rather not go
              through the stress test one more time.)<br>

              <br>

              ----------------------------------------------------<br>

              1. (x + c1) + c2 ->  x + (c1 + c2)<br>

              2. (c * x) + x -> (c+1) * x<br>

              3. (x + x) + x -> x * 3<br>

              4. c * x + (x + x) -> (c + 2)*x<br>

              5. (x + x) + (x+x) -> 4*x<br>

              6. x - (x + y) -> 0 - y<br>

               ...<br>

               ...<br>

               ...<br>

              n. (factoring) C * X1 + C * X2 -> C * (X1 + X2)<br>

              -------------------------------------------------------<br>

              <br>

               Up to three neighboring instructions are involved in the
              optimization. The number<br>
              of combinations is daunting! So I have to resort to a
              general approach (instead of<br>
              pattern matching) to tackle these optimizations.<br>

              <br>

               The idea is simple: decompose the instructions
              into uniformly represented<br>
              addends. Take the following instruction sequence as an
              example:<br>

              <br>

               t1 = 1.8 * x;<br>

               t2 = y - x;<br>

               t3 = t1 - t2;<br>

              <br>

              t3 has two addends: A1 = &lt;1, t1&gt; (denoting the value 1*t1)
              and A2 = &lt;-1, t2&gt;. If we "zoom in" on<br>
              A1 and A2 by one step, we reveal more addends: A1 can be
              zoomed into another<br>
              addend A1_0 = &lt;1.8, x&gt;, and A2 can be zoomed into
              A2_0 = &lt;1, y&gt; and A2_1 = &lt;-1, x&gt;.<br>

              <br>

              Once these addends are available, the optimizer tries to simplify
              one of the following N-ary additions<br>
              using symbolic evaluation:<br>

                A1_0 + A2_0 + A2_1, or<br>

                A1 +  A2_0 + A2_1 or<br>

                A1_0 + A2<br>
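              <br>
              The decomposition above can be sketched in a few lines of
              Python. This is only an illustration of the idea, not the
              code in the patch: the defs table and the addends and
              simplify functions are hypothetical names, exact Fraction
              coefficients stand in for floating-point constants, and the
              outer coefficient is folded into each zoomed-in addend.<br>
<pre>
```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical mini-IR: each name maps to (op, lhs, rhs).
# Operands are value names, or a Fraction constant on the left of a mul.
defs = {
    "t1": ("mul", Fraction("1.8"), "x"),
    "t2": ("sub", "y", "x"),
    "t3": ("sub", "t1", "t2"),
}


def addends(value, coeff=Fraction(1), depth=2):
    """Decompose value into (coefficient, symbol) pairs, zooming in depth levels."""
    if depth == 0 or value not in defs:
        return [(coeff, value)]
    op, lhs, rhs = defs[value]
    if op == "add":
        return addends(lhs, coeff, depth - 1) + addends(rhs, coeff, depth - 1)
    if op == "sub":
        return addends(lhs, coeff, depth - 1) + addends(rhs, -coeff, depth - 1)
    if op == "mul" and isinstance(lhs, Fraction):
        return addends(rhs, coeff * lhs, depth - 1)
    # Anything else is treated as an opaque symbol.
    return [(coeff, value)]


def simplify(terms):
    """Combine addends that share a symbol and drop zero coefficients."""
    combined = defaultdict(Fraction)
    for coeff, symbol in terms:
        combined[symbol] += coeff
    return {symbol: coeff for symbol, coeff in combined.items() if coeff != 0}


print(addends("t3", depth=1))            # the addends A1 and A2
print(simplify(addends("t3", depth=2)))  # coefficients of x and y after zooming in
```
</pre>
              Here t3 = 1.8*x - (y - x) folds to 2.8*x - y, which is the
              kind of N-ary sum the symbolic evaluation works on.<br>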

              <br>

              This patch was stress-tested with SingleSource and MultiSource,
              treating all fadd/fsub<br>
              instructions as if they were in relaxed mode.<br>

              <br>

              Thank you for code review!<br>

              <br>

              Shuxin<br>

              <br>

              &lt;fast_math.add_sub.v1.patch&gt;<br>
              _______________________________________________<br>

              llvm-commits mailing list<br>

              <a moz-do-not-send="true"

                href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

              <a class="moz-txt-link-freetext" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

            </blockquote>

          </blockquote>

          <br>

        </blockquote>

      </div>

      <br>

    </blockquote>

    <br>

  </body>

</html>