<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 09/04/2017 12:12 AM, Xinliang David
      Li wrote:<br>
    </div>
    <blockquote
cite="mid:CAAkRFZ+o+93NRd=8kGHAbA+VmTpERhWrBRQKAMp075gEh4846g@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Sun, Sep 3, 2017 at 9:23 PM, Hal
            Finkel <span dir="ltr"><<a moz-do-not-send="true"
                href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000"><span class="">
                  <p><br>
                  </p>
                  <div class="m_-5573162088652840693moz-cite-prefix">On
                    09/03/2017 11:06 PM, Xinliang David Li wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">I think we can think about this in another
                      way.
                      <div><br>
                      </div>
                      <div>For modern CPU architectures that support
                        store forwarding with store queues, it is
                        generally not "safe" to blindly do local
                        optimizations to widen loads/stores</div>
                    </div>
                  </blockquote>
                  <br>
                </span> Why not widen stores? Generally the problem with
                store forwarding is where the load is wider than the
                store (plus alignment issues).<span class=""><br>
                  <br>
                </span></div>
            </blockquote>
            <div><br>
            </div>
            <div>True, but probably with some caveats that are target
              dependent. Store widening also requires additional bit
              operations (and possibly an additional load), so it is
              tricky to model the overall benefit. </div>
            <div><br>
            </div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000"><span class="">
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div> without sophisticated inter-procedural
                        analysis. Doing so runs the risk of greatly
                        reducing the performance of some programs. Keeping
                        naturally aligned loads/stores at their natural
                        type is safer. </div>
                      <div><br>
                      </div>
                      <div>Does it make sense?</div>
                    </div>
                  </blockquote>
                  <br>
                </span> It makes sense. I could, however, say the same
                thing about inlining. We need to make inlining decisions
                locally, but they have global impact. Nevertheless, we
                need to make inlining decisions, and there's no
                practical way to make them in a truly non-local way.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Speaking of inlining, we are actually thinking of ways
              to make the decisions more globally optimal, but that is
              off topic.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Neat.<br>
    <br>
    <blockquote
cite="mid:CAAkRFZ+o+93NRd=8kGHAbA+VmTpERhWrBRQKAMp075gEh4846g@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000"> <br>
                We also don't pessimize common cases to improve
                performance in rare cases. In the common case, reducing
                pressure on the memory units, and reducing the critical
                path, seem likely to me to be optimal. If that's not
                true, or doing otherwise has negligible cost (but can
                prevent rare downsides), we should certainly consider
                those options.</div>
            </blockquote>
            <div><br>
            </div>
            <div>Since we don't do load widening for non-bitfield cases
              (only for the very limited case of naturally aligned
              bitfields), it is hard to say we pessimize common cases
              for rare cases:</div>
            <div><br>
            </div>
            <div>1) the upside of widening such accesses is not high,
              based on experience with another compiler (which does not do so)</div>
            <div>2) there is an obvious downside of introducing additional
              extract instructions, which degrades performance</div>
            <div>3) there is an obvious downside of severely degrading
              performance when store forwarding is blocked.</div>
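To make point 2) concrete, here is a minimal sketch of the extra work a widened load implies. It models a byte-aligned 8-bit field (hypothetically at bits 8..15) inside a 64-bit storage unit on a little-endian target; the helper names are made up for illustration and are not LLVM APIs. A widened access loads the whole unit and then needs a shift and a mask, while a natural-width access is a single byte load.

```cpp
#include <cassert>
#include <cstdint>

// Assembles the 64-bit storage unit from a little-endian byte buffer; this is
// what a single wide load observes on a little-endian target.
std::uint64_t load_word_le(const unsigned char bytes[8]) {
  std::uint64_t word = 0;
  for (int i = 7; i >= 0; --i) word = (word << 8) | bytes[i];
  return word;
}

// Widened access: after the wide load, the field is recovered with a shift
// and a mask. These two extra ALU operations are the "extract" cost.
std::uint64_t extract_field(std::uint64_t word, unsigned shift, unsigned width) {
  assert(width >= 1 && width < 64 && shift + width <= 64);
  const std::uint64_t mask = (std::uint64_t{1} << width) - 1;
  return (word >> shift) & mask;
}

// Natural-width access for a byte-aligned 8-bit field: one byte load and
// no extract operations.
std::uint8_t load_byte_field(const unsigned char bytes[8], unsigned byte_offset) {
  assert(byte_offset < 8);
  return bytes[byte_offset];
}
```

For a buffer `{0x0D, 0x03, 0x05, 0, 0, 0, 0, 0}`, `extract_field(load_word_le(bytes), 8, 8)` and `load_byte_field(bytes, 1)` both yield 3; only the first needs the shift and mask after the load.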
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    I suspect that it's relatively rare to hit these store-to-load
    forwarding issues compared to the number of times the system stores
    to or loads from bitfields. In any case, I did some experiments on my
    Haswell system and found that splitting the load in Wei's benchmark
    into two loads makes it 0.012% slower than the single-load version.
    I won't worry about that too much. On my P8, I
    couldn't measure a difference. Obviously, this does somewhat miss
    the point, as the real cost of this kind of thing comes from stressing
    the memory units in code with a lot going on, not in isolated cases.<br>
    <br>
    Nevertheless, I think that you've convinced me that this is a
    least-bad solution. I'll want a flag preserving the old behavior.
    Something like -fwiden-bitfield-accesses (modulo bikeshedding).<br>
    <br>
    <blockquote
cite="mid:CAAkRFZ+o+93NRd=8kGHAbA+VmTpERhWrBRQKAMp075gEh4846g@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000"> <br>
                And none of this answers the question of whether it's
                better to have the store wider or the load split and
                narrow.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div><br>
            </div>
            <div>It seems safer to do store widening more aggressively
              to avoid store-forwarding stalls, but doing so
              aggressively may also introduce other runtime overhead
              (an extra load, data merging, etc.).</div>
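As a rough illustration of the "extra load, data merge" overhead, here is a minimal sketch of what storing one field through a widened 64-bit storage unit has to do. The 64-bit unit and the name `store_field_widened` are assumptions for illustration, not an LLVM or Clang API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical widened bitfield store: to write one field through the full
// 64-bit storage unit, the old unit is loaded, the field bits are cleared and
// replaced, and the whole unit is stored back.
void store_field_widened(std::uint64_t* unit, unsigned shift, unsigned width,
                         std::uint64_t value) {
  assert(width >= 1 && width < 64 && shift + width <= 64);
  const std::uint64_t mask = ((std::uint64_t{1} << width) - 1) << shift;
  std::uint64_t word = *unit;                          // extra load
  word = (word & ~mask) | ((value << shift) & mask);   // data merge
  *unit = word;                                        // one wide store
}
```

For example, starting from an all-ones unit, `store_field_widened(&unit, 8, 8, 3)` leaves `0xFFFFFFFFFFFF03FF`; the other 56 bits survive only because the old value was loaded and merged first.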
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    Yes. Wei confirmed that this is slower.<br>
    <br>
    Thanks again,<br>
    Hal<br>
    <br>
    <blockquote
cite="mid:CAAkRFZ+o+93NRd=8kGHAbA+VmTpERhWrBRQKAMp075gEh4846g@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>Thanks,</div>
            <div><br>
            </div>
            <div>David</div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000"> <br>
                Thanks again,<br>
                Hal
                <div>
                  <div class="h5"><br>
                    <br>
                    <blockquote type="cite">
                      <div dir="ltr">
                        <div><br>
                        </div>
                        <div>David</div>
                        <div><br>
                        </div>
                        <div><br>
                        </div>
                      </div>
                      <div class="gmail_extra"><br>
                        <div class="gmail_quote">On Sun, Sep 3, 2017 at
                          8:55 PM, Hal Finkel <span dir="ltr"><<a
                              moz-do-not-send="true"
                              href="mailto:hfinkel@anl.gov"
                              target="_blank">hfinkel@anl.gov</a>></span>
                          wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            <div bgcolor="#FFFFFF" text="#000000"><span>
                                <p><br>
                                </p>
                                <div
                                  class="m_-5573162088652840693m_4449805550791560709moz-cite-prefix">On
                                  09/03/2017 10:38 PM, Xinliang David Li
                                  wrote:<br>
                                </div>
                                <blockquote type="cite">
                                  <div dir="ltr">The cost of a store-forwarding
                                    stall is usually much higher than
                                    that of a load hitting the L1 cache. For
                                    instance, on Haswell, the latter is
                                    ~4 cycles, while a store-forwarding
                                    stall costs about 10 cycles more
                                    than a successful store forward,
                                    which comes to roughly 15
                                    cycles. In some scenarios, a store-forwarding
                                    stall can cost as much as
                                    50 cycles. See Agner Fog's
                                    documentation.  <br>
                                  </div>
                                </blockquote>
                                <br>
                              </span> I understand. As I understand it,
                              there are two potential ways to fix this
                              problem:<br>
                              <br>
                               1. You can make the store wider (to match
                              the size of the wide load, thus permitting
                              forwarding).<br>
                               2. You can make the load smaller (to
                              match the size of the small store, thus
                              permitting forwarding).<br>
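The two options can be checked against a deliberately simplified forwarding rule: a load is assumed to be forwardable from a preceding store only if every byte it reads was written by that store. Real cores add further conditions (alignment, supported size combinations, and so on; see Agner Fog's documentation), so this is a model, not a description of any particular microarchitecture. The `Access` type and `can_forward` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// A memory access described by byte offset and size within one object
// (hypothetical helper type for this sketch).
struct Access {
  std::uint32_t offset;
  std::uint32_t size;
};

// Simplified store-to-load forwarding rule: forwarding is assumed possible
// only when the load's bytes lie entirely within the store's bytes.
bool can_forward(Access store, Access load) {
  assert(store.size > 0 && load.size > 0);
  return load.offset >= store.offset &&
         load.offset + load.size <= store.offset + store.size;
}
```

For example, a 1-byte store at offset 1 followed by an 8-byte load at offset 0 gives `can_forward({1, 1}, {0, 8}) == false`. Widening the store (`can_forward({0, 8}, {0, 8})`) or narrowing the load (`can_forward({1, 1}, {1, 1})`) both make it true, matching options 1 and 2.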
                              <br>
                              At least in this benchmark, which is a
                              better solution?<br>
                              <br>
                              Thanks again,<br>
                              Hal
                              <div>
                                <div class="m_-5573162088652840693h5"><br>
                                  <br>
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div><br>
                                      </div>
                                      <div>In other words, the optimizer
                                        needs to be taught to avoid
                                        defeating  the HW pipeline
                                        feature as much as possible.</div>
                                      <div><br>
                                      </div>
                                      <div>David</div>
                                    </div>
                                    <div class="gmail_extra"><br>
                                      <div class="gmail_quote">On Sun,
                                        Sep 3, 2017 at 6:32 PM, Hal
                                        Finkel <span dir="ltr"><<a
                                            moz-do-not-send="true"
                                            href="mailto:hfinkel@anl.gov"
                                            target="_blank">hfinkel@anl.gov</a>></span>
                                        wrote:<br>
                                        <blockquote class="gmail_quote"
                                          style="margin:0 0 0
                                          .8ex;border-left:1px #ccc
                                          solid;padding-left:1ex">
                                          <div
                                            class="m_-5573162088652840693m_4449805550791560709HOEnZb">
                                            <div
                                              class="m_-5573162088652840693m_4449805550791560709h5"><br>
                                              On 09/03/2017 03:44 PM,
                                              Wei Mi wrote:<br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                On Sat, Sep 2, 2017 at
                                                6:04 PM, Hal Finkel <<a
                                                  moz-do-not-send="true"
href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>>
                                                wrote:<br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                  On 08/22/2017 10:56
                                                  PM, Wei Mi via
                                                  llvm-commits wrote:<br>
                                                  <blockquote
                                                    class="gmail_quote"
                                                    style="margin:0 0 0
                                                    .8ex;border-left:1px
                                                    #ccc
                                                    solid;padding-left:1ex">
                                                    On Tue, Aug 22, 2017
                                                    at 7:03 PM, Xinliang
                                                    David Li <<a
                                                      moz-do-not-send="true"
href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>><br>
                                                    wrote:<br>
                                                    <blockquote
                                                      class="gmail_quote"
                                                      style="margin:0 0
                                                      0
                                                      .8ex;border-left:1px
                                                      #ccc
                                                      solid;padding-left:1ex">
                                                      <br>
                                                      On Tue, Aug 22,
                                                      2017 at 6:37 PM,
                                                      Chandler Carruth
                                                      via Phabricator<br>
                                                      <<a
                                                        moz-do-not-send="true"
href="mailto:reviews@reviews.llvm.org" target="_blank">reviews@reviews.llvm.org</a>>
                                                      wrote:<br>
                                                      <blockquote
                                                        class="gmail_quote"
                                                        style="margin:0
                                                        0 0
                                                        .8ex;border-left:1px
                                                        #ccc
                                                        solid;padding-left:1ex">
                                                        chandlerc added
                                                        a comment.<br>
                                                        <br>
                                                        I'm really not a
                                                        fan of the degree of
                                                        complexity and subtlety
                                                        that this introduces
                                                        into the frontend, all
                                                        to allow particular
                                                        backend optimizations.<br>
                                                        <br>
                                                        I feel like this is
                                                        Clang working around a
                                                        fundamental deficiency
                                                        in LLVM, and we should
                                                        instead find a way to
                                                        fix this in LLVM itself.<br>
                                                        <br>
                                                        As has been pointed
                                                        out before, user code
                                                        can synthesize large
                                                        integers that small bit
                                                        sequences are extracted
                                                        from, and Clang and
                                                        LLVM should handle
                                                        those just as well as
                                                        actual bitfields.<br>
                                                        <br>
                                                        Can we see how far we
                                                        can push the LLVM side
                                                        before we add
                                                        complexity to Clang
                                                        here? I understand that
                                                        there remain challenges
                                                        to LLVM's stuff, but I
                                                        don't think those
                                                        challenges take *all*
                                                        of the LLVM improvements
                                                        off the table. I don't
                                                        think we've exhausted
                                                        all ways of improving
                                                        the LLVM changes being
                                                        proposed, and I think
                                                        we should still land all
                                                        of those and re-evaluate
                                                        how important these
                                                        issues are when all of
                                                        that is in place.<br>
                                                      </blockquote>
                                                      <br>
                                                      The main challenge
                                                      of doing this in
                                                      LLVM is that
                                                      inter-procedural
                                                      (and possibly
                                                      cross-module) analysis
                                                      is needed (for the
                                                      store-forwarding
                                                      issues).<br>
                                                      <br>
                                                      Wei, perhaps you
                                                      can provide a
                                                      concrete test case
                                                      to illustrate the
                                                      issue so that
                                                      reviewers have a good
                                                      understanding.<br>
                                                      <br>
                                                      David<br>
                                                    </blockquote>
                                                    Here is a runnable
                                                    test case:<br>
                                                    --------------------
                                                    1.cc
                                                    ------------------------<br>
                                                    class A {<br>
                                                    public:<br>
                                                        unsigned long
                                                    f1:2;<br>
                                                        unsigned long
                                                    f2:6;<br>
                                                        unsigned long
                                                    f3:8;<br>
                                                        unsigned long
                                                    f4:4;<br>
                                                    };<br>
                                                    A a;<br>
                                                    unsigned long b;<br>
                                                    unsigned long N =
                                                    1000000000;<br>
                                                    <br>
__attribute__((noinline))<br>
                                                    void foo() {<br>
                                                        a.f3 = 3;<br>
                                                    }<br>
                                                    <br>
__attribute__((noinline))<br>
                                                    void goo() {<br>
                                                        b = a.f3;<br>
                                                    }<br>
                                                    <br>
                                                    int main() {<br>
                                                        unsigned long i;<br>
                                                        for (i = 0; i
                                                    < N; i++) {<br>
                                                          foo();<br>
                                                          goo();<br>
                                                        }<br>
                                                    }<br>
------------------------------<wbr>------------------------------<br>
                                                    Now trunk takes
                                                    about twice the running
                                                    time of trunk + this
                                                    patch. That is
                                                    because trunk
                                                    shrinks the store of
                                                    a.f3 in foo (done by
                                                    DAGCombiner) but does
                                                    not shrink the load of
                                                    a.f3 in goo, so the
                                                    narrow store is followed
                                                    by a wider load and store
                                                    forwarding is blocked.<br>
                                                  </blockquote>
                                                  <br>
                                                  I can confirm that
                                                  this is true on
                                                  Haswell and also on a
                                                  POWER8.<br>
                                                  Interestingly, on a
                                                  POWER7, the
                                                  performance is the
                                                  same either way (on
                                                  the good side). I ran the
                                                  test case both as presented
                                                  and with f3 replaced by a
                                                  non-bitfield unsigned
                                                  char member. Thinking
                                                  that the POWER7 result
                                                  might be because it's
                                                  big-endian, I flipped
                                                  the order of the
                                                  fields, and found that
                                                  the version where f3
                                                  is not a bitfield is
                                                  faster than otherwise,
                                                  but only by 12.5%.<br>
                                                  <br>
                                                  Why, in this case,
                                                  don't we shrink the
                                                  load? It seems like we
                                                  should (and it<br>
                                                  seems like a
                                                  straightforward case).<br>
                                                  <br>
                                                  Thanks again,<br>
                                                  Hal<br>
                                                  <br>
                                                </blockquote>
                                                Hal, thanks for trying
                                                the test.<br>
                                                <br>
                                                Yes, it is
                                                straightforward to
                                                shrink the load in this
                                                test. I can change the
                                                test case a little to show
                                                why it is sometimes
                                                difficult to shrink the
                                                load:<br>
                                                <br>
                                                class A {<br>
                                                public:<br>
                                                   unsigned long f1:16;<br>
                                                   unsigned long f2:16;<br>
                                                   unsigned long f3:16;<br>
                                                   unsigned long f4:8;<br>
                                                };<br>
                                                A a;<br>
                                                bool b;<br>
                                                unsigned long N =
                                                1000000000;<br>
                                                <br>
__attribute__((noinline))<br>
                                                void foo() {<br>
                                                   a.f4 = 3;<br>
                                                }<br>
                                                <br>
__attribute__((noinline))<br>
                                                void goo() {<br>
                                                   b = (a.f4 == 0
                                                && a.f3 ==
                                                (unsigned short)-1);<br>
                                                }<br>
                                                <br>
                                                int main() {<br>
                                                   unsigned long i;<br>
                                                   for (i = 0; i < N;
                                                i++) {<br>
                                                     foo();<br>
                                                     goo();<br>
                                                   }<br>
                                                }<br>
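For reference, here is a minimal C++ sketch of the storage layout this testcase relies on. It assumes the x86-64 SysV (Itanium C++) ABI. Under that ABI the four unsigned long bit-fields share one 64-bit word and are allocated from the least significant bit in declaration order, so f3 occupies bits 32-47 and f4 occupies bits 48-55. This matches the masks in the IR for goo. The names kF3Shift, fieldMask, extractField, and insertField are hypothetical helpers for illustration, not LLVM APIs.<br>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of class A's single 64-bit storage unit, assuming the
// x86-64 SysV ABI: bit-fields are allocated LSB-first in declaration order.
// These shifts match the masks that appear in the IR for goo().
constexpr unsigned kF3Shift = 32, kF3Width = 16;
constexpr unsigned kF4Shift = 48, kF4Width = 8;

// Mask covering `width` bits starting at bit `shift`.
inline std::uint64_t fieldMask(unsigned shift, unsigned width) {
  assert(width > 0 && width < 64 && shift + width <= 64);
  return ((std::uint64_t{1} << width) - 1) << shift;
}

// Read one bit-field out of the word (what a wide load plus mask/shift does).
inline std::uint64_t extractField(std::uint64_t word, unsigned shift,
                                  unsigned width) {
  return (word & fieldMask(shift, width)) >> shift;
}

// Write one bit-field and preserve all other bits (a read-modify-write).
inline std::uint64_t insertField(std::uint64_t word, unsigned shift,
                                 unsigned width, std::uint64_t value) {
  const std::uint64_t mask = fieldMask(shift, width);
  return (word & ~mask) | ((value << shift) & mask);
}

// f4 is a whole, byte-aligned byte, so `a.f4 = 3` can be emitted as a
// 1-byte store at this offset instead of a 64-bit read-modify-write.
constexpr unsigned kF4ByteOffset = kF4Shift / 8;  // 6 on a little-endian target
static_assert(kF4Shift % 8 == 0 && kF4Width == 8, "f4 is one aligned byte");
```

Under these assumptions, foo's assignment touches only byte 6 of a, while the i64 load in goo reads all eight bytes.<br>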
                                                <br>
                                                For the load of a.f4 in
                                                goo, it is difficult to
                                                justify shrinking it
                                                after<br>
                                                instcombine, because the
                                                comparison with a.f3 and
                                                the comparison with<br>
                                                a.f4 are merged:<br>
                                                <br>
                                                define void @_Z3goov()
                                                local_unnamed_addr #0 {<br>
                                                   %1 = load i64, i64*
                                                bitcast (%class.A* @a to
                                                i64*), align 8<br>
                                                   %2 = and i64 %1,
                                                0xffffff00000000<br>
                                                   %3 = icmp eq i64 %2,
                                                0xffff00000000<br>
                                                   %4 = zext i1 %3 to i8<br>
                                                   store i8 %4, i8* @b,
                                                align 1, !tbaa !2<br>
                                                   ret void<br>
                                                }<br>
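This hedged C++ sketch illustrates the rewrite instcombine performs on goo. On the 64-bit word, the two separate field tests are equivalent to the single masked compare. The sketch assumes the same layout as the IR: f3 in bits 32-47 and f4 in bits 48-55. The function names are hypothetical. The merged form needs bytes 4 through 6 at once, which is why no load of f4 alone remains to be shrunk.<br>

```cpp
#include <cassert>
#include <cstdint>

// Source-level predicate from goo(): two field tests joined by &&.
// Each test needs only the bytes of its own field (a candidate for shrinking).
inline bool gooSeparate(std::uint64_t word) {
  const std::uint64_t f3 = (word >> 32) & 0xFFFF;  // bits 32..47
  const std::uint64_t f4 = (word >> 48) & 0xFF;    // bits 48..55
  return f4 == 0 && f3 == 0xFFFF;
}

// Merged predicate as in the IR: one mask over bits 32..55 and one compare.
// It needs bytes 4..6 together, so any load feeding it is wider than a
// 1-byte store to f4.
inline bool gooMerged(std::uint64_t word) {
  constexpr std::uint64_t kMask = 0x00FFFFFF00000000ULL;  // f3 and f4
  constexpr std::uint64_t kWant = 0x0000FFFF00000000ULL;  // f3 == 0xFFFF, f4 == 0
  return (word & kMask) == kWant;
}
```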
                                              </blockquote>
                                              <br>
                                            </div>
                                          </div>
                                          Exactly. But this behavior is
                                          desirable, locally. There's no
                                          good answer here: We either
                                          generate extra load traffic
                                          here (because we need to load
                                          the fields separately), or we
                                          widen the store (which
                                          generates extra load traffic
                                          there). Do you know, in terms
                                          of performance, which is
                                          better in this case (i.e., is
                                          it better to widen the store
                                          or split the load)?<span
                                            class="m_-5573162088652840693m_4449805550791560709HOEnZb"><font
                                              color="#888888"><br>
                                              <br>
                                               -Hal</font></span>
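The two options above can be sketched at the source level. This is a hedged illustration, not what the backend actually emits. It models a as 8 raw bytes on a little-endian target, with f3 at bytes 4-5 and f4 at byte 6 (the layout the IR implies), and uses std::memcpy to stand in for loads and stores of a given width. Option 1 keeps foo's store narrow and splits goo's load. Option 2 widens foo's store into a 64-bit read-modify-write, so that goo's single 8-byte load can be forwarded from it. All names are hypothetical.<br>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the storage of `A a` (8 bytes, little-endian).
struct Storage { unsigned char bytes[8]; };

// Option 1, foo(): narrow 1-byte store to f4 (byte 6).
inline void fooNarrow(Storage &s) {
  const std::uint8_t v = 3;
  std::memcpy(&s.bytes[6], &v, 1);
}

// Option 1, goo(): split loads, each no wider than the field it tests.
inline bool gooSplit(const Storage &s) {
  std::uint8_t f4;
  std::uint16_t f3;
  std::memcpy(&f4, &s.bytes[6], 1);  // same byte as fooNarrow's store
  std::memcpy(&f3, &s.bytes[4], 2);  // 2-byte load of f3 (little-endian)
  return f4 == 0 && f3 == 0xFFFF;
}

// Option 2, foo(): widened store. Load the whole word, merge f4, store 8 bytes.
// The extra 8-byte load here is the extra load traffic moved into foo().
inline void fooWide(Storage &s) {
  std::uint64_t w;
  std::memcpy(&w, s.bytes, 8);
  w = (w & ~(std::uint64_t{0xFF} << 48)) | (std::uint64_t{3} << 48);
  std::memcpy(s.bytes, &w, 8);
}

// Option 2, goo(): one 8-byte load and the merged mask/compare from the IR.
inline bool gooWide(const Storage &s) {
  std::uint64_t w;
  std::memcpy(&w, s.bytes, 8);
  return (w & 0x00FFFFFF00000000ULL) == 0x0000FFFF00000000ULL;
}
```

Both options compute the same b. Which one is faster depends on the microarchitecture; the sketch only makes the access widths explicit.<br>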
                                          <div
                                            class="m_-5573162088652840693m_4449805550791560709HOEnZb">
                                            <div
                                              class="m_-5573162088652840693m_4449805550791560709h5"><br>
                                              <br>
                                              <blockquote
                                                class="gmail_quote"
                                                style="margin:0 0 0
                                                .8ex;border-left:1px
                                                #ccc
                                                solid;padding-left:1ex">
                                                <br>
                                                Thanks,<br>
                                                Wei.<br>
                                                <br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">
                                                  <blockquote
                                                    class="gmail_quote"
                                                    style="margin:0 0 0
                                                    .8ex;border-left:1px
                                                    #ccc
                                                    solid;padding-left:1ex">
                                                    The testcase shows
                                                    the potential
                                                    problem of store
                                                    shrinking. Before<br>
                                                    we decide to do
                                                    store shrinking, we
                                                    need to know that all
                                                    the related loads<br>
                                                    will also be shrunk, and
                                                    that requires
                                                    interprocedural (IPA)
                                                    analysis. Otherwise,
                                                    if load<br>
                                                    shrinking is
                                                    blocked in some
                                                    difficult case (like
                                                    the instcombine<br>
                                                    case described in<br>
                                                    <a
                                                      moz-do-not-send="true"
href="https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg65085.html"
                                                      rel="noreferrer"
                                                      target="_blank">https://www.mail-archive.com/c<wbr>fe-commits@lists.llvm.org/msg6<wbr>5085.html</a>),<br>
                                                    a performance
                                                    regression will
                                                    result.<br>
                                                    <br>
                                                    Wei.<br>
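Wei's point can be made concrete with a deliberately simplified model of store-to-load forwarding. Following Hal's observation in this thread that the problem is generally a load wider than the store, the model assumes that a load is forwarded from an in-flight store only if every byte it reads was written by that store, and that a partially overlapping load stalls. Under that assumption, shrinking a store is safe only if every overlapping load fits inside the narrow store. That is why the decision needs to see all related loads, possibly in other functions. The Access type and the function names are hypothetical, and real cores add alignment and size restrictions that this model ignores.<br>

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A memory access to the same object: byte offset and size in bytes.
struct Access {
  unsigned offset;
  unsigned size;
};

// True if the two byte ranges share at least one byte.
inline bool overlaps(Access a, Access b) {
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

// Simplified forwarding rule: the load must lie entirely inside the store.
inline bool canForward(Access store, Access load) {
  assert(store.size > 0 && load.size > 0);
  return store.offset <= load.offset &&
         load.offset + load.size <= store.offset + store.size;
}

// Shrinking to `narrowStore` is safe (in this model) only if every load that
// overlaps it can still be forwarded; a partial overlap would stall.
inline bool storeShrinkIsSafe(Access narrowStore,
                              const std::vector<Access> &loads) {
  return std::all_of(loads.begin(), loads.end(), [&](Access load) {
    return !overlaps(narrowStore, load) || canForward(narrowStore, load);
  });
}
```

In the testcase, the shrunk 1-byte store to f4 at offset 6 fails this check against goo's merged 8-byte load, and passes only if goo's loads are split into a 1-byte load of f4 and a separate load of f3.<br>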
                                                    <br>
                                                    <br>
                                                    <blockquote
                                                      class="gmail_quote"
                                                      style="margin:0 0
                                                      0
                                                      .8ex;border-left:1px
                                                      #ccc
                                                      solid;padding-left:1ex">
                                                      <blockquote
                                                        class="gmail_quote"
                                                        style="margin:0
                                                        0 0
                                                        .8ex;border-left:1px
                                                        #ccc
                                                        solid;padding-left:1ex">
                                                        <br>
                                                        Repository:<br>
                                                            rL LLVM<br>
                                                        <br>
                                                        <a
                                                          moz-do-not-send="true"
href="https://reviews.llvm.org/D36562" rel="noreferrer" target="_blank">https://reviews.llvm.org/D3656<wbr>2</a><br>
                                                        <br>
                                                        <br>
                                                        <br>
                                                      </blockquote>
                                                    </blockquote>
______________________________<wbr>_________________<br>
                                                    llvm-commits mailing
                                                    list<br>
                                                    <a
                                                      moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>
                                                    <a
                                                      moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits"
                                                      rel="noreferrer"
                                                      target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a><br>
                                                  </blockquote>
                                                  <br>
                                                  --<br>
                                                  Hal Finkel<br>
                                                  Lead, Compiler
                                                  Technology and
                                                  Programming Languages<br>
                                                  Leadership Computing
                                                  Facility<br>
                                                  Argonne National
                                                  Laboratory<br>
                                                  <br>
                                                </blockquote>
                                              </blockquote>
                                              <br>
                                              <br>
                                            </div>
                                          </div>
                                        </blockquote>
                                      </div>
                                      <br>
                                    </div>
                                  </blockquote>
                                  <br>
                                </div>
                              </div>
                            </div>
                          </blockquote>
                        </div>
                        <br>
                      </div>
                    </blockquote>
                    <br>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>