<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<div class="moz-cite-prefix">On 09/04/2017 12:12 AM, Xinliang David
Li wrote:<br>
</div>
<blockquote
cite="mid:CAAkRFZ+o+93NRd=8kGHAbA+VmTpERhWrBRQKAMp075gEh4846g@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Sun, Sep 3, 2017 at 9:23 PM, Hal
Finkel <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span class="">
<p><br>
</p>
<div class="m_-5573162088652840693moz-cite-prefix">On
09/03/2017 11:06 PM, Xinliang David Li wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">I think we can think this in another
way.
<div><br>
</div>
<div>For modern CPU architectures which supports
store forwarding with store queues, it is
generally not "safe" to blindly do local
optimizations to widen the load/stores</div>
</div>
</blockquote>
<br>
</span> Why not widen stores? Generally the problem with
store forwarding is where the load is wider than the
store (plus alignment issues).<span class=""><br>
<br>
</span></div>
</blockquote>
<div><br>
</div>
              <div>True, but probably with some target-dependent
                caveats. Store widening also requires additional bit
                operations (and possibly an additional load), so it is
                tricky to model the overall benefit. </div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span class="">
<blockquote type="cite">
<div dir="ltr">
                    <div> without sophisticated inter-procedural
                      analysis. Doing so runs the risk of greatly
                      reducing the performance of some programs.
                      Keeping a naturally aligned load/store at its
                      natural type is safer. </div>
<div><br>
</div>
<div>Does it make sense?</div>
</div>
</blockquote>
<br>
</span> It makes sense. I could, however, say the same
thing about inlining. We need to make inlining decisions
locally, but they have global impact. Nevertheless, we
need to make inlining decisions, and there's no
practical way to make them in a truly non-local way.<br>
</div>
</blockquote>
<div><br>
</div>
<div>Speaking of inlining, we are actually thinking of ways
to make the decisions more globally optimal, but that is
off topic.</div>
</div>
</div>
</div>
</blockquote>
<br>
Neat.<br>
<br>
<blockquote
cite="mid:CAAkRFZ+o+93NRd=8kGHAbA+VmTpERhWrBRQKAMp075gEh4846g@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> <br>
We also don't pessimize common cases to improve
performance in rare cases. In the common case, reducing
pressure on the memory units, and reducing the critical
path, seem likely to me to be optimal. If that's not
true, or doing otherwise has negligible cost (but can
prevent rare downsides), we should certainly consider
those options.</div>
</blockquote>
<div><br>
</div>
          <div>Since we don't do load widening for non-bitfield cases
            (only for the very limited case of naturally aligned
            bitfields), it is hard to say we pessimize common cases
            for rare ones:</div>
          <div><br>
          </div>
          <div>1) the upside of widening such accesses is not high,
            judging from experience with another compiler (which does
            not do so);</div>
          <div>2) there is an obvious downside in introducing
            additional extract instructions, which degrades
            performance;</div>
          <div>3) there is an obvious downside of severely degraded
            performance when store forwarding is blocked.</div>
</div>
</div>
</div>
</blockquote>
<br>
    I suspect that it's relatively rare to hit these store-to-load
    forwarding issues compared to the number of times the system
    stores to, or loads from, bitfields. In any case, I did some
    experiments on my Haswell system and found that the version of
    Wei's benchmark in which the load is split into two loads is
    0.012% slower than the single-load version. Indeed, I wouldn't
    worry about that too much. On my P8, I couldn't measure a
    difference. Obviously, this does somewhat miss the point, as the
    real cost of this kind of thing comes from stressing the memory
    units in code with a lot going on, not in isolated cases.<br>
<br>
Nevertheless, I think that you've convinced me that this is a
least-bad solution. I'll want a flag preserving the old behavior.
Something like -fwiden-bitfield-accesses (modulo bikeshedding).<br>
<br>
<blockquote
cite="mid:CAAkRFZ+o+93NRd=8kGHAbA+VmTpERhWrBRQKAMp075gEh4846g@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> <br>
And none of this answers the question of whether it's
better to have the store wider or the load split and
narrow.<br>
</div>
</blockquote>
<div><br>
</div>
<div><br>
</div>
        <div>It seems safer to do store widening more aggressively to
          avoid the store-forwarding stall issue, but doing so
          aggressively may also introduce other runtime overhead (an
          extra load, data merging, etc.).</div>
</div>
</div>
</div>
</blockquote>
<br>
Yes. Wei confirmed that this is slower.<br>
<br>
Thanks again,<br>
Hal<br>
<br>
<blockquote
cite="mid:CAAkRFZ+o+93NRd=8kGHAbA+VmTpERhWrBRQKAMp075gEh4846g@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>Thanks,</div>
<div><br>
</div>
<div>David</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> <br>
Thanks again,<br>
Hal
<div>
<div class="h5"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>David</div>
<div><br>
</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Sun, Sep 3, 2017 at
8:55 PM, Hal Finkel <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:hfinkel@anl.gov"
target="_blank">hfinkel@anl.gov</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span>
<p><br>
</p>
<div
class="m_-5573162088652840693m_4449805550791560709moz-cite-prefix">On
09/03/2017 10:38 PM, Xinliang David Li
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Store forwarding stall
cost is usually much higher compared
with a load hitting L1 cache. For
instance, on Haswell, the latter is
~4 cycles, while the store
forwarding stalls cost about 10
cycles more than a successful store
forwarding, which is roughly 15
cycles. In some scenarios, the store
forwarding stalls can be as high as
50 cycles. See Agner's
documentation. <br>
</div>
</blockquote>
<br>
</span> I understand. As I understand it,
there are two potential ways to fix this
problem:<br>
<br>
1. You can make the store wider (to match
the size of the wide load, thus permitting
forwarding).<br>
2. You can make the load smaller (to
match the size of the small store, thus
permitting forwarding).<br>
<br>
At least in this benchmark, which is a
better solution?<br>
<br>
Thanks again,<br>
Hal
<div>
<div class="m_-5573162088652840693h5"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
                          <div>In other words, the optimizer needs to
                            be taught to avoid defeating this hardware
                            pipeline feature as much as possible.</div>
<div><br>
</div>
<div>David</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Sun,
Sep 3, 2017 at 6:32 PM, Hal
Finkel <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:hfinkel@anl.gov"
target="_blank">hfinkel@anl.gov</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div
class="m_-5573162088652840693m_4449805550791560709HOEnZb">
<div
class="m_-5573162088652840693m_4449805550791560709h5"><br>
On 09/03/2017 03:44 PM,
Wei Mi wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
On Sat, Sep 2, 2017 at
6:04 PM, Hal Finkel <<a
moz-do-not-send="true"
href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
On 08/22/2017 10:56
PM, Wei Mi via
llvm-commits wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
On Tue, Aug 22, 2017
at 7:03 PM, Xinliang
David Li <<a
moz-do-not-send="true"
href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>><br>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0
0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<br>
On Tue, Aug 22,
2017 at 6:37 PM,
Chandler Carruth
via Phabricator<br>
<<a
moz-do-not-send="true"
href="mailto:reviews@reviews.llvm.org" target="_blank">reviews@reviews.llvm.org</a>>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
                                          chandlerc added a comment.<br>
                                          <br>
                                          I'm really not a fan of the
                                          degree of complexity and
                                          subtlety that this introduces
                                          into the frontend, all to
                                          allow particular backend
                                          optimizations.<br>
                                          <br>
                                          I feel like this is Clang
                                          working around a fundamental
                                          deficiency in LLVM, and we
                                          should instead find a way to
                                          fix this in LLVM itself.<br>
                                          <br>
                                          As has been pointed out
                                          before, user code can
                                          synthesize large integers
                                          that small bit sequences are
                                          extracted from, and Clang and
                                          LLVM should handle those just
                                          as well as actual bitfields.<br>
                                          <br>
                                          Can we see how far we can
                                          push the LLVM side before we
                                          add complexity to Clang here?
                                          I understand that there
                                          remain challenges to LLVM's
                                          stuff, but I don't think
                                          those challenges make *all*
                                          of the LLVM improvements off
                                          the table, I don't think
                                          we've exhausted all ways of
                                          improving the LLVM changes
                                          being proposed, and I think
                                          we should still land all of
                                          those and re-evaluate how
                                          important these issues are
                                          when all of that is in place.<br>
</blockquote>
<br>
                                        The main challenge of doing
                                        this in LLVM is that
                                        inter-procedural (and possibly
                                        cross-module) analysis is
                                        needed (for the
                                        store-forwarding issues).<br>
<br>
                                        Wei, perhaps you can provide a
                                        concrete test case to
                                        illustrate the issue, so that
                                        reviewers have a good
                                        understanding.<br>
<br>
David<br>
</blockquote>
                                      Here is a runnable testcase:<br>
                                      -------------------- 1.cc ------------------------<br>
                                      class A {<br>
                                      public:<br>
                                        unsigned long f1:2;<br>
                                        unsigned long f2:6;<br>
                                        unsigned long f3:8;<br>
                                        unsigned long f4:4;<br>
                                      };<br>
                                      A a;<br>
                                      unsigned long b;<br>
                                      unsigned long N = 1000000000;<br>
                                      <br>
                                      __attribute__((noinline))<br>
                                      void foo() {<br>
                                        a.f3 = 3;<br>
                                      }<br>
                                      <br>
                                      __attribute__((noinline))<br>
                                      void goo() {<br>
                                        b = a.f3;<br>
                                      }<br>
                                      <br>
                                      int main() {<br>
                                        unsigned long i;<br>
                                        for (i = 0; i &lt; N; i++) {<br>
                                          foo();<br>
                                          goo();<br>
                                        }<br>
                                      }<br>
------------------------------<wbr>------------------------------<br>
                                      Now trunk takes about twice the
                                      running time of trunk + this
                                      patch. That is because trunk
                                      shrinks the store of a.f3 in foo
                                      (done by the DAGCombiner) but
                                      does not shrink the load of a.f3
                                      in goo, so store forwarding will
                                      be blocked.<br>
</blockquote>
<br>
                                    I can confirm that this is true on
                                    Haswell and also on a POWER8.
                                    Interestingly, on a POWER7, the
                                    performance is the same either way
                                    (on the good side). I ran the test
                                    case as presented and also with f3
                                    replaced by a non-bitfield
                                    unsigned char member. Thinking
                                    that the POWER7 result might be
                                    because it's big-endian, I flipped
                                    the order of the fields, and found
                                    that the version where f3 is not a
                                    bitfield is faster than otherwise,
                                    but only by 12.5%.<br>
                                    <br>
                                    Why, in this case, don't we shrink
                                    the load? It seems like we should
                                    (and it seems like a
                                    straightforward case).<br>
                                    <br>
                                    Thanks again,<br>
                                    Hal<br>
<br>
</blockquote>
                                  Hal, thanks for trying the test.<br>
                                  <br>
                                  Yes, it is straightforward to shrink
                                  the load in this test. I can change
                                  the testcase a little to show why it
                                  is sometimes difficult to shrink the
                                  load:<br>
<br>
                                  class A {<br>
                                  public:<br>
                                    unsigned long f1:16;<br>
                                    unsigned long f2:16;<br>
                                    unsigned long f3:16;<br>
                                    unsigned long f4:8;<br>
                                  };<br>
                                  A a;<br>
                                  bool b;<br>
                                  unsigned long N = 1000000000;<br>
                                  <br>
                                  __attribute__((noinline))<br>
                                  void foo() {<br>
                                    a.f4 = 3;<br>
                                  }<br>
                                  <br>
                                  __attribute__((noinline))<br>
                                  void goo() {<br>
                                    b = (a.f4 == 0 && a.f3 == (unsigned short)-1);<br>
                                  }<br>
                                  <br>
                                  int main() {<br>
                                    unsigned long i;<br>
                                    for (i = 0; i &lt; N; i++) {<br>
                                      foo();<br>
                                      goo();<br>
                                    }<br>
                                  }<br>
<br>
                                  For the load of a.f4 in goo, it is
                                  difficult to motivate shrinking it
                                  after instcombine, because the
                                  comparison with a.f3 and the
                                  comparison with a.f4 are merged:<br>
<br>
                                  define void @_Z3goov() local_unnamed_addr #0 {<br>
                                    %1 = load i64, i64* bitcast (%class.A* @a to i64*), align 8<br>
                                    %2 = and i64 %1, 0xffffff00000000<br>
                                    %3 = icmp eq i64 %2, 0xffff00000000<br>
                                    %4 = zext i1 %3 to i8<br>
                                    store i8 %4, i8* @b, align 1, !tbaa !2<br>
                                    ret void<br>
                                  }<br>
</blockquote>
<br>
</div>
</div>
                                Exactly. But this behavior is
                                desirable, locally. There's no good
                                answer here: we either generate extra
                                load traffic here (because we need to
                                load the fields separately), or we
                                widen the store (which generates extra
                                load traffic there). Do you know, in
                                terms of performance, which is better
                                in this case (i.e., is it better to
                                widen the store or to split the
                                load)?<span class="m_-5573162088652840693m_4449805550791560709HOEnZb"><font color="#888888"><br>
                                <br>
                                -Hal</font></span>
<div
class="m_-5573162088652840693m_4449805550791560709HOEnZb">
<div
class="m_-5573162088652840693m_4449805550791560709h5"><br>
<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<br>
Thanks,<br>
Wei.<br>
<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
                                    The testcase shows the potential
                                    problem of store shrinking. Before
                                    we decide to do store shrinking,
                                    we need to know that all of the
                                    related loads will be shrunk, and
                                    that requires interprocedural
                                    analysis. Otherwise, when load
                                    shrinking is blocked for some
                                    difficult case (like the
                                    instcombine case described in
                                    <a moz-do-not-send="true" href="https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg65085.html" rel="noreferrer" target="_blank">https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg65085.html</a>),
                                    a performance regression will
                                    happen.<br>
                                    <br>
                                    Wei.<br>
<br>
<br>
<blockquote
class="gmail_quote"
style="margin:0 0
0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<br>
Repository:<br>
rL LLVM<br>
<br>
<a
moz-do-not-send="true"
href="https://reviews.llvm.org/D36562" rel="noreferrer" target="_blank">https://reviews.llvm.org/D3656<wbr>2</a><br>
<br>
<br>
<br>
</blockquote>
</blockquote>
______________________________<wbr>_________________<br>
llvm-commits mailing
list<br>
<a
moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>
<a
moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits"
rel="noreferrer"
target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a><br>
</blockquote>
<br>
--<br>
Hal Finkel<br>
Lead, Compiler
Technology and
Programming Languages<br>
Leadership Computing
Facility<br>
Argonne National
Laboratory<br>
<br>
</blockquote>
</blockquote>
<br>
-- <br>
Hal Finkel<br>
Lead, Compiler Technology
and Programming Languages<br>
Leadership Computing
Facility<br>
Argonne National
Laboratory<br>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
<pre class="m_-5573162088652840693m_4449805550791560709moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
<pre class="m_-5573162088652840693moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</body>
</html>