<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    In addition to the concerns Chandler pointed out, <br>
    I'm curious about: <br>

        execution time of pristine-llc vs "modified-llc with -thd=1",

    and <br>

        the exec-time of pristine-clang vs

    clang-linked-with-the-modified-llc.<br>
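    <br>
    One way to collect these numbers is a small timing harness. The
    following is a minimal sketch, not part of the patch: time_command
    and compare are hypothetical helper names. It runs each command
    several times and compares median wall-clock times, which damps
    run-to-run noise.<br>
<pre>
```python
import statistics
import subprocess
import time


def time_command(argv, repeats=5):
    """Run argv `repeats` times; return the wall-clock seconds of each run."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so bad runs are not timed.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def compare(baseline_argv, candidate_argv, repeats=5):
    """Return (baseline median, candidate median, candidate/baseline ratio)."""
    base = statistics.median(time_command(baseline_argv, repeats))
    cand = statistics.median(time_command(candidate_argv, repeats))
    return base, cand, cand / base
```
</pre>
    For example, compare(["pristine/bin/llc", "big.bc", "-o", "a.s"],
    ["modified/bin/llc", "-thd=1", "big.bc", "-o", "b.s"]) would give
    the first ratio; the paths are placeholders, and -thd exists only
    in the proposed patch.<br>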

    <br>

    Thanks<br>

    <br>

        <br>

    <div class="moz-cite-prefix">On 7/16/13 3:46 AM, Chandler Carruth

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAGCO0KgvZMrgiZE=6PZPGT3V0mORWBk+0yyXbKgg9Y8S4R7MCg@mail.gmail.com"

      type="cite">

      <div dir="ltr">While I think the end goal you're describing is

        close to the correct one, I see the high-level strategy for

        getting there somewhat differently:

        <div><br>

        </div>

        <div>1) The code generators are only one collection of function

          passes that might be parallelized. Many others might also be

          parallelized profitably. The design for parallelism within

          LLVM's pass management infrastructure should be sufficiently

          generic to express all of these use cases.</div>

        <div><br>

        </div>

        <div>2) The idea of having multiple pass managers necessitates

          (unless I misunderstand) duplicating a fair amount of state.

          For example, the caches in immutable analysis passes would no

          longer be shared, etc. I think that is really unfortunate, and

          would prefer instead to use parallelizing pass managers that

          are in fact responsible for the scheduling of passes.</div>
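        <div><br>
        </div>
        <div>As a rough illustration of point 2, here is a toy sketch
          (not LLVM code) of a single scheduler that computes an
          immutable analysis once and lets every worker read it, rather
          than each pass manager rebuilding its own copy. TargetInfo,
          AnalysisCache, and run_with_shared_analysis are hypothetical
          names chosen for the example.</div>
<pre>
```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetInfo:
    """Stand-in for an immutable analysis result: built once, never mutated."""
    pointer_bits: int


class AnalysisCache:
    """Builds the analysis lazily and counts how often it was built."""

    def __init__(self):
        self.builds = 0
        self._info = None

    def get(self):
        if self._info is None:
            self.builds += 1
            self._info = TargetInfo(pointer_bits=64)
        return self._info


def run_function_pass(name, info):
    """Toy function pass that only reads the shared analysis."""
    return f"{name}:{info.pointer_bits}"


def run_with_shared_analysis(cache, function_names, workers=4):
    """Resolve the analysis on the scheduling thread, then fan out the work."""
    info = cache.get()  # computed before any worker starts, so no race here
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: run_function_pass(n, info), function_names))
```
</pre>
        <div>With separate pass managers, each one would own its own
          cache and the build count would grow with the number of
          threads; here it stays at one.</div>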

        <div><br>

        </div>

        <div>3) It doesn't provide a strategy for parallelizing the

          leaves of a CGSCC pass manager which is where a significant

          portion of the potential parallelism is available within the

          middle end.</div>

        <div>

          <br>

        </div>

        <div>4) It doesn't deal with the (numerous) parts of LLVM that

          are not actually thread safe today. They may happen to work

          with the code generators you happen to test, but there

          is no guarantee. Notable things to think about here are

          computing new types, the use-def lists of globals, command-line

          flags, and static state variables. While our intent has been

          to avoid problems with the last two that could preclude

          parallelism, it seems unlikely that we have succeeded without

          thorough testing to this point. Instead, I fear we have leaned

          heavily on the crutch of one-thread-per-LLVMContext.</div>

        <div><br>

        </div>

        <div>5) It adds more complexity onto the poorly designed pass

          manager infrastructure. Personally, I think that cleanups to

          the design and architecture of the pass manager should be

          prioritized above adding new functionality like parallelism.

          However, so far no one has really had time to do this

          (including myself). While I would like to have time in the

          future to do this, as with everything else in OSS, it won't be

          real until the patches start flowing.</div>

      </div>

      <div class="gmail_extra"><br>

        <br>

        <div class="gmail_quote">On Tue, Jul 16, 2013 at 3:33 AM, Wan,

          Xiaofei <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:xiaofei.wan@intel.com" target="_blank">xiaofei.wan@intel.com</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,

            community:<br>

            <br>

            To meet a business need, I want to enable
            "function-based parallel code generation" to speed up the
            compilation of a single module. Please see the design
            details below and provide feedback on the following points.
            Thanks!<br>

            1. Is this idea the proper solution for my requirement?<br>
            2. This new feature will be enabled by llc -thd=N and has no
            impact on the original llc when -thd=1<br>
            3. Can this new llc feature be accepted by the community and
            merged into the LLVM source tree?<br>

            <br>

            Patches<br>

            The patch is divided into four separate parts; the
            all-in-one patch can be found here:<br>

            <a moz-do-not-send="true"

              href="http://llvm-reviews.chandlerc.com/D1152"

              target="_blank">http://llvm-reviews.chandlerc.com/D1152</a><br>

            <br>

            Design<br>

            <a moz-do-not-send="true"

href="https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing"

              target="_blank">https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing</a><br>

            <br>

            <br>

            Background<br>

            1. Our business needs to compile C/C++ source files into LLVM
            IR and link them into one big BC file; the big BC file is then
            compiled into binary code on different arch/target devices.<br>
            2. Backend code generation is a time-consuming step that
            happens on the target device, which makes it important to
            the user experience.<br>
            3. make -j or other file-based parallelism can't help here
            since there is only one big BC file; function-based parallel
            LLVM backend code generation is a good way to reduce
            compilation time because it can fully utilize multiple cores.<br>

            <br>

            Overall design strategy and goal<br>

            1. Generate exactly the same binary as the single-threaded output<br>
            2. No impact on single-threaded performance &amp; conformance<br>
            3. Little impact on the LLVM code infrastructure<br>
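            <br>
            Goal 1 can be pictured with a toy sketch (not the patch
            itself): generate each function independently on a thread
            pool, then concatenate the results in module order so the
            output matches a serial run. codegen_function,
            codegen_serial, and codegen_parallel are hypothetical names,
            and the "module" is just a list of (name, instructions)
            pairs. Python threads will not speed up CPU-bound work; the
            sketch only shows the ordering guarantee.<br>
<pre>
```python
from concurrent.futures import ThreadPoolExecutor


def codegen_function(name, body):
    """Toy per-function code generation; pure, so it touches no shared state."""
    return f"{name}:\n" + "".join(f"    {op}\n" for op in body)


def codegen_serial(module):
    """Baseline: emit functions one after another in module order."""
    return "".join(codegen_function(n, b) for n, b in module)


def codegen_parallel(module, threads=4):
    """Emit functions on a thread pool, then concatenate in module order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in input order, not completion order,
        # which is what keeps the output identical to the serial run.
        chunks = pool.map(lambda item: codegen_function(*item), module)
        return "".join(chunks)
```
</pre>
            Comparing codegen_parallel(module) with
            codegen_serial(module) for equality is the toy analogue of
            diffing the "objdump -d" output of the two builds.<br>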

            <br>

            Current status and test result<br>

            1. Parallel llc generates the same code as single-threaded
            llc, as checked with "objdump -d"; it passes a 10-hour
            stress test across all performance benchmarks<br>
            2. Parallel llc achieves a ~2.9X performance gain on a Xeon
            server with 4 threads<br>
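            <br>
            As a rough reading of the ~2.9X figure, and assuming
            Amdahl's law applies (a modeling assumption, not something
            measured here), the speedup can be inverted to estimate the
            fraction of llc's run time that is parallelized. The helper
            names below are hypothetical.<br>
<pre>
```python
def parallel_fraction(speedup, threads):
    """Invert Amdahl's law S = 1 / ((1 - p) + p / N) for the parallel fraction p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)


def amdahl_speedup(p, threads):
    """Speedup predicted by Amdahl's law for parallel fraction p on N threads."""
    return 1.0 / ((1.0 - p) + p / threads)


p = parallel_fraction(2.9, 4)  # about 0.874 under this model
efficiency = 2.9 / 4           # 0.725 of ideal linear scaling
```
</pre>
            Under this model roughly 87% of the single-threaded time is
            spent in the parallelized part, and the 4 threads run at
            about 72.5% of ideal efficiency.<br>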

            <br>

            <br>

            Thanks<br>

            <span class="HOEnZb"><font color="#888888">Wan Xiaofei<br>

              </font></span><br>

            _______________________________________________<br>

            LLVM Developers mailing list<br>

            <a moz-do-not-send="true" href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>

                    <a moz-do-not-send="true"

              href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

            <a moz-do-not-send="true"

              href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev"

              target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

            <br>

          </blockquote>

        </div>

        <br>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>