<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">Dear Mihail,<br>

      <br>

      I've added these two publications to the publications page.
      Please review the page and let me know if I need to make any
      changes. In particular, if you have URLs for the papers, I would
      greatly appreciate them.<br>

      <br>

      Regards,<br>

      <br>

      John Criswell<br>

      <br>

      On 11/28/17 12:05 PM, Mihail Popov via llvm-dev wrote:<br>

    </div>

    <blockquote cite="mid:fbd31479900b341250e1f2cbfcd8898b@uvsq.fr"

      type="cite">

      <p>Hello,<br>

        <br>

        I would like to submit two papers that use LLVM for inclusion in
        the Related Publications section.<br>

        <br>

        Both papers focus on code isolation applied to piecewise
        compiler optimization.<br>

        Code isolation is performed by CERE, an open-source tool based
        on LLVM.<br>

        <br>

        The second paper is an extended version of the first one.<br>

        <br>

        1) Piecewise Holistic Autotuning of Compiler and Runtime

        Parameters</p>

      <p><br>

        @inproceedings{popov2016piecewise,<br>

          title={Piecewise Holistic Autotuning of Compiler and Runtime

        Parameters},<br>

          author={Popov, Mihail and Akel, Chadi and Jalby, William and

        de Oliveira Castro, Pablo},<br>

          booktitle={European Conference on Parallel Processing},<br>

          pages={238--250},<br>

          year={2016},<br>

          organization={Springer}<br>

        }<br>

        <br>

        2) Piecewise Holistic Autotuning of Parallel Programs with CERE</p>

      <p><br>

        @article{popov2017piecewise,<br>

          title={Piecewise holistic autotuning of parallel programs with
        {CERE}},<br>

          author={Popov, Mihail and Akel, Chadi and Chatelain, Yohan and

        Jalby, William and de Oliveira Castro, Pablo},<br>

          journal={Concurrency and Computation: Practice and

        Experience},<br>

          volume={29},<br>

          number={15},<br>

          year={2017},<br>

          publisher={Wiley Online Library}<br>

        }<br>

        <br>

        Please do not hesitate to contact me if you have any questions or
        need any additional documents.<br>

        <br>

        Thank you,<br>

        Mihail Popov<br>

        <br>

        <br>

-----------------------------------------------------------------------------------<br>

        <br>

        PAPERS SUMMARY:<br>

        <br>

        Piecewise Holistic Autotuning of Compiler and Runtime Parameters<br>

        <br>

        Abstract. The complexity of current architectures requires
        fine-tuning of compiler and runtime parameters to achieve their
        full performance potential. Autotuning substantially improves
        default parameters in many scenarios, but it is a costly process
        requiring a long iterative evaluation. We propose an automatic
        piecewise autotuner based on CERE (Codelet Extractor and
        REplayer). CERE decomposes applications into small pieces called
        codelets: each codelet maps to a loop or to an OpenMP parallel
        region and can be replayed as a standalone program. Codelet
        autotuning achieves better speedups at a lower tuning cost. By
        grouping codelet invocations with the same performance behavior,
        CERE reduces the number of loops or OpenMP regions to be
        evaluated. Moreover, unlike whole-program tuning, CERE customizes
        the set of best parameters for each specific OpenMP region or
        loop. We demonstrate CERE tuning of compiler optimizations, number
        of threads, and thread affinity on a NUMA architecture. On average
        over the NAS 3.0 benchmarks, we achieve a speedup of 1.08× after
        tuning. Tuning a single codelet is 13× cheaper than whole-program
        evaluation and estimates the tuning impact on the original region
        with 94.7% accuracy. On a Reverse Time Migration (RTM)
        proto-application, we achieve a 1.11× speedup with a 200× cheaper
        exploration.<br>

        <br>

        <br>

        Piecewise Holistic Autotuning of Parallel Programs with CERE<br>

        <br>

        The complexity of current architectures requires fine-tuning of
        compiler and runtime parameters to achieve the best performance.
        Autotuning substantially improves default parameters in many
        scenarios, but it is a costly process requiring long iterative
        evaluations. We propose an automatic piecewise autotuner based on
        CERE (Codelet Extractor and REplayer). CERE decomposes
        applications into small pieces called codelets: each codelet maps
        to a loop or to an OpenMP parallel region and can be replayed as a
        standalone program. Codelet autotuning achieves better speedups at
        a lower tuning cost. By grouping codelet invocations with the same
        performance behavior, CERE reduces the number of loops or OpenMP
        regions to be evaluated. Moreover, unlike whole-program tuning,
        CERE customizes the set of best parameters for each specific
        OpenMP region or loop. We demonstrate CERE tuning of compiler
        optimizations, number of threads, thread affinity, and scheduling
        policy on both NUMA and heterogeneous architectures. Over the NAS
        benchmarks, we achieve an average speedup of 1.08× after tuning.
        Tuning a codelet is 13× cheaper than whole-program evaluation and
        predicts the tuning impact with 94.7% accuracy. Similarly,
        exploring thread configurations and scheduling policies for a
        Black-Scholes solver on a heterogeneous big.LITTLE architecture is
        over 40× faster using CERE.<br>

        <br>

      </p>


      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

LLVM Developers mailing list

<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>

<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>

</pre>

    </blockquote>

    <br>


    <pre class="moz-signature" cols="72">-- 

John Criswell

Assistant Professor

Department of Computer Science, University of Rochester

<a class="moz-txt-link-freetext" href="http://www.cs.rochester.edu/u/criswell">http://www.cs.rochester.edu/u/criswell</a></pre>

  </body>

</html>