[llvm-dev] Publication LLVM Related Publications Submission

Tue Jan 30 05:30:32 PST 2018

Dear Mihail,

I've added these two publications to the publications page. Please 
review the page and let me know if any changes are needed. In particular, 
I would greatly appreciate URLs for the papers, if you have them.

Regards,

John Criswell

On 11/28/17 12:05 PM, Mihail Popov via llvm-dev wrote:
>
> Hello,
>
> I would like to submit two papers that use LLVM to the Related 
> Publications section.
>
> Both papers focus on applying code isolation to piecewise compiler
> optimization.
> Code isolation is performed by CERE, an open-source tool based on
> LLVM.
>
> The second paper is an extended version of the first one.
>
> 1) Piecewise Holistic Autotuning of Compiler and Runtime Parameters
>
>
> @inproceedings{popov2016piecewise,
>   title={Piecewise Holistic Autotuning of Compiler and Runtime Parameters},
>   author={Popov, Mihail and Akel, Chadi and Jalby, William and de Oliveira Castro, Pablo},
>   booktitle={European Conference on Parallel Processing},
>   pages={238--250},
>   year={2016},
>   organization={Springer}
> }
>
> 2) Piecewise Holistic Autotuning of Parallel Programs with CERE
>
>
> @article{popov2017piecewise,
>   title={Piecewise holistic autotuning of parallel programs with CERE},
>   author={Popov, Mihail and Akel, Chadi and Chatelain, Yohan and Jalby, William and de Oliveira Castro, Pablo},
>   journal={Concurrency and Computation: Practice and Experience},
>   volume={29},
>   number={15},
>   year={2017},
>   publisher={Wiley Online Library}
> }
>
> Do not hesitate to contact me if you have any questions or need any
> additional documents.
>
> Thank you,
> Mihail Popov
>
>
> -----------------------------------------------------------------------------------
>
> PAPERS SUMMARY:
>
> Piecewise Holistic Autotuning of Compiler and Runtime Parameters
>
> Abstract. Current architecture complexity requires fine tuning of
> compiler and runtime parameters to achieve full performance potential.
> Autotuning substantially improves default parameters in many scenarios,
> but it is a costly process requiring a long iterative evaluation.
> We propose an automatic piecewise autotuner based on CERE (Codelet
> Extractor and REplayer). CERE decomposes applications into small
> pieces called codelets: each codelet maps to a loop or to an OpenMP
> parallel region and can be replayed as a standalone program.
> Codelet autotuning achieves better speedups at a lower tuning cost. By
> grouping codelet invocations with the same performance behavior, CERE
> reduces the number of loops or OpenMP regions to be evaluated. Moreover,
> unlike whole-program tuning, CERE customizes the set of best
> parameters for each specific OpenMP region or loop.
> We demonstrate CERE tuning of compiler optimizations, number of
> threads and thread affinity on a NUMA architecture. On average over the
> NAS 3.0 benchmarks, we achieve a speedup of 1.08× after tuning. Tuning
> a single codelet is 13× cheaper than whole-program evaluation and
> estimates the tuning impact on the original region with 94.7% accuracy.
> On a Reverse Time Migration (RTM) proto-application, we achieve
> a 1.11× speedup with a 200× cheaper exploration.
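The per-region customization described in the abstract can be illustrated with a small sketch. It assumes each region's run time under each candidate configuration has already been measured (for example, by replaying its codelet) and that region times simply add up, ignoring interactions between regions. The region names, configuration labels, and timings are hypothetical and not taken from the paper; this is not CERE's implementation.

```python
from typing import Dict, Tuple

# region -> configuration -> measured time in seconds (hypothetical data)
Timings = Dict[str, Dict[str, float]]


def best_whole_program(timings: Timings) -> Tuple[str, float]:
    """Whole-program tuning: one configuration is applied to every region."""
    configs = next(iter(timings.values())).keys()
    totals = {c: sum(region[c] for region in timings.values()) for c in configs}
    best = min(totals, key=totals.get)
    return best, totals[best]


def best_piecewise(timings: Timings) -> Tuple[Dict[str, str], float]:
    """Piecewise tuning: each region keeps its own fastest configuration.
    Assumes region times are additive and independent of each other."""
    choice = {r: min(cfgs, key=cfgs.get) for r, cfgs in timings.items()}
    total = sum(timings[r][c] for r, c in choice.items())
    return choice, total


if __name__ == "__main__":
    # Hypothetical measurements for two regions and two configurations.
    timings = {
        "loop_a": {"O2/8thr": 4.0, "O3/16thr": 3.0},
        "loop_b": {"O2/8thr": 2.0, "O3/16thr": 2.5},
    }
    print(best_whole_program(timings))  # ('O3/16thr', 5.5)
    print(best_piecewise(timings))      # ({'loop_a': 'O3/16thr', 'loop_b': 'O2/8thr'}, 5.0)
```

Under the additivity assumption, the piecewise total can never exceed the best uniform choice, because each region picks its own minimum; in this toy example it is 5.0 s versus 5.5 s.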
>
>
> Piecewise Holistic Autotuning of Parallel Programs with CERE
>
> Current architecture complexity requires fine tuning of compiler
> and runtime parameters to achieve the best performance. Autotuning
> substantially improves default parameters in many scenarios, but it is a
> costly process requiring long iterative evaluations.
> We propose an automatic piecewise autotuner based on CERE (Codelet
> Extractor and REplayer). CERE decomposes applications into small
> pieces called codelets: each codelet maps to a loop or to an OpenMP
> parallel region and can be replayed as a standalone program.
> Codelet autotuning achieves better speedups at a lower tuning cost. By
> grouping codelet invocations with the same performance behavior, CERE
> reduces the number of loops or OpenMP regions to be evaluated. Moreover,
> unlike whole-program tuning, CERE customizes the set of best parameters
> for each specific OpenMP region or loop.
> We demonstrate CERE tuning of compiler optimizations, number
> of threads, thread affinity, and scheduling policy on both NUMA and
> heterogeneous architectures. Over the NAS benchmarks, we achieve an
> average speedup of 1.08× after tuning. Tuning a codelet is 13× cheaper
> than whole-program evaluation and predicts the tuning impact with
> 94.7% accuracy. Similarly, exploring thread configurations and scheduling
> policies for a Black-Scholes solver on a heterogeneous big.LITTLE
> architecture is over 40× faster using CERE.
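The idea of grouping codelet invocations with the same performance behavior can be sketched in a similar way. Here invocations are grouped greedily by their cycle counts within a relative tolerance, only one representative per group is replayed, and its result is scaled by the group size. The tolerance, the grouping rule, and the `replay` callback are hypothetical stand-ins: the abstracts do not say how CERE clusters invocations, so this illustrates the idea rather than CERE's actual method.

```python
from typing import Callable, List, Tuple


def group_invocations(cycles: List[float], tol: float = 0.1) -> List[List[int]]:
    """Group invocation indices whose cycle counts are within a relative
    tolerance `tol` of the smallest count in the current group."""
    order = sorted(range(len(cycles)), key=lambda i: cycles[i])
    groups: List[List[int]] = []
    for i in order:
        if groups and cycles[i] <= cycles[groups[-1][0]] * (1 + tol):
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def estimate_total(groups: List[List[int]],
                   replay: Callable[[int], float]) -> Tuple[float, int]:
    """Replay one representative per group and scale by the group size.
    Returns the estimated total and the number of replays performed."""
    total = 0.0
    for group in groups:
        total += replay(group[0]) * len(group)
    return total, len(groups)


if __name__ == "__main__":
    # Hypothetical cycle counts for six invocations of one codelet.
    cycles = [100, 102, 98, 500, 510, 101]
    groups = group_invocations(cycles)
    print(groups)  # [[2, 0, 5, 1], [3, 4]]
    # Hypothetical replay under a new configuration that halves the cycles.
    print(estimate_total(groups, lambda i: cycles[i] / 2))  # (696.0, 2)
```

In this toy run, six invocations are estimated with only two replays, which is the kind of saving the grouping step is meant to provide.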
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell
