<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Dear Mihail,<br>
<br>
I've added these two publications to the publications page.
Please review the page and let me know if I need to make any
changes. In particular, URLs for the papers would be greatly
appreciated.<br>
<br>
Regards,<br>
<br>
John Criswell<br>
<br>
On 11/28/17 12:05 PM, Mihail Popov via llvm-dev wrote:<br>
</div>
<blockquote cite="mid:fbd31479900b341250e1f2cbfcd8898b@uvsq.fr"
type="cite">
<p>Hello,<br>
<br>
I would like to submit two papers that use LLVM to the Related
Publications section.<br>
<br>
Both papers focus on using code isolation to perform piecewise
compiler optimizations.<br>
Code isolation is performed by CERE, an open-source tool based on
LLVM.<br>
<br>
The second paper is an extended version of the first one.<br>
<br>
1) Piecewise Holistic Autotuning of Compiler and Runtime
Parameters</p>
<p><br>
@inproceedings{popov2016piecewise,<br>
title={Piecewise Holistic Autotuning of Compiler and Runtime
Parameters},<br>
author={Popov, Mihail and Akel, Chadi and Jalby, William and
de Oliveira Castro, Pablo},<br>
booktitle={European Conference on Parallel Processing},<br>
pages={238--250},<br>
year={2016},<br>
organization={Springer}<br>
}<br>
<br>
2) Piecewise Holistic Autotuning of Parallel Programs with CERE</p>
<p><br>
@article{popov2017piecewise,<br>
title={Piecewise Holistic Autotuning of Parallel Programs with
{CERE}},<br>
author={Popov, Mihail and Akel, Chadi and Chatelain, Yohan and
Jalby, William and de Oliveira Castro, Pablo},<br>
journal={Concurrency and Computation: Practice and
Experience},<br>
volume={29},<br>
number={15},<br>
year={2017},<br>
publisher={Wiley Online Library}<br>
}<br>
<br>
Do not hesitate to contact me if you have any questions or need
any additional documents.<br>
<br>
Thank you,<br>
Mihail Popov<br>
<br>
<br>
-----------------------------------------------------------------------------------<br>
<br>
PAPER SUMMARIES:<br>
<br>
Piecewise Holistic Autotuning of Compiler and Runtime Parameters<br>
<br>
Abstract. Current architecture complexity requires fine-tuning of
compiler and runtime parameters to achieve full potential
performance. Autotuning substantially improves default parameters
in many scenarios, but it is a costly process requiring a long
iterative evaluation.<br>
We propose an automatic piecewise autotuner based on CERE (Codelet
Extractor and REplayer). CERE decomposes applications into small
pieces called codelets: each codelet maps to a loop or to an OpenMP
parallel region and can be replayed as a standalone program.<br>
Codelet autotuning achieves better speedups at a lower tuning cost.
By grouping codelet invocations with the same performance behavior,
CERE reduces the number of loops or OpenMP regions to be evaluated.
Moreover, unlike whole-program tuning, CERE customizes the set of
best parameters for each specific OpenMP region or loop.<br>
We demonstrate CERE tuning of compiler optimizations, number of
threads, and thread affinity on a NUMA architecture. On average over
the NAS 3.0 benchmarks, we achieve a speedup of 1.08× after tuning.
Tuning a single codelet is 13× cheaper than whole-program evaluation
and estimates the tuning impact on the original region with 94.7%
accuracy. On a Reverse Time Migration (RTM) proto-application, we
achieve a 1.11× speedup with a 200× cheaper exploration.<br>
<br>
<br>
Piecewise Holistic Autotuning of Parallel Programs with CERE<br>
<br>
Current architecture complexity requires fine-tuning of compiler and
runtime parameters to achieve the best performance. Autotuning
substantially improves default parameters in many scenarios, but it
is a costly process requiring long iterative evaluations.<br>
We propose an automatic piecewise autotuner based on CERE (Codelet
Extractor and REplayer). CERE decomposes applications into small
pieces called codelets: each codelet maps to a loop or to an OpenMP
parallel region and can be replayed as a standalone program.<br>
Codelet autotuning achieves better speedups at a lower tuning cost.
By grouping codelet invocations with the same performance behavior,
CERE reduces the number of loops or OpenMP regions to be evaluated.
Moreover, unlike whole-program tuning, CERE customizes the set of
best parameters for each specific OpenMP region or loop.<br>
We demonstrate CERE tuning of compiler optimizations, number of
threads, thread affinity, and scheduling policy on both NUMA and
heterogeneous architectures. Over the NAS benchmarks, we achieve an
average speedup of 1.08× after tuning. Tuning a codelet is 13×
cheaper than whole-program evaluation and predicts the tuning impact
with 94.7% accuracy. Similarly, exploring thread configurations and
scheduling policies for a Black-Scholes solver on a heterogeneous
big.LITTLE architecture is over 40× faster using CERE.<br>
<br>
</p>
<pre wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
<a class="moz-txt-link-freetext" href="http://www.cs.rochester.edu/u/criswell">http://www.cs.rochester.edu/u/criswell</a></pre>
</body>
</html>