<div dir="ltr">Hi,<div><br></div><div>Following Chandler and Jiangning's advice, I ran more experiments to find a more accurate value for the loop buffer size on A57. All experiments are based on<span style="font-family:arial,sans-serif;font-size:13.63636302947998px"> </span><span style="font-family:arial,sans-serif;font-size:13.63636302947998px">Dave's Cortex-A57 Machine Model update, and the values tested were 12, 14, 16, 18, 20, 22 and 24. </span></div><div><span style="font-family:arial,sans-serif;font-size:13.63636302947998px"><br></span></div><div><font face="arial, sans-serif">The results show that with a loop buffer size of 16, every benchmark reached or came close to its lowest execution time among all the values tried. This setting gives about a 0.5% performance improvement on EEMBC, SPEC2000 and SPEC2006, at a code size increase of about 1.5% in geomean and 7% in the worst case. </font></div><div><br></div><div>Thanks,</div><div>Kevin</div></div><div class="gmail_extra"><br><div class="gmail_quote">2014-10-06 11:09 GMT+01:00 Kevin Qin <span dir="ltr"><<a href="mailto:kevinqindev@gmail.com" target="_blank">kevinqindev@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>

<br>

After some finer-grained performance investigation, I propose setting the loop buffer size for A57 to 16. I will post some experiment results in follow-up comments. Thanks to Chandler and Jiangning for their advice.<br>

<div class="HOEnZb"><div class="h5"><br>

Cheers,<br>

Kevin<br>

<br>

<a href="http://reviews.llvm.org/D5148" target="_blank">http://reviews.llvm.org/D5148</a><br>

<br>

Files:<br>

  lib/Target/AArch64/AArch64SchedA57.td<br>

  lib/Target/AArch64/AArch64TargetTransformInfo.cpp<br>

</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Best Regards,<div><br></div><div>Kevin Qin</div></div>

</div>