<div dir="ltr"><div>From Matt's page, <i> "The target program consists of 1000 back to back branches"</i>. <br></div><div><br></div><div>Perhaps the processors got better at detecting when they may be running off into some kind of data?  Wouldn't I want it to predict that kind of thing as not taken?  <br></div><div><br></div><div>Just a thought... <br></div><div><br></div><div>-G<br></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Aug 14, 2018 at 4:09 PM 2016 quekong via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">(I don't know whether this kind of question is allowed here; if not, please let me know.)<br>

<br>

I know Intel has implemented several static branch prediction mechanisms<br>

over the years:<br>

  * 80486 era: always not taken<br>

  * Pentium 4 era: backward taken, forward not taken (BTFNT)<br>

  * Pentium M, Core 2: no static prediction; the outcome effectively depends on<br>

whatever happens to be in the corresponding BTB entry, according to Agner's<br>

optimization guide ¹.<br>

  * Newer CPUs such as Ivy Bridge and Haswell have become increasingly<br>

hard to pin down, according to Matt G's experiment ².<br>
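<br>

As a rough illustration of the first two policies listed above, here is a minimal C sketch.<br>

The names (static_policy, predict_taken) are hypothetical and exist only for this example;<br>

it models the textbook rules, not the actual hardware of any Intel CPU.<br>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from any Intel document. */
enum static_policy {
    POLICY_ALWAYS_NOT_TAKEN, /* 80486-style rule from the list above */
    POLICY_BTFNT             /* Pentium 4-style: backward taken, forward not taken */
};

/*
 * Returns true if a conditional branch located at branch_addr and
 * jumping to target_addr would be statically predicted as taken.
 */
bool predict_taken(enum static_policy policy,
                   uintptr_t branch_addr, uintptr_t target_addr)
{
    if (policy == POLICY_ALWAYS_NOT_TAKEN)
        return false;

    /*
     * BTFNT: a target at or below the branch address is a backward
     * branch, which is typically a loop back-edge, so predict taken.
     */
    return target_addr <= branch_addr;
}
```

Under BTFNT, a loop back-edge is predicted taken on its first encounter, while a forward skip over an error handler is predicted not taken.<br>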

<br>

And Intel doesn't seem to want to talk about it any more: the<br>

latest material I could find in Intel's documentation was written about ten<br>

years ago.<br>

<br>

I know static branch prediction is (far?) less important than dynamic prediction,<br>

but in quite a few situations the CPU has no history to go on, and the<br>

programmer (with the compiler's help) is usually the best guide. Of course, these<br>

situations are usually not performance bottlenecks, because once a<br>

branch is executed frequently, the dynamic predictor will capture it.<br>

<br>

Since Intel no longer clearly documents its branch prediction<br>

mechanism, GCC's __builtin_expect() can do nothing<br>

more than move the unlikely branch off the hot path or, conversely, keep the<br>

likely branch on it.<br>
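<br>

For reference, a minimal sketch of how __builtin_expect() is typically used with GCC or Clang.<br>

The likely/unlikely macros and the sum_or_error helper are hypothetical names chosen for this example;<br>

the hint only influences code layout, and whether the cold path is actually moved out of line is up to the compiler.<br>

```c
#include <stddef.h>

/* Conventional wrapper macros; the names are a common idiom, not part of GCC. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical helper: sums a buffer and treats a NULL pointer as the
 * rare error case, returning -1 for it.
 */
long sum_or_error(const int *data, size_t n)
{
    if (unlikely(data == NULL))
        return -1; /* hinted as cold: the compiler may place it off the hot path */

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

The hint does not change what the function computes; it only tells the compiler which way the branch is expected to go.<br>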

<br>

I am not familiar with CPU design, and I don't know exactly what<br>

mechanism Intel uses nowadays for its static predictor. I just feel the<br>

best approach for Intel would be to clearly document, for each CPU,<br>

"where I plan to go when the dynamic predictor fails, forward or<br>

backward", because the programmer is usually the best guide at that<br>

point.<br>

<br>

<br>

APPENDIX:<br>

¹ Agner's optimization guide:<br>

<a href="https://www.agner.org/optimize/microarchitecture.pdf" rel="noreferrer" target="_blank">https://www.agner.org/optimize/microarchitecture.pdf</a>, section 3.5.<br>

<br>

² Matt G's experiment: <a href="https://xania.org/201602/bpu-part-two" rel="noreferrer" target="_blank">https://xania.org/201602/bpu-part-two</a><br>


</blockquote></div></div>