[llvm-dev] Why did Intel change its static branch prediction mechanism during these years?

Wed Aug 15 21:45:41 PDT 2018

Hi Fabian Giesen, sorry for the late reply.

>>>As for static vs. dynamic prediction, to be making explicit static predictions, you actually need to know whether you have seen a given branch before, and many branch predictors don't! That is, they simply have no way to tell reliably whether a given branch history (or branch target) entry is actually for the branch it's trying to predict. The various branch prediction structures are mostly just bags of bits indexed by a hash code.

As you said, the branch predictor has no way to tell whether it has
seen a branch before, because the BHB is indexed by a hash value. The
branch target predictor is different, though: the BTB on early Intel
CPUs is organized like an ordinary cache, with sets, ways, and tags,
and an entry is identified by the full address of the branch.

So my guess is: could the dynamic prediction phase give up on a BTB
miss? That is, if the branch target predictor (rather than the branch
predictor) fails to find a target, it concludes that the branch has
never been seen before, so there is not enough evidence for dynamic
prediction, and it passes control to the static predictor. Otherwise,
how can we explain that early Intel CPUs before the Pentium 4
(Pentium I, II, III) supported static prediction without a trace
cache ²?
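
To make the guess concrete, here is a minimal Python sketch of the
idea. It is only an illustration, not a description of any real Intel
design: the class and function names, the set count, and the single
"taken" bit standing in for dynamic history are all my assumptions,
and the static rule is the simplified "backward taken, forward not
taken" heuristic for conditional branches.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    tag: int      # full branch address, so a hit identifies the branch exactly
    target: int   # last observed target
    taken: bool   # stand-in for whatever dynamic history is really kept


def static_predict(branch_addr: int, target_addr: int) -> bool:
    """Simplified static heuristic: backward taken, forward not taken."""
    return target_addr < branch_addr


class FullTagBTB:
    """Hypothetical cache-like BTB tagged with the full branch address."""

    def __init__(self, num_sets: int = 128):  # set count is arbitrary
        self.num_sets = num_sets
        # One dict per set; associativity limits and eviction are ignored.
        self.sets = [dict() for _ in range(num_sets)]

    def lookup(self, branch_addr: int):
        # A hit requires the stored full address to match exactly.
        return self.sets[branch_addr % self.num_sets].get(branch_addr)

    def update(self, branch_addr: int, target: int, taken: bool) -> None:
        entry = BTBEntry(branch_addr, target, taken)
        self.sets[branch_addr % self.num_sets][branch_addr] = entry


def predict(btb: FullTagBTB, branch_addr: int, decoded_target: int):
    """Return (which predictor answered, predicted taken?)."""
    entry = btb.lookup(branch_addr)
    if entry is None:
        # BTB miss: the branch was provably never recorded, so fall back.
        return ("static", static_predict(branch_addr, decoded_target))
    return ("dynamic", entry.taken)
```

With a full-address tag, a miss really does mean "never seen (or
evicted)", which is the only reason the fallback in predict() is
meaningful.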

I found a passage in the Intel document ¹ that may support my guess:
"Branches that do not have a history in the BTB (see Section 3.4.1,
“Branch Prediction Optimization”) are predicted using a static
prediction algorithm... "
Although I believe this no longer applies to newer Intel CPUs, it
should at least count as evidence from the Pentium era.

I also found another interesting thing.
Starting with the Core microarchitecture, Intel stopped using the
static prediction heuristic (again quoting the Intel document,
section 3.4.1.3):
"The Intel Core microarchitecture does not use the static prediction
heuristic. However, to maintain consistency across Intel 64 and IA-32
processors, software should maintain the static prediction heuristic
as the default."

The interesting point is that the organization of the BTB changed in
exactly the Core microarchitecture. Intel no longer uses the full
address of a branch to identify a BTB entry, only bits 0 to 21 of the
address. As a result, the branch target predictor can no longer tell
precisely whether it has seen a branch before, just like the branch
predictor.
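
A small Python sketch of why a partial tag loses that information.
Again this is only an illustration under my own assumptions: the names
are hypothetical, the table is a flat dictionary rather than a real
set-associative structure, and the 22-bit tag width simply follows the
"bits 0 to 21" description above.

```python
TAG_BITS = 22                      # keep only address bits 0..21
TAG_MASK = (1 << TAG_BITS) - 1


def partial_tag(addr: int) -> int:
    """Drop every address bit above bit 21."""
    return addr & TAG_MASK


class PartialTagBTB:
    """Hypothetical BTB that identifies entries by a truncated address."""

    def __init__(self):
        self.entries = {}          # partial tag -> predicted target

    def update(self, branch_addr: int, target: int) -> None:
        self.entries[partial_tag(branch_addr)] = target

    def lookup(self, branch_addr: int):
        return self.entries.get(partial_tag(branch_addr))


btb = PartialTagBTB()
seen = 0x00401234                  # a branch that has executed before
never_seen = seen + (1 << TAG_BITS)  # differs only above bit 21

btb.update(seen, 0x00402000)

# Both addresses reduce to the same tag, so the never-executed branch
# "hits" and inherits the other branch's target.
alias_hit = btb.lookup(never_seen)
```

Here alias_hit is the target recorded for the other branch, so a hit
no longer proves that this particular branch was seen, and a "miss
means never seen, use static prediction" rule can no longer be applied
reliably.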

Thanks again.

APPENDIX:
¹ Intel Document, section 3.4.1.3
  https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf
² Agner's optimization guide, sections 3.2 and 3.3
  https://www.agner.org/optimize/microarchitecture.pdf