[llvm] [LoopUnroll] Clamp PartialThreshold for large LoopMicroOpBufferSize (PR #67657)

Wed Oct 4 12:59:21 PDT 2023

goldsteinn wrote:

> > I've always struggled with the idea of LoopMicroOpBufferSize in general. It appears to be just used as a way of throttling unrolls, to avoid the potential cost of unnecessary compare+branch.
> 
> I believe the primary purpose of LoopMicroOpBufferSize here is to make sure that we don't runtime-unroll past the loop buffer size: Doing so could mean that a (non-unrolled) loop that previously used the loop buffer may no longer do so after unrolling, which would be a clear pessimization.

Not that this patch should address it, but on Intel x86, if the loop has a highly predictable iteration count in the ~[24, 150] range (24 = roughly the iteration count at which the loop starts using the LSD, 150 = max depth of the branch predictor), **using the loop micro-op buffer can be a pessimization**. This is because [the LSD exits via a branch miss](https://stackoverflow.com/a/67118925/11322131), so a loop that would be perfectly predicted when running from the DSB gets a branch miss added to it.
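
To make the condition concrete, here is a rough sketch of it as a predicate. The function name and constants are hypothetical and not part of LLVM or this patch. The thresholds are just the approximate figures quoted above and will vary by microarchitecture.

```cpp
// Hypothetical sketch, not LLVM code. Both thresholds are the approximate
// figures from the comment above and are assumptions, not measured values.
constexpr unsigned kApproxLsdEngageTrips = 24;  // loop starts using the LSD at about this trip count
constexpr unsigned kApproxPredictorDepth = 150; // longest trip count the predictor can track

// Returns true when running the loop from the LSD would add a branch miss
// that the same loop running from the DSB would not have had.
constexpr bool lsdAddsBranchMiss(unsigned TripCount, bool PredictableTripCount) {
  // Unpredictable exit: the exit branch may miss either way, so the LSD adds nothing.
  if (!PredictableTripCount)
    return false;
  // Too short: the LSD has not engaged yet, so the loop still runs from the DSB.
  if (TripCount < kApproxLsdEngageTrips)
    return false;
  // Too long: the predictor cannot track the exit, so it misses from the DSB as well.
  if (TripCount > kApproxPredictorDepth)
    return false;
  // In between: the DSB exit would be predicted, while the LSD exit is a miss.
  return true;
}
```

Under these assumptions, a loop with a fixed trip count of 64 falls in the affected range, while trip counts of 8 or 1000 do not.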


https://github.com/llvm/llvm-project/pull/67657