[llvm] Change how branch weight is annotated for direct call (PR #90315)

David Li via llvm-commits llvm-commits at lists.llvm.org
Thu May 2 21:14:50 PDT 2024


david-xl wrote:

> I'm actually not sure if this change is needed.
> 
> If you look at how the profile is generated, you will see that body samples are populated using range counters, and call samples are populated with branch counters. Range and branch counters aren't always guaranteed to be a perfect match, even though they were derived from the same raw LBRs, due to different filtering.
> 
> The discrepancy may not be perfect, but in practice it's also not that harmful given we deal with block counts and call counts somewhat separately. Also note that the discrepancy can be a result of changing code (from indirect call when profiled, to direct call when compiling again).
> 
> `getInstWeight` is mostly used to compute `getBlockWeight`, and I think it's probably more reasonable to use counts derived from Range counters regardless of call or non-call instructions, so they stay more consistent, which would lead to better input for profile inference later on. If you resort to using call target counts for direct calls, you would be using counts derived from Branch counters for calls and counts derived from Range counters for non-calls, and that can create more pressure on inference with potentially worse results.



Here is a scenario where this patch may help.

Say the binary being profiled has an indirect call, and the sample profile has an indirect call target profile for that site. In the ThinLTO prelink compilation, the indirect call goes through indirect call promotion, which produces an if-then-else in which the 'then' branch contains a direct call. In the postlink compilation, when the sample profile is loaded, it is better to use the call count to annotate that block.
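
To make the scenario concrete, here is a minimal sketch in plain C++. It is not LLVM code: `LineSamples`, `directCallWeight`, `hotTarget`, `coldTarget`, and the numbers used with them are all hypothetical, and it assumes the promoted direct call still maps to the same profile line as the original indirect call. Under that assumption, the body sample count for the line covers every target, while the per-callee call count is closer to how often the 'then' block runs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified model of the samples recorded for one source
// line. This is NOT LLVM's actual data structure.
struct LineSamples {
  uint64_t BodySamples;                        // count derived from range counters
  std::map<std::string, uint64_t> CallTargets; // per-callee counts from branch counters
};

using Fn = int (*)(int);
int hotTarget(int X) { return X + 1; }
int coldTarget(int X) { return X * 2; }

// Hand-written equivalent of the call site after indirect call promotion.
int callPromoted(Fn FP, int X) {
  if (FP == &hotTarget)
    return hotTarget(X); // 'then' block: promoted direct call
  return FP(X);          // 'else' block: remaining indirect call
}

// Hypothetical helper: pick the weight for the block holding a direct call
// to Callee. With UseCallCount set, prefer the callee's call count and fall
// back to the body samples when the callee has no recorded count.
uint64_t directCallWeight(const LineSamples &S, const std::string &Callee,
                          bool UseCallCount) {
  if (UseCallCount) {
    auto It = S.CallTargets.find(Callee);
    if (It != S.CallTargets.end())
      return It->second;
  }
  return S.BodySamples;
}
```

With made-up samples such as `LineSamples S{1000, {{"hotTarget", 900}, {"coldTarget", 100}}}`, the body-sample choice would give the 'then' block the whole line's count, whereas the call-count choice gives it only the share attributed to `hotTarget`.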


https://github.com/llvm/llvm-project/pull/90315

