[llvm] Change how branch weight is annotated for direct call (PR #90315)

Sat May 4 02:44:21 PDT 2024

huangjd wrote:

To explain in more detail the issue we encountered and what this patch is trying to fix:

In some industrial use cases, we have a source file containing many repeated if statements, each calling a template-specialized function, and the profile contains line-matching samples in the following form:
```
func1:10000:12
 5: 10 callee1:0
 10: 9 callee2:1
 15: 11 callee3:0
```
In the sample profile loader, each call is assigned a weight equal to the sample count (the first number), even if the direct call exactly matches a call target in the sample. This weight propagates to the entire "then" basic block, marking both the "then" branch and the function call inside it as hot. In the inliner pass, all of these function calls are inlined, increasing the IR size by an order of magnitude and significantly slowing down the backend. Profiling shows that when all functions are inlined, half of the time is spent in machine block placement (which may indicate a problem of its own).
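To make the two candidate weights concrete, here is a minimal Python sketch (not LLVM code) that parses the body lines of the profile above and reports, for each call site, the line's sample count next to the count recorded for the matching call target. The names `parse_body`, `call_weights`, and the `direct_callee_at` mapping are hypothetical; in the compiler the callee at each offset would come from the IR. The sketch also ignores discriminators and nested inlined-callsite records.

```python
import re

# Sample profile text copied from the example above.
PROFILE = """\
func1:10000:12
 5: 10 callee1:0
 10: 9 callee2:1
 15: 11 callee3:0
"""

# Body line: indented "offset: samples [target:count ...]".
# The unindented header line (func1:10000:12) does not match and is skipped.
BODY_RE = re.compile(r"^\s+(\d+):\s+(\d+)((?:\s+\S+:\d+)*)\s*$")


def parse_body(text):
    """Return a list of (offset, line_samples, {target: count}) records."""
    records = []
    for line in text.splitlines():
        m = BODY_RE.match(line)
        if not m:
            continue
        targets = {}
        for token in m.group(3).split():
            name, _, count = token.rpartition(":")
            targets[name] = int(count)
        records.append((int(m.group(1)), int(m.group(2)), targets))
    return records


def call_weights(records, direct_callee_at):
    """Map offset -> (line sample count, matching call-target count or None).

    direct_callee_at is a hypothetical {offset: callee name} mapping that
    stands in for the direct calls found in the IR.
    """
    weights = {}
    for offset, samples, targets in records:
        callee = direct_callee_at.get(offset)
        weights[offset] = (samples, targets.get(callee))
    return weights


if __name__ == "__main__":
    callees = {5: "callee1", 10: "callee2", 15: "callee3"}
    for offset, (line_count, target_count) in call_weights(
        parse_body(PROFILE), callees
    ).items():
        print(offset, line_count, target_count)
```

For this profile, every call site has a line count of about 10 but a matching call-target count of 0 or 1, which is the gap described above.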

The problem here is that the input profile is contradictory (the "then" path is hot, but the call target is cold despite being called unconditionally), and the compiler is bogged down by such input, even though it normally compiles the program within a reasonable time without a profile.
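As an illustration of what "contradictory" means here, the following Python sketch flags call sites whose line count looks hot while the matching call-target count looks cold. The function name `contradictory_calls` and the `hot_threshold` parameter are hypothetical; the compiler derives hotness from the profile summary rather than from a fixed cutoff, so this only shows the shape of the check.

```python
def contradictory_calls(records, hot_threshold):
    """Return offsets where the line is hot but the direct call target is cold.

    records: iterable of (offset, line_samples, callee_samples) tuples.
    hot_threshold: hypothetical cutoff; counts at or above it are "hot".
    """
    return [
        offset
        for offset, line_samples, callee_samples in records
        if line_samples >= hot_threshold and callee_samples < hot_threshold
    ]


# Counts taken from the profile example in this message.
EXAMPLE = [(5, 10, 0), (10, 9, 1), (15, 11, 0)]

if __name__ == "__main__":
    print(contradictory_calls(EXAMPLE, hot_threshold=5))
```

With the illustrative cutoff of 5, all three call sites in the example are flagged.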

https://github.com/llvm/llvm-project/pull/90315