[PATCH] D131287: Fix branch weight in FoldCondBranchOnValueKnownInPredecessor pass in SimplifyCFG

Fri Oct 7 08:16:59 PDT 2022

LukeZhuang added a comment.

And @paulkirth, also thank you very much for the comments and I'm working on improving it

================
Comment at: llvm/test/Transforms/SimplifyCFG/fold-cond-bi-fix-weight-overflow.ll:10
+
+;; Create a test case that will definitely overflow when doing naive add/mul,
+;; by making branch_weights which used in the calculation to be UINT32_MAX.
----------------
davidxl wrote:
> In this case, why does the prof meta data for the switch need to change in the first place? Can the original meta data be retained?
> 
> Regarding overflow,  should the branch weights be scaled down first to avoid the problem in the first place?
I created this test to not testing the feature itself but to catch a potential risk in my method that may overflow if I didn't handle it appropriately.
I already scale down it in the code, but even so, I will still need to calculate the product three uint32_t, and I proved that the product of two of them will not exceed UINT_MAX. So this test is intended for this.

================
Comment at: llvm/test/Transforms/SimplifyCFG/fold-cond-bi-fix-weight.ll:58
+;; After the patch, the branch weights from entry are updated as 1:19, then the
+;; probability of hitting foo is 29/200(14.5%), which is much closer to original one
+define void @branchinst_noweight(i8 %a0, i8 %a1, i8 %a2) {
----------------
davidxl wrote:
> By looking at example only, it looks like the right way is to compute the branch weights for entry block using backward propagation.
> 
> Say entry->lor.end  has weights x, and sentry->lor.rhs has weights y.  Assuming entry (and lor.end) has weights 100.  The following system equation holds (assumes no path sensitivity, and thus the branch weights after transformation remains the same for lor.rhs), we have
> 
>  x+ y = 100
>   x + y/10 = 10
> 
> And it gives x = 0, y = 100
> 
> 
> Please note that, due to the possibility of path sensitive behavior, there is no reason to believe that probability of lor.rhs without this patch is any worse than the one with the patch.
Hi David, thanks for the comments!
I have a question on it. The backward propagation require us to know the block frequency info(or especially know the whole CFG), may I ask is that correct? Otherwise I think we cannot know lor.end has the same weight as entry, and if.then has 10% weights of entry.
So I think that goes back to our previous discussion. The first issue is the transformation in SimplifyCFG does not know either BlockFrequencyInfo nor the whole CFG. The second issue is this branch weight maintenance can also happen with static code prediction itself(using __builtin_expect in my example), in which case we do not have frequency info from dynamic analysis such as PGO.
Please correct me if I understand it incorrectly.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131287/new/

https://reviews.llvm.org/D131287