Thu Oct 15 08:33:17 PDT 2015

Hi all,


I have a simple test case like this:


class H
{
public:
    int a;
    int b;
    int c;
};

inline int operator < (H& A, H& B)
{
    return (A.a < B.a) ? 1 :
           (A.a > B.a) ? 0 :
           (A.b < B.b) ? 1 :
           (A.b > B.b) ? 0 :
            A.c < B.c;
}

int s(H *h, int j)
{
    if (h[j] < h[j+1])
        j += 2;

    return j;
}
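
For anyone who wants to compile the test case standalone, here is a minimal sketch. It assumes the fields are public (a struct is used for that) and that the body of the if is "j += 2", which is inferred from the select shown further down. less_by_tie is a hypothetical helper, not part of the test case; it only confirms that the ternary chain is a plain lexicographic compare on (a, b, c).

```cpp
#include <cassert>
#include <tuple>

// Same fields as the test case; a struct is assumed so they are public.
struct H {
    int a;
    int b;
    int c;
};

// The comparison from the test case, unchanged apart from formatting.
inline int operator<(H& A, H& B) {
    return (A.a < B.a) ? 1 :
           (A.a > B.a) ? 0 :
           (A.b < B.b) ? 1 :
           (A.b > B.b) ? 0 :
            A.c < B.c;
}

// Hypothetical helper (not in the original): the same ordering via std::tie.
inline bool less_by_tie(const H& A, const H& B) {
    return std::tie(A.a, A.b, A.c) < std::tie(B.a, B.b, B.c);
}

// Assumed body: j advances by 2 when h[j] < h[j+1].
int s(H* h, int j) {
    if (h[j] < h[j + 1])
        j += 2;
    return j;
}
```

Calling s(h, 0) on an array whose first two elements are {1,2,3} and {1,2,4} takes the third comparison (on c) and returns 2.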



The generated AArch64 assembly looks like the pseudocode below.  The X86
assembly is similar.


if (h0.a < h1.a) goto bb0
if (h0.a > h1.a) goto bb1
if (h0.b < h1.b) goto bb2
if (h0.b > h1.b) goto bb3
flag = (h0.c < h1.c)
goto if_end

bb0: flag = 1; goto if_end
bb1: flag = 0; goto if_end
bb2: flag = 1; goto if_end
bb3: flag = 0; goto if_end

if_end: j = select flag, j+2, j
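
The same control flow can be written out at the source level, which may make the shape easier to see. This is only a sketch that mirrors the pseudocode above, not compiler output; s_flag is a hypothetical name, and a struct is assumed so the fields are accessible.

```cpp
// Same fields as the test case; a struct is assumed so they are public.
struct H {
    int a;
    int b;
    int c;
};

// Hypothetical mirror of the pseudocode: each constant result gets its own
// tiny block that only sets `flag` and jumps to if_end.
int s_flag(H* h, int j) {
    const H& h0 = h[j];
    const H& h1 = h[j + 1];
    int flag;

    if (h0.a < h1.a) goto bb0;
    if (h0.a > h1.a) goto bb1;
    if (h0.b < h1.b) goto bb2;
    if (h0.b > h1.b) goto bb3;
    flag = (h0.c < h1.c);
    goto if_end;

bb0: flag = 1; goto if_end;
bb1: flag = 0; goto if_end;
bb2: flag = 1; goto if_end;
bb3: flag = 0; goto if_end;

if_end:
    // Corresponds to "j = select flag, j+2, j".
    return flag ? j + 2 : j;
}
```

Every path through bb0 - bb3 costs an extra taken branch just to materialize a constant that is consumed immediately at if_end.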


The trivial basic blocks bb0 - bb3 can cause a large performance penalty if
the comparison (h[j] < h[j+1]) is in a hot loop.  The IR for if_end looks
like this:


  %cond = phi i1 [ false, . ], [ true, . ], [ false, . ], [ true, . ],
                 [ %flag, . ]

  %add = add i32 %j, 2

  %j.add = select i1 %cond.i, i32 %j, i32 %add


Every bool constant in the phi becomes a trivial basic block in the
assembly.  The select is generated by SpeculativelyExecuteBB() in
SimplifyCFG before function inlining, and it prevents jump-threading from
optimizing the CFG further.  If I run the passes in this order:


-inline -instcombine -jump-threading -simplifycfg


then the problem is gone.  instcombine cleans up the code after inlining so
that jump-threading can optimize the CFG.
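
To illustrate the CFG shape I am hoping for, here is a source-level analogy of the threaded form. It is a sketch only, not what LLVM emits; s_threaded is a hypothetical name, a struct is assumed so the fields are accessible, and the "j + 2" result is taken from the select shown earlier.

```cpp
// Same fields as the test case; a struct is assumed so they are public.
struct H {
    int a;
    int b;
    int c;
};

// Hypothetical threaded form: each comparison that already decides the
// result branches straight to the final value, so no block exists only to
// set a constant flag.
int s_threaded(H* h, int j) {
    const H& h0 = h[j];
    const H& h1 = h[j + 1];

    if (h0.a < h1.a) return j + 2;
    if (h0.a > h1.a) return j;
    if (h0.b < h1.b) return j + 2;
    if (h0.b > h1.b) return j;
    // Only the last comparison still needs a data-dependent choice.
    return (h0.c < h1.c) ? j + 2 : j;
}
```

Here the four constant phi inputs have disappeared, and only the final comparison on c remains as a select-like choice.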


Please let me know if you have any advice on how to solve this problem.


Thank you,




