RFC: Tweak heuristics in SimplifyCFG

Mon Feb 9 08:28:38 PST 2015

Hi James, Hal,

James, to your point, CodeGenPrepare does convert selects to branches, but only in very specific cases (see isFormingBranchFromSelectProfitable()), and it only considers selects in isolation.  For example, it doesn't consider groups of selects that use the same predicate value that can all be converted to use one set of branches.  It also doesn't consider how much code could be sunk into the newly created blocks and therefore executed potentially less frequently.

Hal, here's an example of what I'm talking about.  The function selects() below will end up using selects and doing all three divides always, whereas the function branches(), which is equivalent, will use branches and only execute the divides in the else path.

int selects(int a, int b, int c, int d)
{
    int x1, x2, x3;

    x1 = a / b;
    x2 = b / c;
    x3 = c / d;
    if (a < b) {
        x1 = 0;
        x2 = 0;
        x3 = 0;
    }

    return x1 + x2 + x3;
}

*** IR Dump After CodeGen Prepare ***
; Function Attrs: nounwind readnone
define i32 @selects(i32 %a, i32 %b, i32 %c, i32 %d) #0 {
entry:
  %div = sdiv i32 %a, %b
  %div1 = sdiv i32 %b, %c
  %div2 = sdiv i32 %c, %d
  %cmp = icmp slt i32 %a, %b
  %x1.0 = select i1 %cmp, i32 0, i32 %div
  %x2.0 = select i1 %cmp, i32 0, i32 %div1
  %x3.0 = select i1 %cmp, i32 0, i32 %div2
  %add = add nsw i32 %x2.0, %x1.0
  %add3 = add nsw i32 %add, %x3.0
  ret i32 %add3
}

int branches(int a, int b, int c, int d)
{
    int x1, x2, x3;

    if (a < b) {
        x1 = 0;
        x2 = 0;
        x3 = 0;
    } else {
        x1 = a / b;
        x2 = b / c;
        x3 = c / d;
    }

    return x1 + x2 + x3;
}

*** IR Dump After CodeGen Prepare ***
; Function Attrs: nounwind readnone
define i32 @branches(i32 %a, i32 %b, i32 %c, i32 %d) #0 {
entry:
  %cmp = icmp slt i32 %a, %b
  br i1 %cmp, label %if.end, label %if.else

if.else:                                          ; preds = %entry
  %div = sdiv i32 %a, %b
  %div1 = sdiv i32 %b, %c
  %div2 = sdiv i32 %c, %d
  br label %if.end

if.end:                                           ; preds = %entry, %if.else
  %div10 = phi i32 [ 0, %entry ], [ %div, %if.else ]
  %div19 = phi i32 [ 0, %entry ], [ %div1, %if.else ]
  %div28 = phi i32 [ 0, %entry ], [ %div2, %if.else ]
  %add = add nsw i32 %div19, %div10
  %add3 = add nsw i32 %add, %div28
  ret i32 %add3
}

--
Geoff Berry
Employee of Qualcomm Innovation Center, Inc.
 Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

-----Original Message-----
From: Hal Finkel [mailto:hfinkel at anl.gov] 
Sent: Friday, February 06, 2015 5:44 PM
To: Geoff Berry
Cc: LLVM Commits; James Molloy
Subject: Re: RFC: Tweak heuristics in SimplifyCFG

----- Original Message -----
> From: "Geoff Berry" <gberry at codeaurora.org>
> To: "James Molloy" <james at jamesmolloy.co.uk>, "Hal Finkel" 
> <hfinkel at anl.gov>
> Cc: "LLVM Commits" <llvm-commits at cs.uiuc.edu>
> Sent: Friday, February 6, 2015 4:38:27 PM
> Subject: RE: RFC: Tweak heuristics in SimplifyCFG
> 
> 
> 
> 
> Hi James, Hal,
> 
> 
> 
> I’ve been looking into a related issue, and would appreciate your 
> input on it. We currently convert branches to selects in SimplifyCFG, 
> but we don’t do the opposite anywhere at the IR level from what I can 
> tell.
> It seems like it might be beneficial to do this conversion in cases 
> where the un-if converted code would be rejected by the cost heuristic 
> that was being discussed below. Code sinking also comes into play, 
> since ideally we would consider the branching case cost with code sunk 
> into the then/else blocks where possible/profitable. Come to think of 
> it, it seems like we would want to do some sinking before the 
> SimplifyCFG case as well to make a better decision.
> 
> 
> 
> Any thoughts on this?
> 

I think that I understand what you're saying, but could you provide a couple quick examples? I'm somewhat skeptical because creating branches from selects generically makes optimization and analysis harder. As a result, we generally delay this until CodeGen when our modeling is more precise.

 -Hal

> 
> 
> Thanks,
> 
> 
> 
> 
> --
> 
> Geoff Berry
> 
> Employee of Qualcomm Innovation Center, Inc.
> 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
> Linux Foundation Collaborative Project
> 
> 
> 
> From: llvm-commits-bounces at cs.uiuc.edu 
> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of James Molloy
> Sent: Friday, February 06, 2015 12:01 PM
> To: Hal Finkel
> Cc: LLVM Commits
> Subject: Re: RFC: Tweak heuristics in SimplifyCFG
> 
> 
> 
> 
> Hi Hal,
> 
> 
> That's a really good point, I'm on board with that. I'll cook up a 
> patch soon and send it for review.
> 
> 
> 
> 
> 
> Cheers,
> 
> 
> 
> 
> 
> James
> 
> 
> 
> 
> On Fri Feb 06 2015 at 4:53:44 PM Hal Finkel < hfinkel at anl.gov >
> wrote:
> 
> 
> 
> ----- Original Message -----
> > From: "James Molloy" < james at jamesmolloy.co.uk >
> > To: "LLVM Commits" < llvm-commits at cs.uiuc.edu >
> > Sent: Friday, February 6, 2015 9:00:33 AM
> > Subject: RFC: Tweak heuristics in SimplifyCFG
> > 
> > Hi all,
> > 
> > 
> > I've been looking at why we generate poor code for idiomatic stuff 
> > like clamp() and abs().
> > 
> > 
> > Clamp normally looks like this:
> > 
> > 
> > T clamp(T a, T b, T c) { return (a < b) ? b : ((a > c) ? c : a); }
> > 
> > 
> > We currently produce the following IR for this:
> > 
> > 
> > define i32 @clamp2(i32 %a, i32 %b, i32 %c) #0 {
> > entry:
> > %cmp = icmp sgt i32 %a, %c
> > br i1 %cmp, label %cond.end4, label %cond.false
> > 
> > 
> > cond.false:
> > %cmp1 = icmp slt i32 %a, %b
> > %cond = select i1 %cmp1, i32 %b, i32 %a br label %cond.end4
> > 
> > 
> > cond.end4:
> > %cond5 = phi i32 [ %cond, %cond.false ], [ %c, %entry ] ret i32 
> > %cond5 }
> > 
> > 
> > This is multi-block so makes later optimizations more awkward, such 
> > as loop vectorization and loop rerolling. SimplifyCFG can convert 
> > this into "icmp; select; icmp; select", but doesn't because it has 
> > quite a conservative heuristic - it'll only ever hoist one (cheap) 
> > instruction into the dominating block.
> > 
> > 
> > I think this is too conservative - given the potential gains later 
> > on in the optimizer from flattening basic blocks (and that 
> > CodegenPrepare can remove selects again!) - we should be more 
> > aggressive here.
> > 
> > 
> > My suggestions are:
> > - Up -phi-node-folding-threshold from 1 to 3.
> > - Add "fcmp", "fadd" and "fsub" to the list of cheap instructions to 
> > hoist. (fadd and fsub to make abs() work!)
> > 
> > Would anyone object to this? I'll have benchmark results on AArch64 
> > by the end of the weekend.
> 
> This sounds good to be. Regarding the second point, I'd rather that 
> SimplifyCFG did not have its own list of cheap instructions (I'm 
> referring to ComputeSpeculationCost in 
> lib/Transforms/Utils/SimplifyCFG.cpp), but rather used the existing 
> TTI interface for this. SimplifyCFG already now uses TTI for other 
> things, and I think this is a natural enhancement.
> 
> I think that we should call TTI.getUserCost(&I) (which is the same 
> interface used by the inliner's cost analysis, the loop unroller, 
> etc.), and hoist an unlimited number of instructions marked as 
> TargetTransformInfo::TCC_Free and some limited number of instructions 
> marked as TCC_Basic. The idea is that the total cost of the 
> instructions should equal (phi-node-folding-threshold)*(TCC_Basic).
> 
> This also provides a natural way to turn off these optimizations for 
> fadd, etc. on targets that don't have hardware-implemented floating 
> point.
> 
> -Hal
> 
> > 
> > 
> > Cheers,
> > 
> > 
> > James
> > 
> > 
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory