RFC: Tweak heuristics in SimplifyCFG

Tue Feb 10 11:00:59 PST 2015

----- Original Message -----
> From: "Geoff Berry" <gberry at codeaurora.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "LLVM Commits" <llvm-commits at cs.uiuc.edu>, "James Molloy" <james at jamesmolloy.co.uk>
> Sent: Monday, February 9, 2015 10:28:38 AM
> Subject: RE: RFC: Tweak heuristics in SimplifyCFG
> 
> Hi James, Hal,
> 
> James, to your point, CodeGenPrepare does convert selects to
> branches, but only in very specific cases (see
> isFormingBranchFromSelectProfitable()), and it only considers
> selects in isolation.  For example, it doesn't consider groups of
> selects that use the same predicate value that can all be converted
> to use one set of branches.  It also doesn't consider how much code
> could be sunk into the newly created blocks and therefore executed
> potentially less frequently.

I understand, and thanks for the example. We certainly need to do a better job at this.

I think that if we're going to have a kind of target-influenced canonical form here, which seems reasonable, then the branch-vs-select cost will probably need to be target-provided as well. Just using a fixed pseudo-branch cost (1 currently, or 2 with the patch) seems unlikely to be sufficient (although I could be wrong). But, if can decide what target information is necessary for SimplifyCFG to make a reasonable decision, and feed that though TTI, we can probably make all of this self-consistent. Maybe all we need here is some kind of target-provided 'branch cost'?

 -Hal

> 
> Hal, here's an example of what I'm talking about.  The function
> selects() below will end up using selects and doing all three
> divides always, whereas the function branches(), which is
> equivalent, will use branches and only execute the divides in the
> else path.
> 
> int selects(int a, int b, int c, int d)
> {
>     int x1, x2, x3;
> 
>     x1 = a / b;
>     x2 = b / c;
>     x3 = c / d;
>     if (a < b) {
>         x1 = 0;
>         x2 = 0;
>         x3 = 0;
>     }
> 
>     return x1 + x2 + x3;
> }
> 
> *** IR Dump After CodeGen Prepare ***
> ; Function Attrs: nounwind readnone
> define i32 @selects(i32 %a, i32 %b, i32 %c, i32 %d) #0 {
> entry:
>   %div = sdiv i32 %a, %b
>   %div1 = sdiv i32 %b, %c
>   %div2 = sdiv i32 %c, %d
>   %cmp = icmp slt i32 %a, %b
>   %x1.0 = select i1 %cmp, i32 0, i32 %div
>   %x2.0 = select i1 %cmp, i32 0, i32 %div1
>   %x3.0 = select i1 %cmp, i32 0, i32 %div2
>   %add = add nsw i32 %x2.0, %x1.0
>   %add3 = add nsw i32 %add, %x3.0
>   ret i32 %add3
> }
> 
> int branches(int a, int b, int c, int d)
> {
>     int x1, x2, x3;
> 
>     if (a < b) {
>         x1 = 0;
>         x2 = 0;
>         x3 = 0;
>     } else {
>         x1 = a / b;
>         x2 = b / c;
>         x3 = c / d;
>     }
> 
>     return x1 + x2 + x3;
> }
> 
> *** IR Dump After CodeGen Prepare ***
> ; Function Attrs: nounwind readnone
> define i32 @branches(i32 %a, i32 %b, i32 %c, i32 %d) #0 {
> entry:
>   %cmp = icmp slt i32 %a, %b
>   br i1 %cmp, label %if.end, label %if.else
> 
> if.else:                                          ; preds = %entry
>   %div = sdiv i32 %a, %b
>   %div1 = sdiv i32 %b, %c
>   %div2 = sdiv i32 %c, %d
>   br label %if.end
> 
> if.end:                                           ; preds = %entry,
> %if.else
>   %div10 = phi i32 [ 0, %entry ], [ %div, %if.else ]
>   %div19 = phi i32 [ 0, %entry ], [ %div1, %if.else ]
>   %div28 = phi i32 [ 0, %entry ], [ %div2, %if.else ]
>   %add = add nsw i32 %div19, %div10
>   %add3 = add nsw i32 %add, %div28
>   ret i32 %add3
> }
> 
> --
> Geoff Berry
> Employee of Qualcomm Innovation Center, Inc.
>  Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
>  Linux Foundation Collaborative Project
> 
> -----Original Message-----
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: Friday, February 06, 2015 5:44 PM
> To: Geoff Berry
> Cc: LLVM Commits; James Molloy
> Subject: Re: RFC: Tweak heuristics in SimplifyCFG
> 
> ----- Original Message -----
> > From: "Geoff Berry" <gberry at codeaurora.org>
> > To: "James Molloy" <james at jamesmolloy.co.uk>, "Hal Finkel"
> > <hfinkel at anl.gov>
> > Cc: "LLVM Commits" <llvm-commits at cs.uiuc.edu>
> > Sent: Friday, February 6, 2015 4:38:27 PM
> > Subject: RE: RFC: Tweak heuristics in SimplifyCFG
> > 
> > 
> > 
> > 
> > Hi James, Hal,
> > 
> > 
> > 
> > I’ve been looking into a related issue, and would appreciate your
> > input on it. We currently convert branches to selects in
> > SimplifyCFG,
> > but we don’t do the opposite anywhere at the IR level from what I
> > can
> > tell.
> > It seems like it might be beneficial to do this conversion in cases
> > where the un-if converted code would be rejected by the cost
> > heuristic
> > that was being discussed below. Code sinking also comes into play,
> > since ideally we would consider the branching case cost with code
> > sunk
> > into the then/else blocks where possible/profitable. Come to think
> > of
> > it, it seems like we would want to do some sinking before the
> > SimplifyCFG case as well to make a better decision.
> > 
> > 
> > 
> > Any thoughts on this?
> > 
> 
> I think that I understand what you're saying, but could you provide a
> couple quick examples? I'm somewhat skeptical because creating
> branches from selects generically makes optimization and analysis
> harder. As a result, we generally delay this until CodeGen when our
> modeling is more precise.
> 
>  -Hal
> 
> > 
> > 
> > Thanks,
> > 
> > 
> > 
> > 
> > --
> > 
> > Geoff Berry
> > 
> > Employee of Qualcomm Innovation Center, Inc.
> > 
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > a
> > Linux Foundation Collaborative Project
> > 
> > 
> > 
> > From: llvm-commits-bounces at cs.uiuc.edu
> > [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of James Molloy
> > Sent: Friday, February 06, 2015 12:01 PM
> > To: Hal Finkel
> > Cc: LLVM Commits
> > Subject: Re: RFC: Tweak heuristics in SimplifyCFG
> > 
> > 
> > 
> > 
> > Hi Hal,
> > 
> > 
> > That's a really good point, I'm on board with that. I'll cook up a
> > patch soon and send it for review.
> > 
> > 
> > 
> > 
> > 
> > Cheers,
> > 
> > 
> > 
> > 
> > 
> > James
> > 
> > 
> > 
> > 
> > On Fri Feb 06 2015 at 4:53:44 PM Hal Finkel < hfinkel at anl.gov >
> > wrote:
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "James Molloy" < james at jamesmolloy.co.uk >
> > > To: "LLVM Commits" < llvm-commits at cs.uiuc.edu >
> > > Sent: Friday, February 6, 2015 9:00:33 AM
> > > Subject: RFC: Tweak heuristics in SimplifyCFG
> > > 
> > > Hi all,
> > > 
> > > 
> > > I've been looking at why we generate poor code for idiomatic
> > > stuff
> > > like clamp() and abs().
> > > 
> > > 
> > > Clamp normally looks like this:
> > > 
> > > 
> > > T clamp(T a, T b, T c) { return (a < b) ? b : ((a > c) ? c : a);
> > > }
> > > 
> > > 
> > > We currently produce the following IR for this:
> > > 
> > > 
> > > define i32 @clamp2(i32 %a, i32 %b, i32 %c) #0 {
> > > entry:
> > > %cmp = icmp sgt i32 %a, %c
> > > br i1 %cmp, label %cond.end4, label %cond.false
> > > 
> > > 
> > > cond.false:
> > > %cmp1 = icmp slt i32 %a, %b
> > > %cond = select i1 %cmp1, i32 %b, i32 %a br label %cond.end4
> > > 
> > > 
> > > cond.end4:
> > > %cond5 = phi i32 [ %cond, %cond.false ], [ %c, %entry ] ret i32
> > > %cond5 }
> > > 
> > > 
> > > This is multi-block so makes later optimizations more awkward,
> > > such
> > > as loop vectorization and loop rerolling. SimplifyCFG can convert
> > > this into "icmp; select; icmp; select", but doesn't because it
> > > has
> > > quite a conservative heuristic - it'll only ever hoist one
> > > (cheap)
> > > instruction into the dominating block.
> > > 
> > > 
> > > I think this is too conservative - given the potential gains
> > > later
> > > on in the optimizer from flattening basic blocks (and that
> > > CodegenPrepare can remove selects again!) - we should be more
> > > aggressive here.
> > > 
> > > 
> > > My suggestions are:
> > > - Up -phi-node-folding-threshold from 1 to 3.
> > > - Add "fcmp", "fadd" and "fsub" to the list of cheap instructions
> > > to
> > > hoist. (fadd and fsub to make abs() work!)
> > > 
> > > Would anyone object to this? I'll have benchmark results on
> > > AArch64
> > > by the end of the weekend.
> > 
> > This sounds good to be. Regarding the second point, I'd rather that
> > SimplifyCFG did not have its own list of cheap instructions (I'm
> > referring to ComputeSpeculationCost in
> > lib/Transforms/Utils/SimplifyCFG.cpp), but rather used the existing
> > TTI interface for this. SimplifyCFG already now uses TTI for other
> > things, and I think this is a natural enhancement.
> > 
> > I think that we should call TTI.getUserCost(&I) (which is the same
> > interface used by the inliner's cost analysis, the loop unroller,
> > etc.), and hoist an unlimited number of instructions marked as
> > TargetTransformInfo::TCC_Free and some limited number of
> > instructions
> > marked as TCC_Basic. The idea is that the total cost of the
> > instructions should equal (phi-node-folding-threshold)*(TCC_Basic).
> > 
> > This also provides a natural way to turn off these optimizations
> > for
> > fadd, etc. on targets that don't have hardware-implemented floating
> > point.
> > 
> > -Hal
> > 
> > > 
> > > 
> > > Cheers,
> > > 
> > > 
> > > James
> > > 
> > > 
> > > _______________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > > 
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory