[PATCH/RFC] New TLI option for fast selects

Tue May 5 14:46:11 PDT 2015

----- Original Message -----
> From: "Eric Christopher" <echristo at gmail.com>
> To: "escha" <escha at apple.com>, "llvm-commits" <llvm-commits at cs.uiuc.edu>
> Sent: Monday, May 4, 2015 5:40:50 PM
> Subject: Re: [PATCH/RFC] New TLI option for fast selects
> 
> 
> 
> This seems fairly reasonable, couple of nits:
> 
> 
> a) Routine name: theoretically it should begin with a lower case
> letter. I know it probably doesn't match anything around it then. I
> don't know what we want to do about this, but I wouldn't complain
> much.
> 
> 
> b) Argument names: Can you make them a little more descriptive and
> document them?
> 
> 
> c) Got an in-tree user where this would be useful?
> 

Yes; on the PPC A2, integer selects are fast.

 -Hal

> 
> -eric
> 
> On Thu, Apr 30, 2015 at 1:47 PM escha <escha at apple.com> wrote:
> 
> 
> There are a number of DAG transforms that are suboptimal on
> architectures with extremely fast selects, e.g. those with actual
> select instructions mapping relatively cleanly to select_cc. An
> example would be integer abs(): on an ideal architecture without a
> select it might be three instructions (cmp, sub, cmov), so the
> three-operation canonicalization likely won’t hurt even in the worst
> case. But on an architecture with a select, we’re going from 2
> instructions to 3, which significantly increases instruction count,
> and it’s difficult to “go back” from the new instruction sequence to
> the select.
> 
> I’m not sure this patch actually caught all of them; there might be
> others, since I didn’t check them all. My logic here was to put a
> check on every transform which creates more nodes than it consumes
> in order to eliminate a select. On an out of tree target this saves
> a number of instructions (with no regressions on any test) by making
> the included TLI return “false” for that target.
> 
> This could also open up more optimizations in the future that assume
> selects are fast, e.g. one select DAG node is roughly equivalent to
> one real instruction. I wonder if any of the in-tree GPU backends
> would find something like this useful?
> 
> Any thoughts on the implementation?
> 
> — escha
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 
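
To make the abs() example in the quoted message concrete, here is a minimal C++ sketch of the two forms being discussed: a compare-and-select form and an expanded three-operation form (sign mask, add, xor). It is an illustration only, not code from the patch. The names ToyTarget, selectsAreFast, shouldExpandSelect, and lowerAbs are hypothetical, and the real TLI hook may have a different name and polarity. Unsigned arithmetic is used so the INT32_MIN case stays well defined.

```cpp
#include <cassert>
#include <cstdint>

// Form 1: compare + select. On a target with a real select instruction
// this is roughly one negate plus one select_cc.
uint32_t absSelect(int32_t x) {
  uint32_t u = static_cast<uint32_t>(x);
  return x < 0 ? 0u - u : u;
}

// Form 2: expanded form with no select: sign mask, add, xor.
// The mask stands in for an arithmetic shift right by 31.
uint32_t absExpanded(int32_t x) {
  uint32_t u = static_cast<uint32_t>(x);
  uint32_t sign = 0u - static_cast<uint32_t>(x < 0); // all ones if x < 0
  return (u + sign) ^ sign;
}

// Hypothetical stand-in for a target description (assumption, not LLVM API).
struct ToyTarget {
  bool selectsAreFast;
};

// Hypothetical hook: should a transform that trades a select for more
// nodes be allowed to run on this target?
bool shouldExpandSelect(const ToyTarget &target) {
  return !target.selectsAreFast;
}

// Pick a form the way a guarded transform might.
uint32_t lowerAbs(const ToyTarget &target, int32_t x) {
  return shouldExpandSelect(target) ? absExpanded(x) : absSelect(x);
}
```

Both forms compute the same value; the hook only decides which shape is kept, which is the kind of guard the patch describes putting on transforms that create more nodes than they consume.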

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory