[PATCH/RFC] New TLI option for fast selects

Tue May 5 14:53:39 PDT 2015

On Tue, May 5, 2015 at 2:46 PM Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "Eric Christopher" <echristo at gmail.com>
> > To: "escha" <escha at apple.com>, "llvm-commits" <llvm-commits at cs.uiuc.edu>
> > Sent: Monday, May 4, 2015 5:40:50 PM
> > Subject: Re: [PATCH/RFC] New TLI option for fast selects
> >
> >
> >
> > This seems fairly reasonable, couple of nits:
> >
> >
> > a) Routine name: theoretically it should begin with a lower case
> > letter. I know it probably doesn't match anything around it then. I
> > don't know what we want to do about this, but I wouldn't complain
> > much.
> >
> >
> > b) Argument names: Can you make them a little more descriptive and
> > document them?
> >
> >
> > c) Got an in-tree user where this would be useful?
> >
>
> Yes; on the PPC A2, integer selects are fast.
>
>
Sweet. Thanks!

-eric

>  -Hal
>
> >
> > -eric
> >
> > On Thu, Apr 30, 2015 at 1:47 PM escha <escha at apple.com> wrote:
> >
> >
> > There are a number of DAG transforms that are suboptimal on
> > architectures with extremely fast selects, e.g. those with actual
> > select instructions mapping relatively cleanly to select_cc. An
> > example would be integer abs(): on an ideal architecture without a
> > select it might be three instructions (cmp, sub, cmov), so the
> > three-operation canonicalization likely won’t hurt even in the worst
> > case. But on an architecture with a select, we’re going from 2
> > instructions to 3, which significantly increases instruction count,
> > and it’s difficult to “go back” from the new instruction sequence to
> > the select.
> >
> > I’m not sure this patch catches all of them; I didn’t audit every
> > transform, so there may be others. My approach was to add a check to
> > every transform that creates more nodes than it consumes in order to
> > eliminate a select. On an out-of-tree target, making the new TLI
> > option return “false” saves a number of instructions, with no
> > regressions on any test.
> >
> > This could also open up more optimizations in the future that assume
> > selects are fast, i.e. that one select DAG node costs roughly one real
> > instruction. I wonder if any of the in-tree GPU backends would find
> > something like this useful?
> >
> > Any thoughts on the implementation?
> >
> > — escha
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
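To make escha's abs() example concrete, here is a minimal Python sketch; it is not code from the patch. It assumes the "three-operation canonicalization" is the classic branchless form (sra, add, xor) and compares it with the two-operation select form (sub, select_cc) on 32-bit two's-complement values. The TargetInfo class, its selects_are_fast flag, and lower_abs are hypothetical stand-ins for the proposed TLI option, whose actual name does not appear in this thread.

```python
from dataclasses import dataclass

WIDTH = 32  # assumed register width for the illustration


def wrap(v: int, width: int = WIDTH) -> int:
    """Reduce v to a signed two's-complement value of the given width."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v


def abs_select(x: int) -> int:
    # Two ops on a target with a real select: neg = sub 0, x; then
    # select_cc (x < 0), neg, x. The compare is folded into select_cc.
    neg = wrap(0 - x)
    return neg if x < 0 else x


def abs_shift_add_xor(x: int, width: int = WIDTH) -> int:
    # Three ops, no select (assumed canonical form):
    # m = sra x, width-1 (all ones if x < 0, else 0), then (x + m) ^ m.
    # Python's >> on ints is an arithmetic shift.
    m = x >> (width - 1)
    return wrap(wrap(x + m) ^ m)


@dataclass
class TargetInfo:
    # Hypothetical stand-in for the proposed TLI option.
    selects_are_fast: bool


def lower_abs(tli: TargetInfo) -> list[str]:
    """Return the (hypothetical) node sequence a combine would emit for abs."""
    if tli.selects_are_fast:
        # Leave the select alone: it already maps to one instruction.
        return ["sub", "select_cc"]
    # Otherwise canonicalize to the select-free form.
    return ["sra", "add", "xor"]


if __name__ == "__main__":
    for x in (0, 7, -7, 2**31 - 1, -(2**31)):
        assert abs_select(x) == abs_shift_add_xor(x)
    print(lower_abs(TargetInfo(selects_are_fast=True)))   # ['sub', 'select_cc']
    print(lower_abs(TargetInfo(selects_are_fast=False)))  # ['sra', 'add', 'xor']
```

Both formulations agree on every 32-bit input, including -2^31, which each maps back to itself after wrapping. The flag does not change the result, only the node count: a target where select_cc is a single instruction keeps the two-node form instead of paying for three.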