[PATCH] "float2int": Add a new pass to demote from float to int where possible.

Wed Mar 18 19:54:15 PDT 2015

----- Original Message -----

> From: "Sean Silva" <chisophugis at gmail.com>
> To: reviews+D7790+public+42b40e367a6aa137 at reviews.llvm.org
> Cc: "James Molloy" <james.molloy at arm.com>, "Hal Finkel"
> <hfinkel at anl.gov>, llvm-commits at cs.uiuc.edu
> Sent: Wednesday, March 18, 2015 7:18:44 PM
> Subject: Re: [PATCH] "float2int": Add a new pass to demote from float
> to int where possible.

> On Thu, Mar 5, 2015 at 7:36 AM, hfinkel at anl.gov < hfinkel at anl.gov >
> wrote:

> > In http://reviews.llvm.org/D7790#134758 , @mkuper wrote:
> 

> > > Hi James,
> 
> > >
> 
> > > Just so we have a record of what we talked about on IRC (and can
> > > give Hal a chance to disagree :-)
> 

> > Good; I disagree :-)
> 

> > The first question to answer is: What is the most useful and
> > reasonable canonical form? The reason I support running this pass
> > early in the pipeline is because I believe that demoting these int
> > -> fp -> int sequences to int sequences, when semantically
> > equivalent, is the most useful canonical form.
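
As a rough illustration of the "semantically equivalent" case (a sketch, not code from the patch): the function names below are hypothetical, and the example assumes 16-bit inputs so that every intermediate value is exactly representable in a 24-bit float significand. Under that assumption the int -> fp -> int form and the pure integer form return the same result for all inputs.

```cpp
#include <cstdint>

// Original form: convert to float, add, convert back (int -> fp -> int).
std::int32_t add_via_float(std::int16_t a, std::int16_t b) {
    float fa = static_cast<float>(a);           // corresponds to sitofp
    float fb = static_cast<float>(b);           // corresponds to sitofp
    return static_cast<std::int32_t>(fa + fb);  // fadd followed by fptosi
}

// Demoted form: the same computation kept entirely in the integer domain.
// The sum of two int16 values always fits in int32, so nothing overflows.
std::int32_t add_via_int(std::int16_t a, std::int16_t b) {
    return static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b);
}
```

The equivalence depends on the value range: with wider inputs the float sum could round, and the two forms would no longer agree.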
> 

> > If it is useful, because of microarchitectural features, to use FP
> > vector ops instead of integer vector ops, then that should be
> > 'actively' handled later (instead of just taking advantage of it
> > when it happens to happen).
> 

> Actually, the opposite transformation might be useful in any backend
> and is not limited to vector ops. Currently, extremely integer-heavy
> workloads (there are many applications that fall into this category,
> e.g. LLVM itself) end up leaving all the floating point units idle
> regardless of architecture. So it's just a matter of the relative
> domain-crossing costs vs. the extra ILP due to having more execution
> resources.
I think that the tough part of the modeling here is actually the
instruction latency. In situations where you have enough ILP, this does
not matter too much, but floating-point operations often have higher
latency than the integer ones. So you need to make sure you have enough
spare ILP to cover the increased latency.
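
The latency-versus-ILP trade-off can be sketched with a toy model. This is only an illustration under stated assumptions: `Domain`, `estimate_cycles`, and every latency and unit count used with them are hypothetical, not measurements of any real core and not part of the patch. The estimate is the larger of a latency bound (one dependency chain run back to back) and a throughput bound (all operations spread over the available units).

```cpp
#include <algorithm>

// Hypothetical description of one execution domain (integer or FP).
struct Domain {
    double latency;  // assumed cycles per dependent operation
    double units;    // assumed number of execution units
};

// Estimated cycles to run `chains` independent dependency chains of
// `length` operations each. The result is bounded below by the latency
// of a single chain and by the throughput of the available units.
double estimate_cycles(const Domain& d, double chains, double length) {
    double latency_bound = length * d.latency;
    double throughput_bound = chains * length / d.units;
    return std::max(latency_bound, throughput_bound);
}
```

With assumed values of `{1.0, 2.0}` for the integer domain and `{3.0, 4.0}` for the FP domain, two chains of ten operations favor the integer units (the FP estimate is latency-bound), while sixteen such chains favor the FP units, because there is then enough independent work to hide the higher latency.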

-Hal 

> On architectures like x86 that have a memory-->register int to FP
> conversion instruction, some of the domain crossing cost can be
> avoided. A cursory look at the wikipedia page for POWER8 indicates
> that the core has 2x integer units, but 7 other units that can do
> basic arithmetic (4x FPU, 2x VMX, 1x Decimal FP).

> -- Sean Silva

> > So I think that this should run early by default, x86 included. We
> > should also reverse the transformation later, perhaps within the
> > vectorizer, using an actual cost model, if that proves useful.
> 

> 

> > > On x86, vector i64 muls can be much worse than vector double muls.
> > > Since this is pre-LoopV, and we don't know if we'll end up with
> > > vector or scalar code, I think the safe thing to do on x86 would
> > > be to disable this for cases where we'll do a double -> i64
> > > transformation.
> 

> > >
> 

> > > This means we should probably have a target hook for that, which
> > > x86 can override.
> 

> > REPOSITORY
> 
> > rL LLVM
> 

> > http://reviews.llvm.org/D7790
> 

> 

-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 