[PATCH] Flag to enable IEEE-754 friendly FP optimizations

Thu Aug 27 19:47:17 PDT 2015

----- Original Message -----
> From: "Sergey Dmitrouk" <sdmitrouk at accesssoftek.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Owen Anderson" <owen at apple.com>, "llvm-commits" <llvm-commits at lists.llvm.org>
> Sent: Thursday, August 20, 2015 10:54:27 AM
> Subject: Re: [PATCH] Flag to enable IEEE-754 friendly FP optimizations
> 
> Hi again,
> 
> On Mon, Aug 17, 2015 at 08:19:58PM +0300, Sergey Dmitrouk wrote:
> > Sorry, didn't have enough time to complete this before vacation.  The
> > reason it required more time is the unfortunate instruction reordering
> > in which a floating-point operation is moved below a function
> > invocation.  You mentioned it in the winter, but I couldn't reproduce
> > the issue no matter how hard I tried and thought that it couldn't
> > happen.  After a rebase three weeks ago it occurred for the first time,
> > allowing me to make a test for it, but defining the operation to
> > read/write memory didn't fix the issue.  I left off while trying to
> > find more checks that should block the undesired reordering during
> > instruction selection; I will get back to it.
> 
> I managed to fix the reordering issue in a proof-of-concept way by
> adding a chain to the "fadd" instruction.  As you might imagine, this
> has quite a big impact on everything else, and I am even unable to tell
> how many tests it breaks (a lot; some are killed, some don't finish).
> 
> I started fixing that, and it requires updating every creation of nodes
> for FADD, FSUB, etc. and all related permutations (mainly in
> DAGCombiner).  That is on the order of hundreds of cases, which is not
> that many overall.  Yet it seems like a big change, and I'm wondering
> whether this is how one should do it.  Is there any simpler and less
> intrusive way of guaranteeing the relative order of instructions than
> adding an (effectively optional) chain to floating-point instructions?

I think you might as well introduce new SDAG node types, FADD_W_CHAIN, etc. We're essentially not going to optimize them anyway, so I'm not worried about losing the existing DAGCombine optimizations. The tricky part is that you need to instruction-select these nodes into instructions that have side effects at the MI level, and this probably requires modifying the backends (they need variants of the existing FP instructions that are marked as having side effects). But this is probably unavoidable, and luckily, most of the changes seem largely rote.
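
To illustrate why a chain pins the operation in place, here is a toy C++ model. It is only a sketch and not LLVM code: the Kind, Node, hasChain and schedule names are hypothetical, and the "scheduler" is deliberately adversarial. It sinks every unchained node to the end, while nodes on the chain keep their original order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: hypothetical types, not LLVM's SelectionDAG API.
enum class Kind { Call, FAdd, FAddWChain };

struct Node {
  Kind kind;
  std::string name;
};

// A node is ordered by the chain if it is a call or a chained FP operation.
bool hasChain(const Node &n) { return n.kind != Kind::FAdd; }

// Adversarial "scheduler": chained nodes keep their program order, while
// unchained nodes are sunk to the end, which is legal when nothing ties
// them to the surrounding side effects.
std::vector<std::string> schedule(const std::vector<Node> &program) {
  std::vector<std::string> out;
  for (const Node &n : program)
    if (hasChain(n)) out.push_back(n.name);
  for (const Node &n : program)
    if (!hasChain(n)) out.push_back(n.name);
  return out;
}
```

In this model, scheduling a plain FAdd followed by a Call lets the fadd drift below the call, whereas the same program written with FAddWChain keeps the fadd first.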

 -Hal

> 
> Thanks,
> Sergey
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory