[PATCH] D26855: New unsafe-fp-math implementation for X86 target

Thu Jan 26 08:29:56 PST 2017

hfinkel added a comment.

In https://reviews.llvm.org/D26855#657577, @spatel wrote:

> In https://reviews.llvm.org/D26855#657464, @RKSimon wrote:
>
> > In https://reviews.llvm.org/D26855#657287, @avt77 wrote:
> >
> > > Hi All, 
> > >  I found a really "stress" test for div operations (see the attachment)F3025626: spill_fdiv.ll <https://reviews.llvm.org/F3025626> (tnx to Sanjay Patel). The test shows maybe the worst case of the possible degradation because of this patch. I used the following command with 2 different compilers:
> > >
> > > .......
> > >
> > > For "pure" trunk compiler I got:   Elapsed time 2 s.
> > >  For compiler with patch I got:      Elapsed time 18 s.
> >
> >
> > Do you have any profiling info on where the time is going please? @Gerolf might then be able to advise whether we can improve the MCCombiner mechanism before/after this patch has gone in.
>
>
> I'll jump in here because I supplied this (hopefully degenerate and worst) case based on my earlier reassociation transforms for MachineCombiner (see https://reviews.llvm.org/D10321 where I first mentioned the potential compile-time problem). When I looked into that, the time was all going into MachineTraceMetrics::Ensemble::computeInstrDepths() and MachineTraceMetrics::Ensemble::
>  computeInstrHeights(). Those got very expensive for long sequences of instructions. Possible fixes would be to improve how those are computed, cache those results, and/or eliminate how often we ask for those values.

Certainly seems worthwhile exploring whether those can be cached (if I understand what they're doing, we do essentially cache very-similar values during MI scheduling). This worst-case hit is definitely undesirable, and we can certainly run into lots of machine-generated straight-line code, so hitting these kinds of cases in the wild is not unthinkable.

> We were ok with some additional potential compile-time cost because reassociation opportunities do not appear to be that common and were limited to -ffast-math compiles . We can make similar arguments for the recip transforms in this patch?
> 
> But it is worth noting that since the time of https://reviews.llvm.org/D10321, the number of reassociation candidate opcodes that x86 matches has grown to ~200 (X86InstrInfo::isAssociativeAndCommutative()) and includes integer ops. We're probably overdue on measuring and improving the perf of MachineCombiner.

https://reviews.llvm.org/D26855