[PATCH] propagate IR-level fast-math-flags to DAG nodes, disabled by default

Wed Aug 5 10:17:50 PDT 2015

> Is there really any disagreement here? It seems like someone needs to go
through and make sure that all relevant combines propagate the flags. All
other > options proposed were simply suggestions for short-term
workarounds, no?

Correct - I don't think there's any disagreement. Hacking CSE might provide
a bandage solution, but, as Escha noted, it would defeat the point of
pushing FMF into the DAG in the first place.

I can go through and fix everything in-tree, but I'm wondering:
1. Does every change need a regression test?
2. How do we alert out-of-tree backends about the change? Is a note to
llvm-dev good enough, or should there be an assert in the code if someone
creates one of the FMF-capable nodes without explicitly specifying the
flags?

On Wed, Aug 5, 2015 at 10:58 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "Sanjay Patel" <spatel at rotateright.com>
> > To: "escha" <escha at apple.com>, llvm-commits at lists.llvm.org
> > Sent: Wednesday, August 5, 2015 10:49:43 AM
> > Subject: Re: [PATCH] propagate IR-level fast-math-flags to DAG nodes,
> disabled by default
> >
> >
> > [Using new mailing list. Several of the previous mails in this thread
> > are probably lost forever for the lists; see below for context.]
> >
> > I don't think we should leave CSE in the DAG broken while figuring
> > out how to solve this,
>
> Is there really any disagreement here? It seems like someone needs to go
> through and make sure that all relevant combines propagate the flags. All
> other options proposed were simply suggestions for short-term workarounds,
> no?
>
>  -Hal
>
> > so I reverted r243687 with:
> > http://llvm.org/viewvc/llvm-project?view=revision&revision=244053
> >
> >
> > Also filed a bug using the example code below:
> > https://llvm.org/bugs/show_bug.cgi?id=24363
> >
> >
> > So we may get at least one in-tree regression test before the flag
> > gets enabled again. :)
> >
> >
> >
> > On Tue, Aug 4, 2015 at 5:39 PM, Sanjay Patel < spatel at rotateright.com
> > > wrote:
> >
> >
> >
> > Thanks - your example made it much clearer to me how this goes wrong.
> > I think I should revert r243687 and try to fix everything under the
> > cover of that scaffolding...
> >
> > I don't know how we'd test it all though. Here's an example that
> > shows the bug for one spot in DAGCombiner's GetNegatedExpression()
> > with an x86 target:
> >
> > $ cat badflags.ll
> > define float @fmf(float %a, float %b) {
> > %mul1 = fmul fast float %a, %b
> > %nega = fsub fast float 0.0, %a
> > %mul2 = fmul fast float %nega, %b
> > %abx2 = fsub fast float %mul1, %mul2
> > ret float %abx2
> > }
> >
> >
> > $ ./llc -o - badflags.ll -mattr=fma -enable-unsafe-fp-math
> > -enable-fmf-dag=0
> > ...
> > vmulss %xmm1, %xmm0, %xmm0
> > vaddss %xmm0, %xmm0, %xmm0
> > retq
> >
> > $ ./llc -o - badflags.ll -mattr=fma -enable-unsafe-fp-math
> > -enable-fmf-dag=1
> > ...
> > vmulss %xmm1, %xmm0, %xmm2
> > vfmadd213ss %xmm2, %xmm1, %xmm0 <--- failed to recognize that (a * b)
> > was already calculated
> > retq
> >
> >
> >
> >
> >
> >
> > On Tue, Aug 4, 2015 at 12:03 PM, escha < escha at apple.com > wrote:
> >
> >
> >
> > Unfortunately, no :/ though I can imagine one could be constructed in
> > *some* fashion, since surely crippled CSE is visible somehow
> > somewhere. The particular case that came up looked like this:
> >
> >
> > c = a * b FAST
> > d = -c FAST
> >
> >
> > (combine1 occurs)
> >
> >
> > c = a * b FAST
> > d = a * -b
> >
> >
> > then we have a combine in combine2, in our target, to deduplicate
> > these, which should result in this:
> >
> >
> >
> > c = a * b FAST
> > d = -c FAST
> >
> >
> > which we do because negates are free (modifiers on a GPU) while an
> > extra multiply is not.
> >
> >
> > it visits d = a * -b and then searches the DAG to see if (a * b)
> > exists. But it doesn’t, only (a * b FAST) exists, the search fails,
> > and the code is pessimized.
> >
> >
> > —escha
> >
> >
> >
> >
> >
> >
> >
> > On Aug 4, 2015, at 10:57 AM, Sanjay Patel < spatel at rotateright.com >
> > wrote:
> >
> >
> > Thanks! Do you have a test case that would show a difference from
> > that kind of change on an in-tree backend?
> >
> >
> >
> >
> > On Tue, Aug 4, 2015 at 11:34 AM, escha < escha at apple.com > wrote:
> >
> >
> >
> > I’ve only done the following patch, just as an experiment; I haven’t
> > done much work on it overall (I was just experimenting with trying
> > to kill off this individual regression).
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > —escha
> >
> >
> >
> >
> >
> > On Aug 4, 2015, at 10:07 AM, Sanjay Patel < spatel at rotateright.com >
> > wrote:
> >
> >
> >
> > Ok, that's good news. The other good news is that nobody has reported
> > crashes from this change...yet.
> >
> > I'm fine proceeding either way (revert r243687 or just start patching
> > as-is). Let me know what you'd prefer.
> >
> > Can you commit any patches you've already done for the generic DAG
> > classes?
> >
> >
> >
> >
> >
> >
> > On Tue, Aug 4, 2015 at 10:47 AM, escha < escha at apple.com > wrote:
> >
> >
> >
> > I have a patch to add propagation to our own backend, by the way; it
> > wasn’t at all difficult to write (it took like 20 minutes). But if
> > we keep the patch in, we should probably put out an announcement to
> > backend maintainers that they need to add fast-math flag
> > propagation.
> >
> >
> > —escha
> >
> >
> >
> >
> >
> >
> >
> > On Aug 4, 2015, at 8:28 AM, escha < escha at apple.com > wrote:
> >
> >
> >
> > The “unsafe FP math” transforms aren’t enough; the problem is that
> > even the *safe-math* transforms need to propagate flags from their
> > source nodes, or you end up with un-CSEable duplicates (and of
> > course, lose flags for future transformations!). You need to pass
> > flags in for *every single creation of an FADD, FSUB, FADD, or
> > FDIV*, as far as I can tell. This involves a huge amount of diffs
> > that look like this:
> >
> >
> >
> > // fold (fneg (fsub A, B)) -> (fsub B, A)
> > return DAG.getNode(ISD::FSUB, SDLoc(Op), Op.getValueType(),
> > - Op.getOperand(1), Op.getOperand(0));
> > + Op.getOperand(1), Op.getOperand(0),
> > + &cast<BinaryWithFlagsSDNode>(Op)->Flags);
> >
> >
> > One idea I had was to allow conservative CSE. That is, allow CSEing
> > nodes with fast-math flags, even if their flags differ — just, when
> > CSEing them, pick the intersection of the two flags. This will
> > guarantee that CSE will still function, even if DAG folds don’t
> > properly maintain fast-math flags. But we still need to eventually
> > add all these flag propagations; otherwise having fast-math flags in
> > the DAG is useless at best, isn’t it?
> >
> >
> > —escha
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Aug 4, 2015, at 7:26 AM, Sanjay Patel < spatel at rotateright.com >
> > wrote:
> >
> >
> >
> >
> > Does anyone see a workaround to do this piecemeal with FMF enabled?
> >
> > Otherwise, I see about 30 checks for UnsafeFPMath transforms in
> > DAGCombiner...so I could revert r243687, start working on the
> > combiner fixes, and then turn the flag back on when DAGCombiner is
> > FMF-clean.
> >
> > But this wouldn't be perfect if you have more UnsafeFPMath transforms
> > in your backend. You'd have to patch those too before we turn the
> > flag back on.
> >
> > Any other ideas about how to proceed?
> >
> >
> >
> > On Mon, Aug 3, 2015 at 4:52 PM, escha < escha at apple.com > wrote:
> >
> >
> >
> > We’re getting significant CSE regressions on our backend, caused by
> > the fact that none of the DAG folds propagate fast-math flags. This
> > results in nodes not being CSE’d, since we only CSE nodes if they
> > have identical flags (and thus worse codegen).
> >
> >
> > I was able to get rid of many of them by adding flag propagation to
> > FNEG folding, but there’s way more than just that.
> >
> >
> >
> >
> > —escha
> >
> >
> >
> >
> >
> >
> >
> > On Jul 29, 2015, at 11:51 AM, Sanjay Patel < spatel at rotateright.com >
> > wrote:
> >
> >
> >
> >
> > For reference, fixes for PR24141 are in:
> > http://reviews.llvm.org/rL243293 http://reviews.llvm.org/rL243498
> > http://reviews.llvm.org/rL243500 ...assuming those don't cause new
> > breakage, it's finally, really time to flip the FMF flag...and wait
> > for more breakage.
> >
> >
> >
> > On Mon, Jul 20, 2015 at 11:34 AM, Sanjay Patel <
> > spatel at rotateright.com > wrote:
> >
> >
> >
> >
> >
> >
> >
> > On Fri, Jul 17, 2015 at 4:42 PM, Nick Lewycky < nlewycky at google.com >
> > wrote:
> >
> >
> >
> >
> >
> > On 9 July 2015 at 15:27, Sanjay Patel < spatel at rotateright.com >
> > wrote:
> >
> >
> >
> >
> > Hi Nick,
> >
> > The reciprocal fix is in at r241826. Please let me know when you find
> > another failure. Thanks!
> >
> >
> >
> > Thank you! I tested it and found exactly nothing wrong. Hopefully I
> > didn't goof the testing, but I think it's time to flip the flag to
> > true, then wait a few days before removing the flag entirely.
> >
> >
> >
> > Wow...I'm surprised, but I'll hope for the best. Thanks for the
> > testing!
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Note however, that we still have problems with PR24141 which is
> > related to unsafe fp math, but doesn't change with the
> > -enable-fmf-dag flag.
> >
> > Proposed fix:
> > http://reviews.llvm.org/D11345
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150805/eaaa5a1a/attachment-0001.html>