<div dir="ltr"><div><div><div><div>> Is there really any disagreement here? It seems like someone needs to go

 through and make sure that all relevant combines propagate the flags. 

All other > options proposed were simply suggestions for short-term 

workarounds, no?<br>

<br></div>Correct - I don't think there's any disagreement. Hacking CSE might provide a bandage solution, but, as Escha noted, it would defeat the point of pushing FMF into the DAG in the first place.<br><br></div>I can go through and fix everything in-tree, but I'm wondering:<br></div>1. Does every change need a regression test?<br></div>2. How do we alert out-of-tree backends about the change? Is a note to llvm-dev good enough, or should there be an assert in the code if someone creates one of the FMF-capable nodes without explicitly specifying the flags?<br><div><div><div><div><div><div><div><div>

<div class="gmail_extra"><br><div class="gmail_quote">On Wed, Aug 5, 2015 at 10:58 AM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">----- Original Message -----<br>

> From: "Sanjay Patel" <<a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a>><br>

> To: "escha" <<a href="mailto:escha@apple.com">escha@apple.com</a>>, <a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a><br>

> Sent: Wednesday, August 5, 2015 10:49:43 AM<br>

> Subject: Re: [PATCH] propagate IR-level fast-math-flags to DAG nodes, disabled by default<br>

><br>

><br>

> [Using new mailing list. Several of the previous mails in this thread<br>

> are probably lost forever for the lists; see below for context.]<br>

><br>

> I don't think we should leave CSE in the DAG broken while figuring<br>

> out how to solve this,<br>

<br>

</span>Is there really any disagreement here? It seems like someone needs to go through and make sure that all relevant combines propagate the flags. All other options proposed were simply suggestions for short-term workarounds, no?<br>

<br>

 -Hal<br>

<div><div class="h5"><br>

> so I reverted r243687 with:<br>

> <a href="http://llvm.org/viewvc/llvm-project?view=revision&revision=244053" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?view=revision&revision=244053</a><br>

><br>

><br>

> Also filed a bug using the example code below:<br>

> <a href="https://llvm.org/bugs/show_bug.cgi?id=24363" rel="noreferrer" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=24363</a><br>

><br>

><br>

> So we may get at least one in-tree regression test before the flag<br>

> gets enabled again. :)<br>

><br>

><br>

><br>

> On Tue, Aug 4, 2015 at 5:39 PM, Sanjay Patel < <a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a><br>

> > wrote:<br>

><br>

><br>

><br>

> Thanks - your example made it much clearer to me how this goes wrong.<br>

> I think I should revert r243687 and try to fix everything under the<br>

> cover of that scaffolding...<br>

><br>

> I don't know how we'd test it all though. Here's an example that<br>

> shows the bug for one spot in DAGCombiner's GetNegatedExpression()<br>

> with an x86 target:<br>

><br>

> $ cat badflags.ll<br>

> define float @fmf(float %a, float %b) {<br>

> %mul1 = fmul fast float %a, %b<br>

> %nega = fsub fast float 0.0, %a<br>

> %mul2 = fmul fast float %nega, %b<br>

> %abx2 = fsub fast float %mul1, %mul2<br>

> ret float %abx2<br>

> }<br>

><br>

><br>

> $ ./llc -o - badflags.ll -mattr=fma -enable-unsafe-fp-math<br>

> -enable-fmf-dag=0<br>

> ...<br>

> vmulss %xmm1, %xmm0, %xmm0<br>

> vaddss %xmm0, %xmm0, %xmm0<br>

> retq<br>

><br>

> $ ./llc -o - badflags.ll -mattr=fma -enable-unsafe-fp-math<br>

> -enable-fmf-dag=1<br>

> ...<br>

> vmulss %xmm1, %xmm0, %xmm2<br>

> vfmadd213ss %xmm2, %xmm1, %xmm0 <--- failed to recognize that (a * b)<br>

> was already calculated<br>

> retq<br>

><br>

><br>

><br>

><br>

><br>

><br>

> On Tue, Aug 4, 2015 at 12:03 PM, escha < <a href="mailto:escha@apple.com">escha@apple.com</a> > wrote:<br>

><br>

><br>

><br>

> Unfortunately, no :/ though I can imagine one could be constructed in<br>

> *some* fashion, since surely crippled CSE is visible somehow<br>

> somewhere. The particular case that came up looked like this:<br>

><br>

><br>

> c = a * b FAST<br>

> d = -c FAST<br>

><br>

><br>

> (combine1 occurs)<br>

><br>

><br>

> c = a * b FAST<br>

> d = a * -b<br>

><br>

><br>

> then we have a combine in combine2, in our target, to deduplicate<br>

> these, which should result in this:<br>

><br>

><br>

><br>

> c = a * b FAST<br>

> d = -c FAST<br>

><br>

><br>

> which we do because negates are free (modifiers on a GPU) while an<br>

> extra multiply is not.<br>

><br>

><br>

> it visits d = a * -b and then searches the DAG to see if (a * b)<br>

> exists. But it doesn’t, only (a * b FAST) exists, the search fails,<br>

> and the code is pessimized.<br>

><br>

><br>

> —escha<br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

> On Aug 4, 2015, at 10:57 AM, Sanjay Patel < <a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a> ><br>

> wrote:<br>

><br>

><br>

> Thanks! Do you have a test case that would show a difference from<br>

> that kind of change on an in-tree backend?<br>

><br>

><br>

><br>

><br>

> On Tue, Aug 4, 2015 at 11:34 AM, escha < <a href="mailto:escha@apple.com">escha@apple.com</a> > wrote:<br>

><br>

><br>

><br>

> I’ve only done the following patch, just as an experiment; I haven’t<br>

> done much work on it overall (I was just experimenting with trying<br>

> to kill off this individual regression).<br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

> —escha<br>

><br>

><br>

><br>

><br>

><br>

> On Aug 4, 2015, at 10:07 AM, Sanjay Patel < <a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a> ><br>

> wrote:<br>

><br>

><br>

><br>

> Ok, that's good news. The other good news is that nobody has reported<br>

> crashes from this change...yet.<br>

><br>

> I'm fine proceeding either way (revert r243687 or just start patching<br>

> as-is). Let me know what you'd prefer.<br>

><br>

> Can you commit any patches you've already done for the generic DAG<br>

> classes?<br>

><br>

><br>

><br>

><br>

><br>

><br>

> On Tue, Aug 4, 2015 at 10:47 AM, escha < <a href="mailto:escha@apple.com">escha@apple.com</a> > wrote:<br>

><br>

><br>

><br>

> I have a patch to add propagation to our own backend, by the way; it<br>

> wasn’t at all difficult to write (it took like 20 minutes). But if<br>

> we keep the patch in, we should probably put out an announcement to<br>

> backend maintainers that they need to add fast-math flag<br>

> propagation.<br>

><br>

><br>

> —escha<br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

> On Aug 4, 2015, at 8:28 AM, escha < <a href="mailto:escha@apple.com">escha@apple.com</a> > wrote:<br>

><br>

><br>

><br>

> The “unsafe FP math” transforms aren’t enough; the problem is that<br>

> even the *safe-math* transforms need to propagate flags from their<br>

> source nodes, or you end up with un-CSEable duplicates (and of<br>

> course, lose flags for future transformations!). You need to pass<br>

> flags in for *every single creation of an FADD, FSUB, FADD, or<br>

> FDIV*, as far as I can tell. This involves a huge amount of diffs<br>

> that look like this:<br>

><br>

><br>

><br>

> // fold (fneg (fsub A, B)) -> (fsub B, A)<br>

> return DAG.getNode(ISD::FSUB, SDLoc(Op), Op.getValueType(),<br>

> - Op.getOperand(1), Op.getOperand(0));<br>

> + Op.getOperand(1), Op.getOperand(0),<br>

> + &cast<BinaryWithFlagsSDNode>(Op)->Flags);<br>

><br>

><br>

> One idea I had was to allow conservative CSE. That is, allow CSEing<br>

> nodes with fast-math flags, even if their flags differ — just, when<br>

> CSEing them, pick the intersection of the two flags. This will<br>

> guarantee that CSE will still function, even if DAG folds don’t<br>

> properly maintain fast-math flags. But we still need to eventually<br>

> add all these flag propagations; otherwise having fast-math flags in<br>

> the DAG is useless at best, isn’t it?<br>

><br>

><br>

> —escha<br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

> On Aug 4, 2015, at 7:26 AM, Sanjay Patel < <a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a> ><br>

> wrote:<br>

><br>

><br>

><br>

><br>

> Does anyone see a workaround to do this piecemeal with FMF enabled?<br>

><br>

> Otherwise, I see about 30 checks for UnsafeFPMath transforms in<br>

> DAGCombiner...so I could revert r243687, start working on the<br>

> combiner fixes, and then turn the flag back on when DAGCombiner is<br>

> FMF-clean.<br>

><br>

> But this wouldn't be perfect if you have more UnsafeFPMath transforms<br>

> in your backend. You'd have to patch those too before we turn the<br>

> flag back on.<br>

><br>

> Any other ideas about how to proceed?<br>

><br>

><br>

><br>

> On Mon, Aug 3, 2015 at 4:52 PM, escha < <a href="mailto:escha@apple.com">escha@apple.com</a> > wrote:<br>

><br>

><br>

><br>

> We’re getting significant CSE regressions on our backend, caused by<br>

> the fact that none of the DAG folds propagate fast-math flags. This<br>

> results in nodes not being CSE’d, since we only CSE nodes if they<br>

> have identical flags (and thus worse codegen).<br>

><br>

><br>

> I was able to get rid of many of them by adding flag propagation to<br>

> FNEG folding, but there’s way more than just that.<br>

><br>

><br>

><br>

><br>

> —escha<br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

> On Jul 29, 2015, at 11:51 AM, Sanjay Patel < <a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a> ><br>

> wrote:<br>

><br>

><br>

><br>

><br>

> For reference, fixes for PR24141 are in:<br>

> <a href="http://reviews.llvm.org/rL243293" rel="noreferrer" target="_blank">http://reviews.llvm.org/rL243293</a> <a href="http://reviews.llvm.org/rL243498" rel="noreferrer" target="_blank">http://reviews.llvm.org/rL243498</a><br>

> <a href="http://reviews.llvm.org/rL243500" rel="noreferrer" target="_blank">http://reviews.llvm.org/rL243500</a> ...assuming those don't cause new<br>

> breakage, it's finally, really time to flip the FMF flag...and wait<br>

> for more breakage.<br>

><br>

><br>

><br>

> On Mon, Jul 20, 2015 at 11:34 AM, Sanjay Patel <<br>

> <a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a> > wrote:<br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

> On Fri, Jul 17, 2015 at 4:42 PM, Nick Lewycky < <a href="mailto:nlewycky@google.com">nlewycky@google.com</a> ><br>

> wrote:<br>

><br>

><br>

><br>

><br>

><br>

> On 9 July 2015 at 15:27, Sanjay Patel < <a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a> ><br>

> wrote:<br>

><br>

><br>

><br>

><br>

> Hi Nick,<br>

><br>

> The reciprocal fix is in at r241826. Please let me know when you find<br>

> another failure. Thanks!<br>

><br>

><br>

><br>

> Thank you! I tested it and found exactly nothing wrong. Hopefully I<br>

> didn't goof the testing, but I think it's time to flip the flag to<br>

> true, then wait a few days before removing the flag entirely.<br>

><br>

><br>

><br>

> Wow...I'm surprised, but I'll hope for the best. Thanks for the<br>

> testing!<br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

> Note however, that we still have problems with PR24141 which is<br>

> related to unsafe fp math, but doesn't change with the<br>

> -enable-fmf-dag flag.<br>

><br>

> Proposed fix:<br>

> <a href="http://reviews.llvm.org/D11345" rel="noreferrer" target="_blank">http://reviews.llvm.org/D11345</a><br>

><br>

><br>

> _______________________________________________<br>

> llvm-commits mailing list<br>

> <a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

><br>

><br>

><br>

> _______________________________________________<br>

> llvm-commits mailing list<br>

> <a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

><br>

</div></div>> _______________________________________________<br>

> llvm-commits mailing list<br>

> <a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a><br>

> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>

><br>

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Hal Finkel<br>

Assistant Computational Scientist<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

</font></span></blockquote></div><br></div></div></div></div></div></div></div></div></div></div>