[PATCH] propagate IR-level fast-math-flags to DAG nodes, disabled by default

Sanjay Patel spatel at rotateright.com
Wed Aug 5 08:49:43 PDT 2015


[Using new mailing list. Several of the previous mails in this thread are
probably lost forever for the lists; see below for context.]

I don't think we should leave CSE in the DAG broken while figuring out how
to solve this, so I reverted r243687 with:
http://llvm.org/viewvc/llvm-project?view=revision&revision=244053

Also filed a bug using the example code below:
https://llvm.org/bugs/show_bug.cgi?id=24363

So we may get at least one in-tree regression test before the flag gets
enabled again. :)


On Tue, Aug 4, 2015 at 5:39 PM, Sanjay Patel <spatel at rotateright.com> wrote:

> Thanks - your example made it much clearer to me how this goes wrong. I
> think I should revert r243687 and try to fix everything under the cover of
> that scaffolding...
>
> I don't know how we'd test it all though. Here's an example that shows the
> bug for one spot in DAGCombiner's GetNegatedExpression() with an x86 target:
>
> $ cat badflags.ll
> define float @fmf(float %a, float %b) {
>   %mul1 = fmul fast float %a, %b
>   %nega = fsub fast float 0.0, %a
>   %mul2 = fmul fast float %nega, %b
>   %abx2 = fsub fast float %mul1, %mul2
>   ret float %abx2
> }
>
>
> $ ./llc -o - badflags.ll -mattr=fma -enable-unsafe-fp-math
> -enable-fmf-dag=0
> ...
>     vmulss    %xmm1, %xmm0, %xmm0
>     vaddss    %xmm0, %xmm0, %xmm0
>     retq
>
> $ ./llc -o - badflags.ll -mattr=fma -enable-unsafe-fp-math
> -enable-fmf-dag=1
> ...
>     vmulss    %xmm1, %xmm0, %xmm2
>     vfmadd213ss    %xmm2, %xmm1, %xmm0  <--- failed to recognize that (a *
> b) was already calculated
>     retq
>
>
> On Tue, Aug 4, 2015 at 12:03 PM, escha <escha at apple.com> wrote:
>
>> Unfortunately, no :/ though I can imagine one could be constructed in
>> *some* fashion, since surely crippled CSE is visible somehow somewhere. The
>> particular case that came up looked like this:
>>
>> c = a * b FAST
>> d = -c FAST
>>
>> (combine1 occurs)
>>
>> c = a * b FAST
>> d = a * -b
>>
>> then we have a combine in combine2, in our target, to deduplicate these,
>> which should result in this:
>>
>> c = a * b FAST
>> d = -c FAST
>>
>> which we do because negates are free (modifiers on a GPU) while an extra
>> multiply is not.
>>
>> it visits d = a * -b and then searches the DAG to see if (a * b) exists.
>> But it doesn’t, only (a * b FAST) exists, the search fails, and the code is
>> pessimized.
>>
>> —escha
>>
>> On Aug 4, 2015, at 10:57 AM, Sanjay Patel <spatel at rotateright.com> wrote:
>>
>> Thanks! Do you have a test case that would show a difference from that
>> kind of change on an in-tree backend?
>>
>>
>> On Tue, Aug 4, 2015 at 11:34 AM, escha <escha at apple.com> wrote:
>>
>>> I’ve only done the following patch, just as an experiment; I haven’t
>>> done much work on it overall (I was just experimenting with trying to kill
>>> off this individual regression).
>>>
>>>
>>>
>>>
>>> —escha
>>>
>>> On Aug 4, 2015, at 10:07 AM, Sanjay Patel <spatel at rotateright.com>
>>> wrote:
>>>
>>> Ok, that's good news. The other good news is that nobody has reported
>>> crashes from this change...yet.
>>>
>>> I'm fine proceeding either way (revert r243687 or just start patching
>>> as-is). Let me know what you'd prefer.
>>>
>>> Can you commit any patches you've already done for the generic DAG
>>> classes?
>>>
>>> On Tue, Aug 4, 2015 at 10:47 AM, escha <escha at apple.com> wrote:
>>>
>>>> I have a patch to add propagation to our own backend, by the way; it
>>>> wasn’t at all difficult to write (it took like 20 minutes). But if we keep
>>>> the patch in, we should probably put out an announcement to backend
>>>> maintainers that they need to add fast-math flag propagation.
>>>>
>>>> —escha
>>>>
>>>> On Aug 4, 2015, at 8:28 AM, escha <escha at apple.com> wrote:
>>>>
>>>> The “unsafe FP math” transforms aren’t enough; the problem is that even
>>>> the *safe-math* transforms need to propagate flags from their source nodes,
>>>> or you end up with un-CSEable duplicates (and of course, lose flags for
>>>> future transformations!). You need to pass flags in for *every single
>>>> creation of an FADD, FSUB, FADD, or FDIV*, as far as I can tell. This
>>>> involves a huge amount of diffs that look like this:
>>>>
>>>>      // fold (fneg (fsub A, B)) -> (fsub B, A)
>>>>      return DAG.getNode(ISD::FSUB, SDLoc(Op), Op.getValueType(),
>>>> -                       Op.getOperand(1), Op.getOperand(0));
>>>> +                       Op.getOperand(1), Op.getOperand(0),
>>>> +                       &cast<BinaryWithFlagsSDNode>(Op)->Flags);
>>>>
>>>> One idea I had was to allow conservative CSE. That is, allow CSEing
>>>> nodes with fast-math flags, even if their flags differ — just, when CSEing
>>>> them, pick the intersection of the two flags. This will guarantee that CSE
>>>> will still function, even if DAG folds don’t properly maintain fast-math
>>>> flags. But we still need to eventually add all these flag propagations;
>>>> otherwise having fast-math flags in the DAG is useless at best, isn’t it?
>>>>
>>>> —escha
>>>>
>>>>
>>>>
>>>> On Aug 4, 2015, at 7:26 AM, Sanjay Patel <spatel at rotateright.com>
>>>> wrote:
>>>>
>>>> Does anyone see a workaround to do this piecemeal with FMF enabled?
>>>>
>>>> Otherwise, I see about 30 checks for UnsafeFPMath transforms in
>>>> DAGCombiner...so I could revert r243687, start working on the combiner
>>>> fixes, and then turn the flag back on when DAGCombiner is FMF-clean.
>>>>
>>>> But this wouldn't be perfect if you have more UnsafeFPMath transforms
>>>> in your backend. You'd have to patch those too before we turn the flag back
>>>> on.
>>>>
>>>> Any other ideas about how to proceed?
>>>>
>>>> On Mon, Aug 3, 2015 at 4:52 PM, escha <escha at apple.com> wrote:
>>>>
>>>>> We’re getting significant CSE regressions on our backend, caused by
>>>>> the fact that none of the DAG folds propagate fast-math flags. This results
>>>>> in nodes not being CSE’d, since we only CSE nodes if they have identical
>>>>> flags (and thus worse codegen).
>>>>>
>>>>> I was able to get rid of many of them by adding flag propagation to
>>>>> FNEG folding, but there’s way more than just that.
>>>>>
>>>>> —escha
>>>>>
>>>>> On Jul 29, 2015, at 11:51 AM, Sanjay Patel <spatel at rotateright.com>
>>>>> wrote:
>>>>>
>>>>> For reference, fixes for PR24141 are in:
>>>>>
>>>>> http://reviews.llvm.org/rL243293 <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_rL243293&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=mQ4LZ2PUj9hpadE3cDHZnIdEwhEBrbAstXeMaFoB9tg&m=EH36SVVrOANvPpWUfmtjHvcZene16JZYtYcOsOWHmmQ&s=oJkUxyP-U28dL9zPjcyCjESFgqf9fEa1WxScYHC1BLE&e=>http://reviews.llvm.org/rL243498 <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_rL243498&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=mQ4LZ2PUj9hpadE3cDHZnIdEwhEBrbAstXeMaFoB9tg&m=EH36SVVrOANvPpWUfmtjHvcZene16JZYtYcOsOWHmmQ&s=9UShb3DxctMQors5uP4ErkfZ2ETsaqYXEeRLX8qKYt0&e=>http://reviews.llvm.org/rL243500 <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_rL243500&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=mQ4LZ2PUj9hpadE3cDHZnIdEwhEBrbAstXeMaFoB9tg&m=EH36SVVrOANvPpWUfmtjHvcZene16JZYtYcOsOWHmmQ&s=S5dEJW-NYcq9yLglX74iN2esNypm3qfP6wCNidxysx8&e=>
>>>>>
>>>>> ...assuming those don't cause new breakage, it's finally, really time
>>>>> to flip the FMF flag...and wait for more breakage.
>>>>>
>>>>> On Mon, Jul 20, 2015 at 11:34 AM, Sanjay Patel <spatel at rotateright.com
>>>>> > wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 17, 2015 at 4:42 PM, Nick Lewycky <nlewycky at google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On 9 July 2015 at 15:27, Sanjay Patel <spatel at rotateright.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Nick,
>>>>>>>>
>>>>>>>> The reciprocal fix is in at r241826. Please let me know when you
>>>>>>>> find another failure. Thanks!
>>>>>>>>
>>>>>>>
>>>>>>> Thank you! I tested it and found exactly nothing wrong. Hopefully I
>>>>>>> didn't goof the testing, but I think it's time to flip the flag to true,
>>>>>>> then wait a few days before removing the flag entirely.
>>>>>>>
>>>>>>
>>>>>> Wow...I'm surprised, but I'll hope for the best. Thanks for the
>>>>>> testing!
>>>>>>
>>>>>>
>>>>>> Note however, that we still have problems with PR24141 which is
>>>>>>> related to unsafe fp math, but doesn't change with the -enable-fmf-dag flag.
>>>>>>>
>>>>>>
>>>>>> Proposed fix:
>>>>>> http://reviews.llvm.org/D11345
>>>>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D11345&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=mQ4LZ2PUj9hpadE3cDHZnIdEwhEBrbAstXeMaFoB9tg&m=EH36SVVrOANvPpWUfmtjHvcZene16JZYtYcOsOWHmmQ&s=q0g5ELtAiUzID-3xj21ZzkiASkbrFaYCWXDy-nBF0OI&e=>
>>>>>>
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150805/309090bd/attachment.html>


More information about the llvm-commits mailing list