[PATCH] Allow FMAs in safe math mode in some cases when one operand of the fmul is either exactly 0.0 or exactly 1.0.

Tue Jul 9 15:08:36 PDT 2013

On Tue, Jul 9, 2013 at 3:03 PM, Stephen Lin <swlin at post.harvard.edu> wrote:
>>> I ended up making it work on vectors in constrained cases (since the
>>> vector case was tested for when the function was recursive), but it
>>> took a lot of thinking to convince myself I hadn't made a mistake.
>>>
>>> Do you think the vector case will come up enough to make this
>>> worthwhile? The problem is that the vector has to be transparently
>>> defined in the same basic block as the fmul and fadd for it to work,
>>> and I think most vectors will be created using some kind of
>>> control flow that will necessarily span more than one basic block.
>>
>> I'm thinking of something like this: autovectorize this:
>> for (...) {
>>   a[i] = 1.0 + b[i]*c[i];
>> }
>> and you'll get a uniform vector of 1.0. I think that this is not uncommon.
>>

By the way, did you really mean "a[i] = 1.0 + b[i]*c[i];"? The cases
this patch handles are ones where the 0.0 or 1.0 is on one of the fmul
operands.

The following case might matter though:

    // f is a scalar boolean...

    for (...) {
      a[i] = b[i] + f*c[i];
    }

I think in this case a boolean vector of all 1's or 0's might be
formed, but it would most likely be formed outside of the loop so
SelectionDAG would not be able to see through it.

>
> I would like to be able to handle this case and others like it but I'm
> not sure if it will work even if I make DAGCombiner aware of vectors,
> because the DAGCombiner cannot examine basic blocks other than the one
> currently being processed. It depends on how the IR passes decide to
> arrange the code. Basically, SelectionDAG has to have enough
> information to prove that a vector is all 0's or 1's just by examining
> SDNodes within a single basic block, which I'm not sure will work in
> this case.
>
> In any case, I think that should be a separate patch. I'll add more
> checks for the scalar case and update this one first.
>
> Thanks,
> Stephen