[llvm] r182499 - This is an update to a previous commit (r181216).

Tue Jul 16 13:48:04 PDT 2013

Hi,

I have been working with Jean-Luc and following up about this issue.
It turns out the original regression reported involved a particular
case where turning on this instcombine would lead to a large number of
selects on the same condition variable surviving until instruction
selection that would all have otherwise turned into FMAs; in this
case, the conversion from i1 to float is done once and reused for many
FMAs.

I tested adding some code to CodeGenPrepare that can handle this
situation (it's somewhat non-trivial, because you don't want to
generate extra i1 to float conversions if there are multiple selects
on the same i1, even if the selects are in different basic blocks); it
works, but it requires each target implement some cost model hooks to
determine whether the the uitofp + FMA or select + add form is better,
and it's non-obvious how those cost model hooks should be implemented.

I starting trying to come up with benchmark numbers to fix the cost
model for X86 to start with, but then I realized something that is
somewhat obvious, in retrospect, actually: it's very difficult to
construct a reduced case example where many selects on the same
condition variable survive until instruction selection (i.e. are not
folded together or otherwise simplified for other reasons, either by
mid-level optimizers or DAG combine passes): the reason this was
occurring in the original case where the regression was reported was
very idiosyncratic to the case, for reasons that are too complicated
to get into (the example was a complex piece of code with
non-idiomatic constructs).

Anyway, so from my analysis, it seems that the select form is almost
always better (because it allows other mid-level optimizations or DAG
combines that combine the selects), and regressions of the type
observed are idiosyncratic and rare enough (or possibly completely
absent) in idiomatic code that they are not worth implementing a
complex transformation and cost model for. (Furthermore, we have since
addressed the originally reported regression via other means.)

So, if there are no objections, I'm going to reapply Jean-Luc's
original canonicalization; if other regressions are reported and look
important enough to address, then I can use them as test cases to
implement the cost model portion of my code and submit it for review.

Thanks,
Stephen

On Tue, May 28, 2013 at 8:28 AM, Jean-Luc Duprat <jduprat at apple.com> wrote:
> It is now tracked in http://llvm.org/bugs/show_bug.cgi?id=16164
>
> JL
>
> On May 28, 2013, at 08:13 , Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>
>> On 28 May 2013 10:42, Jean-Luc Duprat <jduprat at apple.com> wrote:
>>> Rafael,
>>>
>>> I just wanted to circle back with you and summarize the conclusions of more internal discussions we came to.
>>> The select form of the operation is most likely the right cannonicalization for this expression, and it is in fact possible that the vectorizer could do something with that representation.  At the current time however, this would have a negative impacton performance because it impairs FMA formation.  The proper fix is to fix FMA formation in all relevant backends, then enable the select inst combine, and finally to get the vectorizer to make use of the canonical select in helping make decisions.  This is quite a bit of work for unquantifiable performance gains…
>>>
>>> In the short term, the fix I submitted maintains performance to the same level as it was prior to my initial commit.
>>> As the above changes roll in, performance may improve.  We are tracking the request for changes, and will get to them eventually; but this is fairly low priority for us at the current time.
>>
>> Thanks for summary, it sounds reasonable. Could you just open a bug on
>> llvm.org/bugs to track it?
>>
>>> JL
>>
>> Thanks,
>> Rafael
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits