[llvm] r182499 - This is an update to a previous commit (r181216).

Jean-Luc Duprat jduprat at apple.com
Thu Jul 18 09:40:34 PDT 2013


Stephen,

As you pointed out, the reason I withdrew the canonicalization was that it introduced performance regressions in some obscure cases.  We are now prepared to handle those as they crop up, so I think it's fair to re-enable the canonicalization; it is the right way to represent what is going on.  We'll deal with FMA regressions as they surface.  This is good work, thank you.

JL



On Jul 16, 2013, at 13:48 , Stephen Lin <swlin at post.harvard.edu> wrote:

> Hi,
> 
> I have been working with Jean-Luc and following up about this issue.
> It turns out the originally reported regression involved a particular
> case where turning on this instcombine would lead to a large number of
> selects on the same condition variable surviving until instruction
> selection that would all have otherwise turned into FMAs; in this
> case, the conversion from i1 to float is done once and reused for many
> FMAs.
> 
> I tested adding some code to CodeGenPrepare that can handle this
> situation (it's somewhat non-trivial, because you don't want to
> generate extra i1 to float conversions if there are multiple selects
> on the same i1, even if the selects are in different basic blocks); it
> works, but it requires each target implement some cost model hooks to
> determine whether the uitofp + FMA or select + add form is better,
> and it's non-obvious how those cost model hooks should be implemented.
> 
> I started trying to come up with benchmark numbers to tune the cost
> model for X86, but then I realized something that, in retrospect, is
> somewhat obvious: it is very difficult to construct a reduced example
> in which many selects on the same condition variable survive until
> instruction selection (i.e., are not folded together or otherwise
> simplified, either by mid-level optimizers or by DAG combine passes).
> The reason this occurred in the originally reported case was highly
> idiosyncratic, for reasons too complicated to get into here (the
> example was a complex piece of code with non-idiomatic constructs).
> 
> Anyway, from my analysis, it seems that the select form is almost
> always better (because it allows other mid-level optimizations or DAG
> combines that combine the selects), and regressions of the type
> observed are idiosyncratic and rare enough (or possibly completely
> absent) in idiomatic code that they are not worth implementing a
> complex transformation and cost model for. (Furthermore, we have since
> addressed the originally reported regression via other means.)
> 
> So, if there are no objections, I'm going to reapply Jean-Luc's
> original canonicalization; if other regressions are reported and look
> important enough to address, then I can use them as test cases to
> implement the cost model portion of my code and submit it for review.
> 
> Thanks,
> Stephen
> 
> On Tue, May 28, 2013 at 8:28 AM, Jean-Luc Duprat <jduprat at apple.com> wrote:
>> It is now tracked in http://llvm.org/bugs/show_bug.cgi?id=16164
>> 
>> JL
>> 
>> On May 28, 2013, at 08:13 , Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>> 
>>> On 28 May 2013 10:42, Jean-Luc Duprat <jduprat at apple.com> wrote:
>>>> Rafael,
>>>> 
>>>> I just wanted to circle back with you and summarize the conclusions of more internal discussions we came to.
>>>> The select form of the operation is most likely the right canonicalization for this expression, and it is in fact possible that the vectorizer could do something with that representation.  At the current time, however, this would have a negative impact on performance because it impairs FMA formation.  The proper fix is to fix FMA formation in all relevant backends, then enable the select instcombine, and finally to get the vectorizer to make use of the canonical select form in its decisions.  This is quite a bit of work for unquantifiable performance gains…
>>>> 
>>>> In the short term, the fix I submitted maintains performance to the same level as it was prior to my initial commit.
>>>> As the above changes roll in, performance may improve.  We are tracking the request for changes, and will get to them eventually; but this is fairly low priority for us at the current time.
>>> 
>>> Thanks for the summary; it sounds reasonable. Could you just open a bug on
>>> llvm.org/bugs to track it?
>>> 
>>>> JL
>>> 
>>> Thanks,
>>> Rafael
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits




