[PATCH] Make GVN more iterative

Daniel Berlin dberlin at dberlin.org
Tue Aug 12 09:26:10 PDT 2014


On Tue, Aug 12, 2014 at 2:25 AM, James Molloy <james.molloy at arm.com> wrote:
> Hi Daniel,
>
> The 5% speedup in compile time is almost certainly entirely noise. That figure was obtained by running the LNT suite on a Core i7.
>
> You're right that in this testcase only one load is missed currently, but that is 50% of the loads in the testcase! The problem is chains of partially redundant loads. The reduced testcase that inspired this (taken from 450.soplex, where we lose 15% of our runtime due to it!) is:
>
> double f(int stat, int i, double * restrict * restrict p)
> {
>     double x;
>     switch (stat)
>     {
>         case 0:
>         case 1:
>             x = p[0][i] - 1;
>             if (x < 0)
>                 return x;
>             /* fall through: cases 0/1 intentionally continue into case 2 */
>         case 2:
>             return 3 - p[0][i];
>         default:
>             return 0;
>     }
> }
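For illustration only (this is not output from the patch; the name `f_pre` and the temporary `v` are mine), here is the source-level shape that full PRE of this chain should produce: on the case-0/1 path, `p[0][i]` is loaded exactly once and reused after the fall-through, while the direct case-2 path keeps its own single load.

```c
/* Hand-transformed sketch of f() above, showing what load PRE
   should achieve on the fall-through path. */
double f_pre(int stat, int i, double * restrict * restrict p)
{
    double v, x;
    switch (stat)
    {
        case 0:
        case 1:
            v = p[0][i];        /* the one load on this path */
            x = v - 1;
            if (x < 0)
                return x;
            return 3 - v;       /* reuse v: no second load */
        case 2:
            return 3 - p[0][i]; /* load only when entered here */
        default:
            return 0;
    }
}
```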
>
> You sound like an expert on the GVN code, which I certainly am not. I've worked with PRE heavily before, but that was in a different compiler that did not use SSA so the algorithm was totally different (and GVN didn't exist). Having looked at the LLVM algorithm, the first (GVN) stage performs PRE of loads, but the second stage performs PRE of non-loads.

Yes.  This is because GVN does not really value number memory; it asks
memdep whether two loads look the same (which isn't really the same
thing :P).  As such, PRE of memory is performed at the point where it
is asking memdep questions (and in fact, it is mostly independent of
the rest of GVN, except for one small part).

So it will miss cases like the one you describe, where you end up with
partially redundant loads that interact with partially redundant
scalars (and other cases besides, since the load PRE does not handle
partial availability).

>
> This is obviously going to result in missed PRE opportunities. The example above ends up with a chain where you need to spot that the first load is partially redundant (which GVN duly does), then spot that the "sext i32 -> i64" afterwards is partially redundant (which the second-stage PRE duly does), then notice that the next load is now redundant (whoops, we never do load PRE again at this point!)
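Concretely, the chain on the case-0/1 path looks roughly like this when the implicit address-computation steps are spelled out in C (the function and variable names here are illustrative, not from the testcase). Each numbered step has to be proven redundant before the next one can be:

```c
/* The per-element access GVN sees, with the implicit steps
   made explicit (names are mine, for illustration): */
double load_elem(int i, double * restrict * restrict p)
{
    double *base = p[0];     /* 1. load of p[0]: partially redundant     */
    long    idx  = (long)i;  /* 2. sext i32 -> i64: partially redundant  */
    return base[idx];        /* 3. element load: only provably redundant
                                   once steps 1 and 2 have been PRE'd    */
}
```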
>
> I don't see it as "we're missing just one load", I see it as "LLVM's implementation of a truly classical compiler optimization is really weak".

I 100% agree with you, but this is a known issue.  Nobody has had the
wherewithal to actually solve it, and people keep piling on hacks and
band-aids, which is how GCC got so slow in the end.  There has to be a
stop-loss point or GVN will just get ever slower.

> What do you think? Should we implement a new algorithm, or make it more iterative?
>

Realistically, we should change the algorithm. But this is a lot of
work. (If you are interested, I can help, and there is even a GVN+PRE
implementation in LLVM's source tree, if you look at the version
control history of Transforms/Scalar.)

However, you could go halfway for now.
There is nothing, IIRC, that should stop you from updating the
availability tables after the scalar PRE, and then just iterating that
plus load PRE (without the rest of GVN).  The load PRE does not really
depend on anything interesting, last I looked.

> It only re-iterates if PRE actually does something, so I don't think this should slow the compiler down massively.


3% on an average testcase is a lot to catch essentially 0.1% more
loads ;)  It's only because this load is important that anyone cares.


> I ran GVN over a clang.bc, and got this:
>
> Before my patch (release+asserts):
>   10.1046 ( 47.6%)   0.6320 ( 51.3%)  10.7367 ( 47.8%)  10.7387 ( 47.7%)  Global Value Numbering
>
> After my patch (release+asserts):
>   10.2886 ( 48.2%)   0.7441 ( 55.9%)  11.0327 ( 48.6%)  11.0000 ( 48.4%)  Global Value Numbering
>
> That is 0.3s (2.8%) worse, which doesn't sound like a show-stopper to me! Perhaps you could run it on your own testcases?
>
> Cheers,
>
> James
>
> -----Original Message-----
> From: Daniel Berlin [mailto:dberlin at dberlin.org]
> Sent: 11 August 2014 23:49
> To: James Molloy
> Cc: Owen Anderson (resistor at mac.com); llvm-commits
> Subject: Re: [PATCH] Make GVN more iterative
>
> On Mon, Aug 11, 2014 at 3:25 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>> We have cases where GVN is *really slow* (i.e. 29 seconds and 80% of
>> compile time). Iterating it again is likely to make that worse.  Happy
>> to give them to you (or test it with this patch)
>>
>> You say you saw a 5% speedup in compile time, which seems really odd.
>>
>> What exactly did you run it on?
>>
>> Additionally, it kind of sounds like you are saying all this does is
>> remove 1 additional load for this testcase. Do you have more general
>> performance numbers?
>> Iterating all of GVN to eliminate a single load seems like a pretty
>> heavy hammer.
>
> To be clear, LLVM used to have a GVNPRE implementation, but it was decided this wasn't worth the cost of what it got.
> What you are doing is effectively re-adding that, but without the integrated algorithm and its better time bounds.
> Thus, this patch, at a glance, seems like the wrong approach.
> If we really have a bunch of cases with significant performance benefits from GVN + PRE, then that would point towards moving back
> towards GVN-PRE, which is why the comment says "// Actually, when this happens, we should just fully integrate PRE into GVN."
>
>
>>
>> On Mon, Aug 11, 2014 at 11:29 AM, James Molloy <James.Molloy at arm.com> wrote:
>>> Hi all,
>>>
>>> GVN currently iterates until it can do no more, then performs PRE.
>>> There's a FIXME, saying we should try GVN again if PRE made changes.
>>>
>>> This patch changes GVN to do this, motivated by the attached testcase
>>> (reduced from SPEC) where GVN currently leaves a redundant load.
>>>
>>> The FIXME mentions memory dependence checking, but it looks to me
>>> like the memory dependence updating got implemented after the FIXME
>>> was written, so it's out of date.
>>>
>>> I've tested this for compile time and there are no non-noise
>>> regressions (in fact, the geomean was a 5% speedup, interestingly).
>>>
>>> What do you think?
>>>
>>> Cheers,
>>>
>>> James
>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>



