[PATCH] Make GVN more iterative

Tue Aug 12 02:25:58 PDT 2014

Hi Daniel,

The 5% speedup in compile time is almost certainly entirely noise. That figure was got from running the LNT suite on a core i7.

You're right that in this testcase only one load is missed currently, but that is 50% of the loads in the testcase! The problem is chains of partially redundant loads. The reduced testcase that inspired this (taken from 450.soplex, where we lose 15% of our runtime due to it!) is:

double f(int stat, int i, double * restrict * restrict p)
{
    double x;
    switch (stat)
    {
        case 0:
        case 1:
            x = p[0][i] - 1;
            if (x < 0)
                return x;
        case 2:
            return 3 - p[0][i];
        default:
            return 0;
    }
}

You sound like an expert on the GVN code, which I certainly am not. I've worked with PRE heavily before, but that was in a different compiler that did not use SSA so the algorithm was totally different (and GVN didn't exist). Having looked at the LLVM algorithm, the first (GVN) stage performs PRE of loads, but the second stage performs PRE of non-loads.

This is obviously going to result in missing PRE opportunities. The example above ends up with a chain where you need to spot the first load is partially redundant (which GVN duly does), then spot that the "sext i32 -> i64" afterwards is partially redundant (which the second stage PRE duly does), then notice the next load is now redundant (woops, we never do load PRE again at this point!)

I don't see it as "we're missing just one load", I see it as "LLVM's implementation of a truly classical compiler optimization is really weak". What do you think? Should we implement a new algorithm, or make it more iterative? 

It only re-iterates if PRE actually does something, so I don't think this should slow the compiler down massively. I ran GVN over a clang.bc, and got this:

Before my patch (release+asserts):
  10.1046 ( 47.6%)   0.6320 ( 51.3%)  10.7367 ( 47.8%)  10.7387 ( 47.7%)  Global Value Numbering

After my patch (release+asserts):
  10.2886 ( 48.2%)   0.7441 ( 55.9%)  11.0327 ( 48.6%)  11.0000 ( 48.4%)  Global Value Numbering

Which is 0.3s (2.8%) worse, which doesn't sound like it should be a show-stopper to me! Perhaps you could run it on your own testcases?

Cheers,

James

-----Original Message-----
From: Daniel Berlin [mailto:dberlin at dberlin.org] 
Sent: 11 August 2014 23:49
To: James Molloy
Cc: Owen Anderson (resistor at mac.com); llvm-commits
Subject: Re: [PATCH] Make GVN more iterative

On Mon, Aug 11, 2014 at 3:25 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> We have cases where GVN is *really slow* (IE 29 seconds and 80% of 
> compile time). Iterating it again is likely to make that worse.  Happy 
> to give them to you (or test it with this patch)
>
> You say you saw a 5% speedup in compile time, which seems really odd.
>
> What exactly did you run it on?
>
> Additionally, it kind of sounds like you are saying all this does is 
> remove 1 additional load for this testcase. Do you have more general 
> performance numbers?
> Iterating all of GVN to eliminate a single load seems like a pretty 
> heavy hammer.

To be clear, LLVM used to have a GVNPRE implementation, but it was decided this wasn't worth the cost of what it got.
What you are doing is effectively re-adding that, but without an integrated algorithm that was O(better time bounds).
Thus, this patch, at a glance, seems like the wrong approach.
If we really have a bunch of cases with significant performance benefits from GVN + PRE, then that would point towards moving back
towards GVN-PRE, which is why the comment says "   // Actually, when
this happens, we should just fully integrate PRE into GVN."

>
>
>
> On Mon, Aug 11, 2014 at 11:29 AM, James Molloy <James.Molloy at arm.com> wrote:
>> Hi all,
>>
>>
>>
>> GVN currently iterates until it can do no more, then performs PRE. 
>> There’s a FIXME, saying we should try GVN again if PRE made changes.
>>
>>
>>
>> This patch changes GVN to do this, motivated by the attached testcase 
>> (reduced from SPEC) where GVN currently leaves a redundant load.
>>
>>
>>
>> The FIXME mentions memory dependence checking, but it looks to me 
>> like the memory dependence updating got implemented after the FIXME 
>> was written, so it’s out of date.
>>
>>
>>
>> I’ve tested this for compile time and there are no non-noise 
>> regressions (in fact, the geomean was a 5% speedup, interestingly).
>>
>>
>>
>> What do you think?
>>
>>
>>
>> Cheers,
>>
>>
>>
>> James
>>
>>
>> -- IMPORTANT NOTICE: The contents of this email and any attachments 
>> are confidential and may also be privileged. If you are not the 
>> intended recipient, please notify the sender immediately and do not 
>> disclose the contents to any other person, use it for any purpose, or 
>> store or copy the information in any medium. Thank you.
>>
>> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, 
>> Registered in England & Wales, Company No: 2557590 ARM Holdings plc, 
>> Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in 
>> England & Wales, Company No: 2548782
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>