[PATCH] Review for hoisting and sinking of equivalent memory instruction (Instruction Merge Pass)

Thu Jun 19 14:27:34 PDT 2014

For the original implementation (within GVN) I didn’t see compile-time issues. That should not have changed now that the code is in a separate pass, but I’ll collect data for the latest version.

The biggest gain on the LLVM test suite was 2-3% in mcf.

-Gerolf

On Jun 19, 2014, at 7:13 AM, Daniel Berlin <dberlin at dberlin.org> wrote:

> Speaking of which, maybe I missed it, but do you have any numbers for
> compile time or performance impact on real program compilation?
> 
> 
> On Wed, Jun 18, 2014 at 8:55 PM, Gerolf Hoflehner <ghoflehner at apple.com> wrote:
>> Thanks Daniel & Tobias.
>> I think at this point it makes sense to limit the number of checks and loads
>> to play it safe on compile time.
>> 
>> 
>> -Gerolf
>> 
>> On Jun 18, 2014, at 1:47 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>> 
>> On Wed, Jun 18, 2014 at 12:56 PM, Tobias Grosser <tobias at grosser.es> wrote:
>> 
>> On 18/06/2014 21:47, Daniel Berlin wrote:
>> 
>> 
>> FWIW: There is no easy way to do this in O(n) for stores in LLVM, due to
>> the lack of something like memory-ssa (otherwise, you could sink to
>> the nearest common dominator of all immediate uses, as we do in GCC).
>> You can do it in O(n), or much closer to it, in LLVM for loads, like this:
>> 
>> Assuming GVN and PRE have been run, all loads that can be determined to
>> be identical should look identical in their operands[1] (if not, our
>> GVN is seriously busted :P).
>> 
>> pending = hash table of <block, load operands> -> list of load instructions
>> 
>> for each load in the diamond:
>>   calculate sink location as nearest common dominator of:
>>     for each dependency according to memdep, the block of that dependency
>>     for each RHS operand, the defining block of that operand
>>   pending[<sink location, load operands>].insert(load instruction)
>> 
>> for each entry in pending:
>>   if (entry.list.size() > 1)
>>     perform merge and hoist to end of common dominator block
>> 
>> 
>> This would be even easier if GVN produced a value number or value
>> handle for each thing, like GCC's does (then it wouldn't matter whether
>> they looked identical, only whether they compute the same value), but
>> c'est la vie.
>> 
>> [1] The only case where this wouldn't be true is if the load's operands
>> are defined in the diamond, in which case you couldn't hoist it out of
>> the diamond anyway without a real load PRE determining whether you
>> could move/recalculate the operands.
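
The grouping step of the pseudocode above can be tried out in isolation. What follows is only a minimal sketch in Python, not code from the patch and not LLVM's API: the `Load` record, the `idom` map (block -> immediate dominator) and all function names are hypothetical stand-ins for the load instructions, the dominator tree and the memdep results. It transcribes the pseudocode directly and does not check whether a move is legal (see footnote [1]).

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for an LLVM load: its name, its address operands, the
# blocks of its memdep dependencies, and the defining blocks of its operands.
Load = namedtuple("Load", "name operands dep_blocks operand_blocks")


def compute_depths(idom, root):
    """Depth of every block in the dominator tree (root has depth 0)."""
    depth = {root: 0}

    def visit(block):
        if block not in depth:
            depth[block] = visit(idom[block]) + 1
        return depth[block]

    for block in idom:
        visit(block)
    return depth


def nearest_common_dominator(idom, depth, a, b):
    """Walk the deeper block up the dominator tree until both blocks meet."""
    while a != b:
        if depth[a] < depth[b]:
            a, b = b, a
        a = idom[a]
    return a


def find_mergeable_loads(loads, idom, root):
    """Group loads by (target block, operands); keep groups with > 1 load."""
    depth = compute_depths(idom, root)
    pending = defaultdict(list)
    for load in loads:
        target = None
        for block in list(load.dep_blocks) + list(load.operand_blocks):
            if target is None:
                target = block
            else:
                target = nearest_common_dominator(idom, depth, target, block)
        if target is None:  # no dependencies at all: fall back to the entry block
            target = root
        pending[(target, load.operands)].append(load.name)
    return {key: names for key, names in pending.items() if len(names) > 1}


# A diamond: entry -> {then, else} -> merge; entry dominates everything.
idom = {"then": "entry", "else": "entry", "merge": "entry"}
loads = [
    Load("l1", ("p",), ("entry",), ("entry",)),  # load of p in 'then'
    Load("l2", ("p",), ("entry",), ("entry",)),  # identical load in 'else'
    Load("l3", ("q",), ("then",), ("then",)),    # q is defined inside 'then'
]
result = find_mergeable_loads(loads, idom, "entry")
```

In this toy diamond only `l1` and `l2` end up in the same bucket, so they are the one merge candidate. Each load is inserted into the table once; the remaining cost is the walk up the dominator tree for each dependency.
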
>> 
>> 
>> 
>> Thanks Daniel!
>> 
>> Gerolf, even if we don't get this algorithm to a linear run-time, does it
>> make sense to bound the number of checks, so that in those corner cases we
>> simply skip the optimization instead of getting a quadratic increase in
>> compile time?
>> 
>> 
>> +1
>> Little passes like this, created because other infrastructure
>> currently sucks and needs serious work, seem somewhat inevitable as
>> temporary things, but they do eat at compile time, so it's always good
>> to do what you can to limit impact, even if it means not catching
>> everything.
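
To illustrate the bounding idea discussed above, here is a small sketch, again in Python and not taken from the patch. It assumes the pass compares candidates from the two sides of the diamond pairwise; the function name, the `max_checks` parameter and its default are made up for the example. Once the budget is spent, the search stops and the remaining candidates are simply left unoptimized.

```python
def find_matches(left, right, max_checks=100):
    """Pairwise search for equal candidates, capped at max_checks comparisons.

    Returns (matches, gave_up). When the budget runs out, whatever has not
    been matched yet is left alone instead of being searched further.
    """
    matches = []
    checks = 0
    for a in left:
        for b in right:
            if checks >= max_checks:
                return matches, True  # budget exhausted: stop optimizing
            checks += 1
            if a == b:
                matches.append((a, b))
                break
    return matches, False


# Candidates are represented by their (hypothetical) address operand names.
full = find_matches(["p", "q", "r"], ["r", "q", "p"])
capped = find_matches(["p", "q", "r"], ["r", "q", "p"], max_checks=4)
```

With the default budget all three pairs are found; with `max_checks=4` only the first pair is found before the search gives up, which trades missed merges for a hard cap on the work done.
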
>> 
>> 
>> 
>> Tobias
>> 
>>