[PATCH] This patch introduces MemorySSA, a virtual SSA form for memory. Details on what it looks like are in MemorySSA.h

Daniel Berlin dberlin at dberlin.org
Wed Feb 25 13:26:11 PST 2015


It's a one-line code change. I'll play with it more later. So far I've
found that on small files it makes no real difference, but on larger code,
it is consistently 10-20% faster to do it eagerly. For GVN, anyway.

As for absolute time, GVN + MemorySSA is also between 0-50% faster on the
average small files I try it on.

(I tried things from our testsuite)

For example, one set of C++ files gives me:

before:
  0.0018 ( 92.1%)   0.0001 ( 83.5%)   0.0019 ( 91.7%)   0.0019 ( 91.5%)  Global Value Numbering

  0.0021 ( 92.1%)   0.0006 ( 96.6%)   0.0027 ( 93.1%)   0.0059 ( 96.7%)  Global Value Numbering

after:
  0.0010 ( 69.1%)   0.0001 ( 69.3%)   0.0011 ( 69.1%)   0.0011 ( 69.3%)  Global Value Numbering
  0.0002 ( 16.3%)   0.0000 ( 13.7%)   0.0003 ( 16.1%)   0.0003 ( 16.1%)  Memory SSA

  0.0009 ( 70.1%)   0.0001 ( 69.2%)   0.0009 ( 70.0%)   0.0009 ( 69.9%)  Global Value Numbering
  0.0002 ( 16.4%)   0.0000 ( 13.3%)   0.0002 ( 16.2%)   0.0002 ( 16.2%)  Memory SSA


On some files it's also a wash.
I haven't found anything where it is a regression yet (which is nice :P)
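For readers following along, here is a minimal sketch of the eager-vs-lazy tradeoff being measured below. Every name here (Access, walkToClobber, buildEagerly, queryLazily) is made up for illustration; this is not the MemorySSA API. The point is only that the same upward walk to the nearest clobbering def can run either once per use at build time, or on demand at query time:

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a MemorySSA access: every use/def hangs off a
// defining access, and a def may or may not actually clobber a use.
struct Access {
  Access *Defining = nullptr;      // next access up the chain
  bool IsDef = false;              // is this a def?
  bool Clobbers = false;           // does this def really clobber?
  Access *CachedClobber = nullptr; // resolved clobber, once known
};

// Walk up the defining accesses until we hit a def that really
// clobbers; nullptr stands in for live-on-entry.
static Access *walkToClobber(Access *A) {
  Access *Cur = A->Defining;
  while (Cur && !(Cur->IsDef && Cur->Clobbers))
    Cur = Cur->Defining;
  return Cur;
}

// Eager strategy: resolve the clobber for every use at build time.
static void buildEagerly(const std::vector<Access *> &Uses) {
  for (Access *U : Uses)
    U->CachedClobber = walkToClobber(U);
}

// Lazy strategy: resolve only when a pass actually asks.
static Access *queryLazily(Access *U) {
  if (!U->CachedClobber)
    U->CachedClobber = walkToClobber(U);
  return U->CachedClobber;
}
```

Both strategies produce the same answers; eager building just pays for the walks up front, which the numbers in this thread suggest is a win when a pass like GVN ends up querying every load anyway.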



On Wed, Feb 25, 2015 at 11:16 AM, Philip Reames <listmail at philipreames.com>
wrote:

>  Given this, I'd say let's not worry about the difference between eager
> and lazy at this time.  Either is a huge win over where we are and we can
> come back to that decision later.  Go with whatever is easier to implement.
>
> (It would be good to confirm that the absolute time on a small file is
> also reasonable, but I suspect it will be.)
>
> Philip
>
>
> On 02/25/2015 10:32 AM, Daniel Berlin wrote:
>
> On this file, we get
>
>    11.0074 (83.0%)   0.1181 ( 49.5%)  11.1255 ( 82.6%)  11.1804 (82.7%)  Total   Global Value Numbering
>
>
>  I have files where GVN is 26-30 seconds, but memory ssa + gvn is still
> "6-7 seconds"
>
> On Wed, Feb 25, 2015 at 10:03 AM, Philip Reames <listmail at philipreames.com
> > wrote:
>
>>  Just for the perspective, how does this compare with our current GVN
>> times without Memory SSA?
>>
>>
>> On 02/25/2015 09:49 AM, Daniel Berlin wrote:
>>
>>  So, to circle back on timings:
>> On my very, very large file with lots of functions that I use to test GVN
>> timings, doing the clobber checks at build time gives:
>>
>>    3.9350 ( 46.4%)   0.0564 ( 36.2%)   3.9915 ( 46.2%)   4.0004 ( 46.2%)  Global Value Numbering
>>    2.4518 ( 28.9%)   0.0276 ( 17.7%)   2.4795 ( 28.7%)   2.4841 ( 28.7%)  Memory SSA
>>
>>    3.8392 ( 46.2%)   0.0620 ( 37.8%)   3.9012 ( 46.0%)   3.9410 ( 46.1%)  Global Value Numbering
>>    2.4047 ( 28.9%)   0.0319 ( 19.4%)   2.4366 ( 28.8%)   2.4532 ( 28.7%)  Memory SSA
>>
>>    3.9762 ( 46.4%)   0.0699 ( 38.7%)   4.0461 ( 46.3%)   4.1086 ( 46.4%)  Global Value Numbering
>>    2.4720 ( 28.9%)   0.0354 ( 19.6%)   2.5074 ( 28.7%)   2.5295 ( 28.6%)  Memory SSA
>>
>>
>>  (As a side note, old GVN took 12 seconds, so yay!)
>>
>>  Doing it lazily gives:
>>
>>    5.4972 ( 60.2%)   0.0795 ( 44.3%)   5.5767 ( 59.9%)   5.6230 ( 60.0%)  Global Value Numbering
>>    1.5262 ( 16.7%)   0.0261 ( 14.5%)   1.5523 ( 16.7%)   1.5618 ( 16.7%)  Memory SSA
>>
>>    5.4386 ( 60.1%)   0.0744 ( 43.1%)   5.5131 ( 59.8%)   5.5430 ( 59.8%)  Global Value Numbering
>>    1.5087 ( 16.7%)   0.0251 ( 14.5%)   1.5338 ( 16.6%)   1.5413 ( 16.6%)  Memory SSA
>>
>>    5.4627 ( 59.9%)   0.0865 ( 44.3%)   5.5492 ( 59.5%)   5.6065 ( 59.5%)  Global Value Numbering
>>    1.5382 ( 16.9%)   0.0296 ( 15.2%)   1.5678 ( 16.8%)   1.5861 ( 16.8%)  Memory SSA
>>
>>
>>  So, it definitely makes MemorySSA about 50-60% slower to build.
>> However, overall for GVN, which looks at all loads, doing it at build
>> time is 10-15% faster in combined time
>> (6.2-6.5 seconds vs. 6.9-7.0 seconds).
>>
>>  So I think it should at least be an option when building MemorySSA
>> (though I admit to not knowing whether there is an easy way for passes to
>> give options to analysis passes. If we keep it a utility, of course, it's easy).
>>
>>
>>  Thoughts welcome.
>>
>>
>>  To put these pass times in perspective, something simple like dominator
>> tree construction on this file takes:
>>    0.6060 (  6.6%)   0.0189 (  9.7%)   0.6249 (  6.7%)   0.6323 (  6.7%)  Dominator Tree Construction
>>
>>  So uh, 1.5 seconds to do MemorySSA is not that bad :)
>>
>>
>>
>> On Wed, Feb 25, 2015 at 8:44 AM, Daniel Berlin <dberlin at dberlin.org>
>> wrote:
>>
>>>
>>>
>>> On Wed, Feb 25, 2015 at 12:07 AM, Sanjoy Das <
>>> sanjoy at playingwithpointers.com> wrote:
>>>
>>>> > So, there is technically no guarantee that you will get an access that
>>>> > dominates you.
>>>>
>>>> I'm unable to come up with a situation where we'd start off with
>>>> memory-defs dominating memory-uses and getClobberingMemoryAccess (as it
>>>> is implemented currently) would return a non-dominating memory access.
>>>> Do you have an example where this would happen?
>>>>
>>>
>>>  As currently implemented, you are correct, it will not.
>>> But I have not finished integration into GVN yet.
>>>
>>>  Currently, GVN *wants* to know the clobber in all cases, so it can see
>>> whether it can pull the store's value out.
>>>
>>>  So I am likely to have to change it (or build a new API) to track and
>>> give the clobber even if it's in a branch above a phi node.
>>>
>>>  I can certainly build a new API for this, or I could just make doing
>>> what you suggest something it does internally while building.
>>>
>>>  But otherwise, my main use case is GVN, and I'm a bit wary of building
>>> an API for the rest (because I have no idea what others want :P)
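[To make the concern above concrete, here is a contrived toy with made-up types, not the real MemorySSA walker: when a walk pushes through a phi because only one incoming arm actually clobbers, the access it hands back can live in a block that does not dominate the querying load.]

```cpp
#include <cassert>
#include <vector>

// Toy CFG block with an immediate-dominator link.
struct Block {
  Block *IDom = nullptr;
};

// A dominates B iff A appears on B's idom chain (or A == B).
static bool dominates(const Block *A, const Block *B) {
  for (const Block *Cur = B; Cur; Cur = Cur->IDom)
    if (Cur == A)
      return true;
  return false;
}

// Toy memory access: a def in some block, or a phi merging incoming defs.
struct Access {
  Block *BB = nullptr;
  bool Clobbers = false;
  std::vector<Access *> Incoming; // non-empty => this is a phi
};

// A walker that pushes through a phi when exactly one incoming access
// actually clobbers: it returns that arm's def directly, which may sit
// in a block that does NOT dominate the querying load.
static Access *walkPastPhi(Access *Phi) {
  Access *OnlyClobber = nullptr;
  for (Access *In : Phi->Incoming) {
    if (In->Clobbers) {
      if (OnlyClobber)
        return Phi; // more than one clobbering arm: stop at the phi
      OnlyClobber = In;
    }
  }
  return OnlyClobber ? OnlyClobber : Phi;
}
```

In a diamond where only the left arm stores, this walker returns the store in the left block to a load after the merge, and that block does not dominate the merge block.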
>>>
>>>
>>>> > This is a harder question.  If you do it to every use, you may end up
>>>> > spending a bunch of time doing that.
>>>> > You are essentially trading build time for query  time.
>>>> > If the optimization pass only asks about certain loads, it may not be
>>>> > a good tradeoff.
>>>>
>>>> Makes sense, thanks!
>>>>
>>>> A related question is whether LLVM should cache the result of
>>>> `getClobberingMemoryAccess` in the MemoryAccess it computed the result
>>>> for (and the other MemoryAccesses it had to look at, transitively).
>>>> That seems like a good idea irrespective of how many memory ops were
>>>> queried.
>>>>
>>>
>>>   Yes, I think doing this makes sense; it'll save DenseMap lookups.
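[A minimal sketch of the transitive caching Sanjoy describes, under the assumption that the walk records its answer for every access it visits, not just the starting one. ClobberCache and this getClobberingAccess are toy stand-ins, not the actual MemorySSA interface.]

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy access chain: each access hangs off a defining access; a def may
// or may not actually clobber the queried location.
struct Access {
  Access *Defining = nullptr;
  bool Clobbers = false;
};

struct ClobberCache {
  std::unordered_map<const Access *, Access *> Cache;

  // Walk up to the nearest clobbering def, caching the answer for the
  // starting access *and* every access visited on the way, so later
  // queries anywhere along the chain hit the cache immediately.
  Access *getClobberingAccess(Access *A) {
    auto Hit = Cache.find(A);
    if (Hit != Cache.end())
      return Hit->second;
    std::vector<Access *> Visited{A};
    Access *Result = nullptr; // nullptr stands in for live-on-entry
    for (Access *Cur = A->Defining; Cur; Cur = Cur->Defining) {
      if (Cur->Clobbers) { Result = Cur; break; }
      auto It = Cache.find(Cur);
      if (It != Cache.end()) { Result = It->second; break; }
      Visited.push_back(Cur);
    }
    for (Access *V : Visited)
      Cache[V] = Result;
    return Result;
  }
};
```

The effect is roughly path compression: one walk over a chain of N non-clobbering accesses answers N future queries from the cache.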
>>>
>>>
>>>> -- Sanjoy
>>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>