[PATCH] D40480: MemorySSA backed Dead Store Elimination.

Fri Dec 8 11:02:34 PST 2017

Hello

I too was a little surprised at some of these numbers. For DSE, it seems to usually be very quick, just with some bad worst-case times. Post-doms never have those same worst-cases.

I think that previously the pass pipeline used one post-dom tree (for adce, I believe). Now there are two, hence the near-doubling of the time (~15s to ~27s). Looking at the details of the pass output, the two are pretty close together, with only LoopSimplify/LICM in between. Provided we can preserve the PDT across LoopSimplify, we should get the second one for free (plus the cost of preserving it, which I believe should be cheap).
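
To make the reasoning concrete, here is a toy model of analysis caching. It is only a sketch, not LLVM's pass manager API: the pass names come from the pipeline fragment described above, while the `run_pipeline` helper and the required/preserved sets are assumptions made up for illustration. A pass invalidates every cached analysis it does not declare as preserved, so the PDT is recomputed for DSE unless the passes in between preserve it.

```python
from collections import Counter

def run_pipeline(passes):
    """Run passes in order; return how many times each analysis was computed."""
    cached = set()          # analyses that are currently valid
    computed = Counter()    # analysis name -> number of computations
    for _name, required, preserved in passes:
        for analysis in required:
            if analysis not in cached:
                computed[analysis] += 1
                cached.add(analysis)
        # Anything the pass does not declare as preserved is invalidated.
        cached &= set(preserved)
    return computed

# Hypothetical pipeline fragment: (pass name, required analyses, preserved analyses).
# The preserved sets are assumptions for this sketch, not what LLVM actually declares.
without_pdt = [
    ("adce", {"PDT"}, {"DT", "PDT"}),
    ("loop-simplify", {"DT"}, {"DT"}),
    ("licm", {"DT"}, {"DT"}),
    ("dse", {"DT", "PDT"}, {"DT", "PDT"}),
]
# Same pipeline, but every pass additionally preserves the post-dom tree.
with_pdt = [(n, r, p | {"PDT"}) for n, r, p in without_pdt]

pdt_before = run_pipeline(without_pdt)["PDT"]
pdt_after = run_pipeline(with_pdt)["PDT"]
print(pdt_before, pdt_after)
```

In this model the PDT is built twice in the first pipeline and once in the second, which matches the "two trees, roughly double the time" observation. The model ignores the cost of updating the tree inside the preserving passes.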

LoopSimplify already preserves the DT, so I'd guess it should be simple enough to get this working.
I'll take a look,
Dave

From: Daniel Berlin <dberlin at dberlin.org>
Sent: 08 December 2017 16:48:53
To: reviews+D40480+public+ec7e4dc10b50dfd8 at reviews.llvm.org
Cc: David Green; Roger Ferrer Ibanez; Javed Absar; Piotr Padlewski; George Burgess IV; Friedman, Eli; Davide Italiano; llvm-commits at xorshift.org; llvm-commits
Subject: Re: [PATCH] D40480: MemorySSA backed Dead Store Elimination.

On Thu, Dec 7, 2017 at 6:43 AM, Dave Green via Phabricator <reviews at reviews.llvm.org> wrote:
 dmgreen added a comment.

OK. I have some performance numbers. I'm compiling clang ("ninja clang") and using
-ftime-report/-stat to get info (with some extra precision for decimal places) and
summing the results for all the compiled files. The total runtime is a little noisy on
this machine, but these sub-numbers seem pretty stable between runs.

Firstly the good news. With this version we now remove more dead stores.
 Old: 41310   New: 51660
With my "MemSSA can enable us to remove more stores" hat on, this is good stuff.

Some more good news is that DSE is now quicker, for the sum of time for each file:
 Old: ~26s   New: ~19s

The bad news is that we also need to add in the MemorySSA passes. I think we now
calculate this twice in the pipeline, not once as before, so times roughly double.
 Old: ~35s   New: ~69s
I'm hoping that in the long run we can share the cost of this between other passes.
NewGVN is a couple of hops earlier in the LTO pass pipeline, LICM also quite close
in the normal one. Hopefully this cost can be shared out.

Yeah, this is always the case when we introduce new stateful infrastructure.  We can just preserve it.

The other bad news is we use a post-dom tree (again, maybe sharable?):
 Old: ~15s   New: ~27s

This is surprising actually.
This would imply that post-dom is costing more than the pass?
