[PATCH] D40480: MemorySSA backed Dead Store Elimination.

Thu Dec 7 06:43:55 PST 2017

dmgreen added a comment.

OK. I have some performance numbers. I'm compiling clang ("ninja clang") and using
-ftime-report/-stat to get info (with some extra precision for decimal places) and
summing the results for all the compiled files. The total runtime is a little noisy on
this machine, but these sub-numbers seem pretty stable between runs.

Firstly the good news. With this version we now remove more dead stores.
 Old: 41310   New: 51660
With my "MemSSA can enable us to remove more stores" hat on, this is good stuff.

Some more good news is that DSE is now quicker, for the sum of time for each file:
 Old: ~26s   New: ~19s

The bad news is that we also need to add in the MemorySSA passes. I think we now
compute this twice in the pipeline, not once as before, so times roughly double.
 Old: ~35s   New: ~69s
I'm hoping that in the long run we can share the cost of this between other passes.
NewGVN is a couple of hops earlier in the LTO pass pipeline, and LICM is also quite
close in the normal one. Hopefully this cost can be shared out.

The other bad news is we use a post-dom tree (again, maybe sharable?):
 Old: ~15s   New: ~27s
But Memdeps is somehow now quicker:
 Old: ~13s   New: ~8.5s

The total runtime here was on the order of 10000s, so it's hard to pick out the overall
cost exactly. These results suggest that the total is now ~30s more, and excluding
MemSSA we are at roughly the same time.
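
For anyone who wants to check the arithmetic behind that estimate, here is a rough
sketch that just sums the per-pass deltas quoted above. The dictionary keys are my
own labels, not the exact names printed by -ftime-report, and the inputs are the
approximate (~) figures from this comment, so treat the output as a ballpark only.

```python
# Approximate per-pass totals in seconds (old, new), copied from the
# figures quoted in this comment. The keys are illustrative labels,
# not the exact pass names printed by -ftime-report.
timings = {
    "DSE": (26.0, 19.0),
    "MemorySSA": (35.0, 69.0),
    "PostDomTree": (15.0, 27.0),
    "MemDep": (13.0, 8.5),
}

# Change in time for each pass (positive means slower with the patch).
deltas = {name: new - old for name, (old, new) in timings.items()}

# Net change across the four passes, with and without MemorySSA.
total_delta = sum(deltas.values())
delta_without_memssa = total_delta - deltas["MemorySSA"]

for name, delta in deltas.items():
    print(f"{name:12s} {delta:+.1f}s")
print(f"{'net':12s} {total_delta:+.1f}s")
print(f"{'net w/o MSSA':12s} {delta_without_memssa:+.1f}s")
```

The net comes out in the low 30s of seconds, with almost all of it from MemorySSA;
the DSE and Memdeps savings roughly cancel the post-dom tree cost.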

I'm going to try and take a look at the most costly files and see if we can knock the most
expensive ones down without making the total slower. As Daniel mentioned, there are some
good candidates for caching the results here, like those in isOverlap.

Maths isn't on my side for making the whole thing quicker. But it removes more dead stores :)

https://reviews.llvm.org/D40480