[PATCH] D16435: [RS4GC] Effective rematerialization at non-entry polls
Manuel Jacob via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 22 17:06:04 PST 2016
Hi,
On 2016-01-22 02:10, Philip Reames wrote:
> reames created this revision.
> reames added reviewers: JosephTremoulet, sanjoy, igor-laevsky, mjacob.
> reames added a subscriber: llvm-commits.
> Herald added subscribers: mcrosier, sanjoy, MatzeB.
>
> This is an attempt at addressing 26223. Specifically, it tries to
> avoid unfortunate register spilling by placing the rematerializations
> introduced by rewrite-statepoints-for-gc so as to maximize folding
> and simplification opportunities rather than to minimize execution
> frequency.
>
> If we have a bit of code like this:
>   %addr = gep %o, 8
>   loop {
>     if (poll) {
>       safepoint();
>     }
>     load %addr
>   }
>
> We currently end up rewriting this as:
>   %addr = gep %o, 8
>   loop {
>     %addr1 = phi (%addr, %addr2)
>     if (poll) {
>       safepoint();
>       %remat = gep %o.relocated, 8
>     }
>     %addr2 = phi (%addr1, %remat)
>     load %addr2
>   }
> This ends up forcing us to rematerialize the address explicitly and
> likely will cause us to spill/fill the address if register
> constrained. This creates a bunch of dependent loads (fill from
> stack, load from result) which show up as hot in a couple of
> benchmarks.
>
> A much better result would be:
>   %addr = gep %o, 8
>   loop {
>     if (poll) {
>       safepoint();
>     }
>     %remat = gep %o.relocated, 8
>     load %remat
>   }
>
> This version allows the GEP to be folded directly into x86's native
> addressing modes.
>
> (Note: For conciseness, I'm not writing the phis for relocating %o,
> assume they're all there.)
>
> The particular heuristic chosen here is to push each given remat as
> late as possible. This has the effect of moving remats closer to uses
> and preventing the creation of unnecessary and confusing PHI nodes.
> Empirically, this does appear to help in some of the benchmarks where I
> encountered this, but I'm getting increasingly uncomfortable with the
> coupling between RS4GC and CGP. In particular, a better version of
> this heuristic is already present in CGP.
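
One way to picture "as late as possible" placement is as picking the nearest common dominator of all blocks that use the rematerialized value. The sketch below is a toy model only, not the code in D16435: the block names (`entry`, `header`, `poll`, `merge`), the `IDOM` map, and the helper functions are all hypothetical, and it ignores the requirement that the chosen block must also see the relocated base pointer.

```python
from functools import reduce

# Hypothetical immediate-dominator map for the loop in the example:
# entry -> header -> {poll, merge}, poll -> merge, merge -> header.
IDOM = {"entry": None, "header": "entry", "poll": "header", "merge": "header"}


def dom_path(idom, block):
    """Return `block` followed by all of its dominators up to the entry."""
    path = []
    while block is not None:
        path.append(block)
        block = idom[block]
    return path


def nearest_common_dominator(idom, a, b):
    """Return the closest block that dominates both `a` and `b`."""
    ancestors = set(dom_path(idom, a))
    for block in dom_path(idom, b):
        if block in ancestors:
            return block
    raise ValueError("blocks share no dominator")


def latest_remat_block(idom, use_blocks):
    """Pick the latest block that still dominates every use (assumed heuristic)."""
    return reduce(lambda a, b: nearest_common_dominator(idom, a, b), use_blocks)
```

With the only use being the load in `merge`, `latest_remat_block(IDOM, ["merge"])` selects `merge`, which corresponds to the "much better result" above where the GEP sits next to the load instead of inside the poll block.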
>
> I think we should probably take this incremental step, but before
> going much further, factoring the code to share parts of the
> implementation of CGP might be a good idea. The general problem is
> that many CGP transforms are hard to perform after RS4GC has run. It
> may make sense to selectively run them beforehand.
I understand that RS4GC can make CGP's work harder. However, I don't see
how this is relevant here. What is the difference between doing the
addressing-mode-aware placement in RS4GC vs. doing it in CGP afterwards?
-Manuel
> http://reviews.llvm.org/D16435
>
> Files:
> lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
> test/Transforms/RewriteStatepointsForGC/remat-schedule.ll
> test/Transforms/RewriteStatepointsForGC/rematerialize-derived-pointers.ll