[PATCH] D15124: Use @llvm.invariant.start/end intrinsics to extend the meaning of basic AA's pointsToConstantMemory(), for GVN-based load elimination purposes [Local objects only]

Wed Jan 27 04:06:13 PST 2016

----- Original Message -----

> From: "Larisse Voufo" <lvoufo at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Chandler Carruth" <chandlerc at gmail.com>,
> reviews+D15124+public+ddba332c127856cb at reviews.llvm.org,
> "llvm-commits" <llvm-commits at lists.llvm.org>
> Sent: Friday, January 15, 2016 11:55:17 AM
> Subject: Re: [PATCH] D15124: Use @llvm.invariant.start/end intrinsics
> to extend the meaning of basic AA's pointsToConstantMemory(), for
> GVN-based load elimination purposes [Local objects only]

> Disclaimer: It's a bit hard for me to follow the different "legs" in
> this conversation. So I'm likely to miss some important background
> info. But I'll try my best, starting with this one question below.

> On Wed, Jan 13, 2016 at 3:17 AM, Hal Finkel < hfinkel at anl.gov >
> wrote:

> > > From: "Chandler Carruth" < chandlerc at gmail.com >
> > 
> 
> > > To: "Hal Finkel" < hfinkel at anl.gov >, "Chandler Carruth" <
> > > chandlerc at gmail.com >
> > 
> 
> > > Cc: reviews+D15124+public+ddba332c127856cb at reviews.llvm.org ,
> > > "Larisse Voufo" < lvoufo at gmail.com >, "llvm-commits" <
> > > llvm-commits at lists.llvm.org >
> > 
> 
> > > Sent: Wednesday, January 13, 2016 4:27:13 AM
> > 
> 
> > > Subject: Re: [PATCH] D15124: Use @llvm.invariant.start/end
> > > intrinsics
> > > to extend the meaning of basic AA's pointsToConstantMemory(), for
> > > GVN-based load elimination purposes [Local objects only]
> > 
> 

> > > On Wed, Jan 13, 2016 at 2:00 AM Hal Finkel via llvm-commits <
> > > llvm-commits at lists.llvm.org > wrote:
> > 
> 

> > > > Actually, can you please remind me why we want post-dominance
> > > > information here at all?
> > > 
> > 
> 
> > > FWIW, I also would like to see a clear explanation of this -- I
> > > haven't yet.
> > 
> 
> It all boils down to solving the following problem:

> Given all the tracked invariant_start and invariant_end calls in a
> given InvariantInfo instance, implement the following function:

> /// \brief Performs invariant range analysis for some given
> instruction,
> /// based on some allocated memory address:
> // If the instruction accesses the given address, then this checks
> whether
> /// there is an invariant range (over the address) that the
> instruction
> /// belongs in.
> /// NOTE: Range analysis must be enabled for this computation to go
> through.
> bool queriedFromInvariantRange(const Instruction *I, const Value
> *Addr);

> The test cases in the patch show different combinations of load
> instructions and (nested) invariant ranges where some loads are
> merged together while other can't.

> When all instructions share the same block, that's easy. One can
> simply check that:
> * the invariant_start dominates the instruction and that
> * the instruction dominates the invariant_end.
Within the same basic-block, yes. However, within the same basic block, you should probably use OrderedBasicBlock (llvm/Analysis/OrderedBasicBlock.h) to avoid repeatedly walking the block to check for dominance/post-dominance. 

As soon as you have more than one block, however, post-dominance is not what you want. It fails to provide the desired answer even in simple early-exit cases. 

invariant_start 
| 
load 
| 
br 
/ \ 
ret ... 
| 
invariant_end 
... 

The invariant_end does not post-dominate the load, but the load should be considered to be of constant data. 

Thanks again, 
Hal 

> The real challenge is when one considers all instructions being
> nested in complex ways, some of which have been highlighted here,
> and to try and generalize the solution to all possible case scenari.
> Unless we limit the cases that we care about, I find it hard to
> conceive a way of getting around computing (once) and reusing both
> dominance and post-dominance info.
> In practice, I do not see it getting all that expensive if we keep
> the effects localized like they are on this patch.

> > > > For the invariant_start -> load, it seems like dominance
> > > > information
> > > > is what we want (all paths to the load need to pass through the
> > > > invariant_start (or some invariant_start, should there be
> > > > multiple
> > > > ones for the same pointer). Regarding invariant_end, we need to
> > > > make
> > > > sure that no paths from the invariant_start to the load pass
> > > > through
> > > > an invariant_end. For this, it seems conservatively correct to
> > > > check
> > > > that the load dominates the invariant_end (or all
> > > > invariant_ends,
> > > > should there be multiple), and that any containing loop has
> > > > backedges that branch to blocks dominating the
> > > > invariant_start(s).
> > > > We could use post-dom information for this check, but that's
> > > > just
> > > > a
> > > > different approximation.
> > > 
> > 
> 

> > > Sure.
> > 
> 

> > > I think that your statement is really hitting on the critical
> > > point
> > > here though: the issue is whether the invariant ends between the
> > > start and the load, nothing else. As you demonstrate, postdom
> > > actually gets the wrong answer:
> > 
> 

> > > > invariant_start
> > > 
> > 
> 
> > > > / \
> > > 
> > 
> 
> > > > | load #1
> > > 
> > 
> 
> > > > | |
> > > 
> > 
> 
> > > > \ /
> > > 
> > 
> 
> > > > \ /
> > > 
> > 
> 
> > > > |
> > > 
> > 
> 
> > > > load #2
> > > 
> > 
> 
> > > > / \
> > > 
> > 
> 
> > > > / \
> > > 
> > 
> 
> > > > ret \
> > > 
> > 
> 
> > > > invariant_end
> > > 
> > 
> 
> > > > \
> > > 
> > 
> 
> > > > \
> > > 
> > 
> 
> > > > ret
> > > 
> > 
> 

> > > > So in this case (assuming my attempt at ASCII-art is
> > > > successful),
> > > > both loads here can be considered to be invariant (both are
> > > > dominated by the invariant_start, and there are no paths from
> > > > the
> > > > invariant_start to either load passing through an
> > > > invariant_end).
> > > > However, while it is true that load #2 is post-dominated by the
> > > > invariant_end w.r.t. one of the exits, it is not w.r.t. all
> > > > exits.
> > > > load #1 here does not even post-dominate the invariant_start,
> > > > and
> > > > that's fine. However, load #1 does not dominate the
> > > > invariant_end
> > > > either.
> > > 
> > 
> 

> > > This is a great example. Larisse, this should really be a good
> > > test
> > > case for this patch.
> > 
> 

> > > > Maybe we need a dedicated CFG walk to collect the necessary
> > > > information?
> > > 
> > 
> 

> > > Yea. We check for the same kinds of things in a number of other
> > > places without post dominators. It might be useful to mirror the
> > > logic on them.
> > 
> 

> > > However, maybe there are easier ways. Thinking about this, for
> > > GVN
> > > in
> > > particular, I think I see really easy ways to make this fast and
> > > reasonably precise... Check my reasoning:
> > 
> 

> > > 1) The start must dominate the load.
> > 
> 
> > > 2) Therefore, if an end exists on a path from the start to the
> > > load,
> > > the start must dominate the end.
> > 
> 
> > > 3) GVN never introduces ends or changes the dominance
> > > relationship
> > > of
> > > starts and ends.
> > 
> 
> > > 4) We scan the function finding all of these intrinsics ahead of
> > > time
> > > already.
> > 
> 
> > > 5) We can build a map from domtree node (or basic block
> > > essentially)
> > > to the list of invariant end intrinsics in that block.
> > 
> 
> > > 6) We can walk the dom tree from the start to the load, and check
> > > each end in each block
> > 
> 

> > > I feel like I'm missing a case... do you see the hole in this
> > > plan?
> > 
> 
> > This works if there's only one start. If there are multiple starts,
> > then we need only the set to dominate, not any individual one.
> 

> > invariant_start invariant_start
> 
> > | |
> 
> > |________________ |
> 
> > |
> 
> > load
> 
> > |
> 
> > ret
> 

> > At least in theory, this is fine too. The "right" way to solve this
> > might be to set it up as a lattice/dataflow problem on the CFG, and
> > then iterate until convergence (which should happen in two
> > iterations, aside perhaps from pathological cases). The tricky part
> > here, I suspect, is actually the representation (you can't actually
> > build a large map covering (all pointers x all loads), and then
> > prune it). Aside from that, you mark all live-in pointers to all
> > blocks as invariant, except for the entry block, and at the
> > instructions that define the pointer value, and then iterate the
> > system, accounting for the invariant_start/end intrinsics, and you
> > should find the maximal fixed point. The result would be a map of
> > which pointers are invariant at which loads (which, as mentioned,
> > you'd need to represent intelligently). Doing this might actually
> > be
> > faster than locally walking the CFG many times anyway.
> 

> > Here's another test case:
> 

> > invariant_start
> 
> > ____|
> 
> > | |
> 
> > | load
> 
> > | |
> 
> > | invariant_end
> 
> > |____|
> 
> > |
> 
> > ret
> 

> > Here, we can't treat the load as invariant, because there's a path
> > from the invariant_start passing through the invariant_end via the
> > loop backedge.
> 

> > > It'll be a bit expensive because of the linear scan if there are
> > > a
> > > very large number of invariants. Maybe that's OK? Hard for me to
> > > tell...
> > 
> 
> > GVN already scans the entire (reachable) function body (and, by an
> > order of magnitude, is dominated by the cost of AA queries). I
> > doubt
> > we'll notice. That having been said, we can avoid the linear scan
> > if
> > we use a registration approach for these intrinsics, just as we did
> > with @llvm.assume (the AssumptionCache).
> 

> > Another issue worth considering:
> 

> > entry
> 
> > ___________|_____
> 
> > | |
> 
> > invariant_start invariant_start
> 
> > | |
> 
> > load load
> 
> > | |
> 
> > ... ...
> 

> > Generally, we'd PRE this load (adding it to the entry block).
> > Either
> > the invariant_start will block this (which is bad), or we'll want
> > to
> > also PRE the invariant_starts to avoid PRE killing the invariant
> > info.
> 

> > -Hal
> 

> > > > -Hal
> > > 
> > 
> 

> > > > --
> > > 
> > 
> 
> > > > Hal Finkel
> > > 
> > 
> 
> > > > Assistant Computational Scientist
> > > 
> > 
> 
> > > > Leadership Computing Facility
> > > 
> > 
> 
> > > > Argonne National Laboratory
> > > 
> > 
> 
> > > > _______________________________________________
> > > 
> > 
> 
> > > > llvm-commits mailing list
> > > 
> > 
> 
> > > > llvm-commits at lists.llvm.org
> > > 
> > 
> 
> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> > > 
> > 
> 

> > --
> 

> > Hal Finkel
> 
> > Assistant Computational Scientist
> 
> > Leadership Computing Facility
> 
> > Argonne National Laboratory
> 

-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160127/3d9388db/attachment.html>