# [PATCH] D15124: Use @llvm.invariant.start/end intrinsics to extend the meaning of basic AA's pointsToConstantMemory(), for GVN-based load elimination purposes [Local objects only]

Hal Finkel via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 27 06:59:23 PST 2016

```----- Original Message -----

> From: "Larisse Voufo" <lvoufo at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Chandler Carruth" <chandlerc at gmail.com>,
> reviews+D15124+public+ddba332c127856cb at reviews.llvm.org,
> "llvm-commits" <llvm-commits at lists.llvm.org>
> Sent: Wednesday, January 27, 2016 8:39:10 AM
> Subject: Re: [PATCH] D15124: Use @llvm.invariant.start/end intrinsics
> to extend the meaning of basic AA's pointsToConstantMemory(), for
> GVN-based load elimination purposes [Local objects only]

> On Wed, Jan 27, 2016 at 4:14 AM, Hal Finkel < hfinkel at anl.gov >
> wrote:

> > > From: "Larisse Voufo" < lvoufo at gmail.com >
> >
>
> > > To: "Hal Finkel" < hfinkel at anl.gov >
> >
>
> > > Cc: "Chandler Carruth" < chandlerc at gmail.com >,
> > > reviews+D15124+public+ddba332c127856cb at reviews.llvm.org ,
> > > "llvm-commits" < llvm-commits at lists.llvm.org >
> >
>
> > > Sent: Monday, January 25, 2016 5:47:42 PM
> >
>

> > > Subject: Re: [PATCH] D15124: Use @llvm.invariant.start/end
> > > intrinsics
> > > to extend the meaning of basic AA's pointsToConstantMemory(), for
> > > GVN-based load elimination purposes [Local objects only]
> >
>

> > > I have put together a short document to (hopefully) help keep
> > > this
> > > conversation concise:
> >
>

> >
>

> > > Let's see what you think of it. Comments (on the doc) welcome.
> >
>

> > > below
> >
>

> > > Thanks,
> >
>
> > > -- Larisse.
> >
>

> > > On Fri, Jan 15, 2016 at 9:55 AM, Larisse Voufo < lvoufo at gmail.com
> > > >
> > > wrote:
> >
>

> > > > Disclaimer: It's a bit hard for me to follow the different
> > > > "legs"
> > > > in
> > > > this conversation. So I'm likely to miss some important
> > > > background
> > > > info. But I'll try my best, starting with this one question
> > > > below.
> > >
> >
>

> > > > On Wed, Jan 13, 2016 at 3:17 AM, Hal Finkel < hfinkel at anl.gov >
> > > > wrote:
> > >
> >
>

> > > > > > From: "Chandler Carruth" < chandlerc at gmail.com >
> > > > >
> > > >
> > >
> >
>
> > > > > > To: "Hal Finkel" < hfinkel at anl.gov >, "Chandler Carruth" <
> > > > > > chandlerc at gmail.com >
> > > > >
> > > >
> > >
> >
>
> > > > > > Cc: reviews+D15124+public+ddba332c127856cb at reviews.llvm.org
> > > > > > ,
> > > > > > "Larisse Voufo" < lvoufo at gmail.com >, "llvm-commits" <
> > > > > > llvm-commits at lists.llvm.org >
> > > > >
> > > >
> > >
> >
>
> > > > > > Sent: Wednesday, January 13, 2016 4:27:13 AM
> > > > >
> > > >
> > >
> >
>
> > > > > > Subject: Re: [PATCH] D15124: Use @llvm.invariant.start/end
> > > > > > intrinsics
> > > > > > to extend the meaning of basic AA's
> > > > > > pointsToConstantMemory(),
> > > > > > for
> > > > > > GVN-based load elimination purposes [Local objects only]
> > > > >
> > > >
> > >
> >
>

> > > > > > On Wed, Jan 13, 2016 at 2:00 AM Hal Finkel via llvm-commits
> > > > > > <
> > > > > > llvm-commits at lists.llvm.org > wrote:
> > > > >
> > > >
> > >
> >
>

> > > > > > > Actually, can you please remind me why we want
> > > > > > > post-dominance
> > > > > > > information here at all?
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > FWIW, I also would like to see a clear explanation of this
> > > > > > --
> > > > > > I
> > > > > > haven't yet.
> > > > >
> > > >
> > >
> >
>
> > > > It all boils down to solving the following problem:
> > >
> >
>

> > > > Given all the tracked invariant_start and invariant_end calls
> > > > in
> > > > a
> > > > given InvariantInfo instance, implement the following function:
> > >
> >
>

> > > > /// \brief Performs invariant range analysis for some given
> > > > instruction,
> > >
> >
>
> > > > /// based on some allocated memory address:
> > >
> >
>
> > > > // If the instruction accesses the given address, then this
> > > > checks
> > > > whether
> > >
> >
>
> > > > /// there is an invariant range (over the address) that the
> > > > instruction
> > >
> >
>
> > > > /// belongs in.
> > >
> >
>
> > > > /// NOTE: Range analysis must be enabled for this computation
> > > > to
> > > > go
> > > > through.
> > >
> >
>
> > > > bool queriedFromInvariantRange(const Instruction *I, const
> > > > Value
> > >
> >
>

> > > > The test cases in the patch show different combinations of load
> > > > instructions and (nested) invariant ranges where some loads are
> > > > merged together while other can't.
> > >
> >
>

> > > > When all instructions share the same block, that's easy. One
> > > > can
> > > > simply check that:
> > >
> >
>
> > > > * the invariant_start dominates the instruction and that
> > >
> >
>
> > > > * the instruction dominates the invariant_end.
> > >
> >
>

> > > > The real challenge is when one considers all instructions being
> > > > nested in complex ways, some of which have been highlighted
> > > > here,
> > > > and to try and generalize the solution to all possible case
> > > > scenari.
> > > > Unless we limit the cases that we care about, I find it hard to
> > > > conceive a way of getting around computing (once) and reusing
> > > > both
> > > > dominance and post-dominance info.
> > >
> >
>
> > > > In practice, I do not see it getting all that expensive if we
> > > > keep
> > > > the effects localized like they are on this patch.
> > >
> >
>

> > > > > > > For the invariant_start -> load, it seems like dominance
> > > > > > > information
> > > > > > > is what we want (all paths to the load need to pass
> > > > > > > through
> > > > > > > the
> > > > > > > invariant_start (or some invariant_start, should there be
> > > > > > > multiple
> > > > > > > ones for the same pointer).
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > Correct.
> >
>
> > > > > > > Regarding invariant_end, we need to make sure that no
> > > > > > > paths
> > > > > > > from
> > > > > > > the
> > > > > > > invariant_start to the load pass through an
> > > > > > > invariant_end.
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > This is a bit tricky (see attached doc).
> >
>

> > > > > > > For this, it seems conservatively correct to check that
> > > > > > > the
> > > > > > > load
> > > > > > > dominates the invariant_end (or all invariant_ends,
> > > > > > > should
> > > > > > > there
> > > > > > > be
> > > > > > > multiple),
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > This would not handle nested invariant regions consistently (see
> > > doc).
> >
>
> > > > > > > and that any containing loop has backedges that branch to
> > > > > > > blocks
> > > > > > > dominating the invariant_start(s). We could use post-dom
> > > > > > > information
> > > > > > > for this check, but that's just a different
> > > > > > > approximation.
> > > > > >
> > > > >
> > > >
> > >
> >
>

> > > Post-dom is the only way to handle general cases consistently,
> > > and
> > > it
> > > does not cost anything more than using dominance analysis (if
> > > done
> > > right).
> >
>
> > I agree that post-dominance is not more expensive than dominance if
> > done right. It would be really nice to make that true for us in
> > practice. However, we need to be careful here because this is not
> > really a post-dominance problem. As I mentioned in the other reply,
> > just using post-dominance here will cause us to miss the early-exit
> > case.
>
> Please remind me which early exit case you are talking about? (Side
> note -- We need a better mechanism for managing these threads... :])
invariant_start
|
|
br
/ \
ret ...
|
invariant_end
...

The invariant_end does not post-dominate the load, but the load should be considered to be of constant data. This is the simple early-exit case ;)

> > > > > > Sure.
> > > > >
> > > >
> > >
> >
>

> > > > > > I think that your statement is really hitting on the
> > > > > > critical
> > > > > > point
> > > > > > here though: the issue is whether the invariant ends
> > > > > > between
> > > > > > the
> > > > > > start and the load, nothing else. As you demonstrate,
> > > > > > postdom
> > > > > > actually gets the wrong answer:
> > > > >
> > > >
> > >
> >
>

> > > > > > > invariant_start
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > / \
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > | load #1
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > | |
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > \ /
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > \ /
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > |
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > load #2
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > / \
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > / \
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > ret \
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > invariant_end
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > \
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > \
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > ret
> > > > > >
> > > > >
> > > >
> > >
> >
>

> > > > > > > So in this case (assuming my attempt at ASCII-art is
> > > > > > > successful),
> > > > > > > both loads here can be considered to be invariant (both
> > > > > > > are
> > > > > > > dominated by the invariant_start, and there are no paths
> > > > > > > from
> > > > > > > the
> > > > > > > invariant_start to either load passing through an
> > > > > > > invariant_end).
> > > > > > > However, while it is true that load #2 is post-dominated
> > > > > > > by
> > > > > > > the
> > > > > > > invariant_end w.r.t. one of the exits, it is not w.r.t.
> > > > > > > all
> > > > > > > exits.
> > > > > > > load #1 here does not even post-dominate the
> > > > > > > invariant_start,
> > > > > > > and
> > > > > > > that's fine. However, load #1 does not dominate the
> > > > > > > invariant_end
> > > > > > > either.
> > > > > >
> > > > >
> > > >
> > >
> >
>

> > > > > > This is a great example. Larisse, this should really be a
> > > > > > good
> > > > > > test
> > > > > > case for this patch.
> > > > >
> > > >
> > >
> >
>
> > > If we are conservative about where we allow invariant ranges to
> > > be
> > > open (cf. doc), Neither load #1, nor load #2 should be considered
> > > invariant here. Load #1 is dominated by invariant_start, but not
> > > post-dominated by the invariant_end. Load#2, in turn is dominated
> > > by
> > > invariant_start, but also not post-dominated by invariant_end.
> > > There
> > > is no guarantee of consistent optimizations here (as inner blocks
> > > are merged into outer blocks etc...) So, we should not be
> > > optimizing
> > > these.
> >
>

> > > If however, there were no invariant_end at all, then we can
> > > safely
> > > assume the range open and thus consider the loads invariant.
> >
>
> > > > > > > Maybe we need a dedicated CFG walk to collect the
> > > > > > > necessary
> > > > > > > information?
> > > > > >
> > > > >
> > > >
> > >
> >
>

> > > I'm curious. What would this do, and how cheaply, that we
> > > wouldn't
> > > do
> > > with postdom or the dom alternatives?
> >
>
> > It would check the necessary condition directly. It would walk all
> > paths from the load to the exit, stopping when finding blocks with
> > invariant_exit, previously-visited blocks, exits, or when some
> > search limit is reached. This is obviously not the ideal solution,
> > solving the data-flow problem would be better.
>

> > > > > > Yea. We check for the same kinds of things in a number of
> > > > > > other
> > > > > > places without post dominators. It might be useful to
> > > > > > mirror
> > > > > > the
> > > > > > logic on them.
> > > > >
> > > >
> > >
> >
>

> > > > > > for
> > > > > > GVN
> > > > > > in
> > > > > > particular, I think I see really easy ways to make this
> > > > > > fast
> > > > > > and
> > > > > > reasonably precise... Check my reasoning:
> > > > >
> > > >
> > >
> >
>

> > > > > > 1) The start must dominate the load.
> > > > >
> > > >
> > >
> >
>
> > > > > > 2) Therefore, if an end exists on a path from the start to
> > > > > > the
> > > > > > load,
> > > > > > the start must dominate the end.
> > > > >
> > > >
> > >
> >
>
> > > > > > 3) GVN never introduces ends or changes the dominance
> > > > > > relationship
> > > > > > of
> > > > > > starts and ends.
> > > > >
> > > >
> > >
> >
>
> > > > > > 4) We scan the function finding all of these intrinsics
> > > > > > ahead
> > > > > > of
> > > > > > time
> > > > > > already.
> > > > >
> > > >
> > >
> >
>
> > > > > > 5) We can build a map from domtree node (or basic block
> > > > > > essentially)
> > > > > > to the list of invariant end intrinsics in that block.
> > > > >
> > > >
> > >
> >
>
> > > > > > 6) We can walk the dom tree from the start to the load, and
> > > > > > check
> > > > > > each end in each block
> > > > >
> > > >
> > >
> >
>

> > > > > > I feel like I'm missing a case... do you see the hole in
> > > > > > this
> > > > > > plan?
> > > > >
> > > >
> > >
> >
>
> > > I think a lot of this is already handled in InvariantInfo.
> >
>

> > > > > This works if there's only one start. If there are multiple
> > > > > starts,
> > > > > then we need only the set to dominate, not any individual
> > > > > one.
> > > >
> > >
> >
>

> > > > > invariant_start invariant_start
> > > >
> > >
> >
>
> > > > > | |
> > > >
> > >
> >
>
> > > > > |________________ |
> > > >
> > >
> >
>
> > > > > |
> > > >
> > >
> >
>
> > > > > load
> > > >
> > >
> >
>
> > > > > |
> > > >
> > >
> >
>
> > > > > ret
> > > >
> > >
> >
>

> > > These inner invariant_start should have no effect on the outer
> >
>
> > > Invariant_ends are defined in terms of invariant_starts, and
> > > Invariant_ranges do not extend beyond the scope on
> > > invariant_starts.
> > > So, I think this case is not relevant (at least for now).
> >
>

> > > > > At least in theory, this is fine too. The "right" way to
> > > > > solve
> > > > > this
> > > > > might be to set it up as a lattice/dataflow problem on the
> > > > > CFG,
> > > > > and
> > > > > then iterate until convergence (which should happen in two
> > > > > iterations, aside perhaps from pathological cases). The
> > > > > tricky
> > > > > part
> > > > > here, I suspect, is actually the representation (you can't
> > > > > actually
> > > > > build a large map covering (all pointers x all loads), and
> > > > > then
> > > > > prune it). Aside from that, you mark all live-in pointers to
> > > > > all
> > > > > blocks as invariant, except for the entry block, and at the
> > > > > instructions that define the pointer value, and then iterate
> > > > > the
> > > > > system, accounting for the invariant_start/end intrinsics,
> > > > > and
> > > > > you
> > > > > should find the maximal fixed point. The result would be a
> > > > > map
> > > > > of
> > > > > which pointers are invariant at which loads (which, as
> > > > > mentioned,
> > > > > you'd need to represent intelligently). Doing this might
> > > > > actually
> > > > > be
> > > > > faster than locally walking the CFG many times anyway.
> > > >
> > >
> >
>

> > > > > Here's another test case:
> > > >
> > >
> >
>

> > > > > invariant_start
> > > >
> > >
> >
>
> > > > > ____|
> > > >
> > >
> >
>
> > > > > | |
> > > >
> > >
> >
>
> > > > > | load
> > > >
> > >
> >
>
> > > > > | |
> > > >
> > >
> >
>
> > > > > | invariant_end
> > > >
> > >
> >
>
> > > > > |____|
> > > >
> > >
> >
>
> > > > > |
> > > >
> > >
> >
>
> > > > > ret
> > > >
> > >
> >
>

> > > > > Here, we can't treat the load as invariant, because there's a
> > > > > path
> > > > > from the invariant_start passing through the invariant_end
> > > > > via
> > > > > the
> > > > > loop backedge.
> > > >
> > >
> >
>
> > > This sounds like is a case for nested invariant regions in the
> > > doc
> > > (?)...
> >
>
> > > > > > It'll be a bit expensive because of the linear scan if
> > > > > > there
> > > > > > are
> > > > > > a
> > > > > > very large number of invariants. Maybe that's OK? Hard for
> > > > > > me
> > > > > > to
> > > > > > tell...
> > > > >
> > > >
> > >
> >
>
> > > I am not exactly how this makes a difference, performance-wise.
> > > But
> > > then again, any sound implementation of
> > > queriedFromInvariantRange()
> > > would do...
> >
>

> > > > > GVN already scans the entire (reachable) function body (and,
> > > > > by
> > > > > an
> > > > > order of magnitude, is dominated by the cost of AA queries).
> > > > > I
> > > > > doubt
> > > > > we'll notice. That having been said, we can avoid the linear
> > > > > scan
> > > > > if
> > > > > we use a registration approach for these intrinsics, just as
> > > > > we
> > > > > did
> > > > > with @llvm.assume (the AssumptionCache).
> > > >
> > >
> >
>

> > > The invariant-range-analysis pass and other
> > > (InvariantInfo-related
> > > passes) are doing this.
> >
>

> > > > > Another issue worth considering:
> > > >
> > >
> >
>

> > > > > entry
> > > >
> > >
> >
>
> > > > > ___________|_____
> > > >
> > >
> >
>
> > > > > | |
> > > >
> > >
> >
>
> > > > > invariant_start invariant_start
> > > >
> > >
> >
>
> > > > > | |
> > > >
> > >
> >
>
> > > >
> > >
> >
>
> > > > > | |
> > > >
> > >
> >
>
> > > > > ... ...
> > > >
> > >
> >
>

> > > > > Generally, we'd PRE this load (adding it to the entry block).
> > > > > Either
> > > > > the invariant_start will block this (which is bad), or we'll
> > > > > want
> > > > > to
> > > > > also PRE the invariant_starts to avoid PRE killing the
> > > > > invariant
> > > > > info.
> > > >
> > >
> >
>
> > > I don't see how this is possible. Invariant intrinsics are
> > > ignored
> > > by
> > > basic AA except for when invariant range analysis is computed.
> >
>
> > You don't see how what is possible? The fact that the intrinsics
> > are
> > ignored by BasicAA is precisely why we'd want to PRE the intrinsics
> > along with the loads.
>
> I am scanning through the implementation of GVN::performPRE(), and it
> is not clear to me how having an invariant_start before a load
> prevents the load from being PRE'd. Besides, have you looked at the
> comparative stats I linked earlier? We actually get a 10% increase
> on the number of loads PRE'd .

Right, I'm not complaining about that ;) -- I'm saying that, when we do PRE the loads, if they're arranged as I indicated above, *after* the loads are PRE'd, we've lost the information on the const-ness of the loaded data. Have you checked the configuration I sketched above? I imagine that we'll get an increase in PRE'd loads from loads all dominated by the same invariant_start, in which case this is not a concern.

Thanks again,
Hal

> > -Hal
>

> > > > > -Hal
> > > >
> > >
> >
>

> > > > > > > -Hal
> > > > > >
> > > > >
> > > >
> > >
> >
>

> > > > > > > --
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > Hal Finkel
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > Assistant Computational Scientist
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > Leadership Computing Facility
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > Argonne National Laboratory
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > _______________________________________________
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > llvm-commits mailing list
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > llvm-commits at lists.llvm.org
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> > > > > >
> > > > >
> > > >
> > >
> >
>

> > > > > --
> > > >
> > >
> >
>

> > > > > Hal Finkel
> > > >
> > >
> >
>
> > > > > Assistant Computational Scientist
> > > >
> > >
> >
>
> > > > > Leadership Computing Facility
> > > >
> > >
> >
>
> > > > > Argonne National Laboratory
> > > >
> > >
> >
>

> > --
>

> > Hal Finkel
>
> > Assistant Computational Scientist
>
>
> > Argonne National Laboratory
>

--

Hal Finkel
Assistant Computational Scientist