[PATCH] D15124: Use @llvm.invariant.start/end intrinsics to extend the meaning of basic AA's pointsToConstantMemory(), for GVN-based load elimination purposes [Local objects only]

Mon Jan 25 15:47:42 PST 2016

I have put together a short document to (hopefully) help keep this
conversation concise:

https://docs.google.com/document/d/1R-gINRdpxzLy82EZK_ymnVNyOPO-tzW5ksNcHuRvxBs/edit?usp=sharing

Let's see what you think of it. Comments (on the doc) welcome.

I'll reply to all the other comments (not addressed in the doc) below

Thanks,
-- Larisse.

On Fri, Jan 15, 2016 at 9:55 AM, Larisse Voufo <lvoufo at gmail.com> wrote:

> Disclaimer: It's a bit hard for me to follow the different "legs" in this
> conversation. So I'm likely to miss some important background info. But
> I'll try my best, starting with this one question below.
>
> On Wed, Jan 13, 2016 at 3:17 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>>
>> ------------------------------
>>
>> *From: *"Chandler Carruth" <chandlerc at gmail.com>
>> *To: *"Hal Finkel" <hfinkel at anl.gov>, "Chandler Carruth" <
>> chandlerc at gmail.com>
>> *Cc: *reviews+D15124+public+ddba332c127856cb at reviews.llvm.org, "Larisse
>> Voufo" <lvoufo at gmail.com>, "llvm-commits" <llvm-commits at lists.llvm.org>
>> *Sent: *Wednesday, January 13, 2016 4:27:13 AM
>> *Subject: *Re: [PATCH] D15124: Use @llvm.invariant.start/end intrinsics
>> to extend the meaning of basic AA's pointsToConstantMemory(), for GVN-based
>> load elimination purposes [Local objects only]
>>
>> On Wed, Jan 13, 2016 at 2:00 AM Hal Finkel via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> Actually, can you please remind me why we want post-dominance
>>> information here at all?
>>
>>
>> FWIW, I also would like to see a clear explanation of this -- I haven't
>> yet.
>>
>> It all boils down to solving the following problem:
>
> *Given all the tracked invariant_start and invariant_end calls in a given
> InvariantInfo instance, implement the following function:*
>
>   /// \brief Performs invariant range analysis for some given instruction,
>   /// based on some allocated memory address:
>   //  If the instruction accesses the given address, then this checks
> whether
>   /// there is an invariant range (over the address) that the instruction
>   /// belongs in.
>   /// NOTE: Range analysis must be enabled for this computation to go
> through.
> *  bool queriedFromInvariantRange(const Instruction *I, const Value
> *Addr);*
>
> The test cases in the patch show different combinations of load
> instructions and (nested) invariant ranges where some loads are merged
> together while other can't.
>
> When all instructions share the same block, that's easy. One can simply
> check that:
> * the invariant_start dominates the instruction and that
> * the instruction dominates the invariant_end.
>
> The real challenge is when one considers all instructions being nested in
> complex ways, some of which have been highlighted here, and to try and
> generalize the solution to all possible case scenari. Unless we limit the
> cases that we care about, I find it hard to conceive a way of getting
> around computing (once) and reusing both dominance and post-dominance info.
> In practice, I do not see it getting all that expensive if we keep the
> effects localized like they are on this patch.
>
>
>>
>>> For the invariant_start -> load, it seems like dominance information is
>>> what we want (all paths to the load need to pass through the
>>> invariant_start (or some invariant_start, should there be multiple ones for
>>> the same pointer).
>>
>> Correct.

> Regarding invariant_end, we need to make sure that no paths from the
>>> invariant_start to the load pass through an invariant_end.
>>
>> This is a bit tricky (see attached doc).

> For this, it seems conservatively correct to check that the load dominates
>>> the invariant_end (or all invariant_ends, should there be multiple),
>>
>> This would not handle nested invariant regions consistently (see doc).

> and that any containing loop has backedges that branch to blocks
>>> dominating the invariant_start(s). We could use post-dom information for
>>> this check, but that's just a different approximation.
>>>
>> Post-dom is the only way to handle general cases consistently, and it
does not cost anything more than using dominance analysis (if done right).

>
>> Sure.
>>
>> I think that your statement is really hitting on the critical point here
>> though: the issue is whether the invariant ends between the start and the
>> load, nothing else. As you demonstrate, postdom actually gets the wrong
>> answer:
>>
>>
>>>  invariant_start
>>>      /     \
>>>      |    load #1
>>>      |     |
>>>      \    /
>>>       \  /
>>>        |
>>>       load #2
>>>      /   \
>>>     /     \
>>>    ret     \
>>>            invariant_end
>>>              \
>>>               \
>>>               ret
>>>
>>> So in this case (assuming my attempt at ASCII-art is successful), both
>>> loads here can be considered to be invariant (both are dominated by the
>>> invariant_start, and there are no paths from the invariant_start to either
>>> load passing through an invariant_end). However, while it is true that load
>>> #2 is post-dominated by the invariant_end w.r.t. one of the exits, it is
>>> not w.r.t. all exits. load #1 here does not even post-dominate the
>>> invariant_start, and that's fine. However, load #1 does not dominate the
>>> invariant_end either.
>>>
>>
>> This is a great example. Larisse, this should really be a good test case
>> for this patch.
>>
>> If we are conservative about where we allow invariant ranges to be open
(cf. doc), Neither load #1, nor load #2 should be considered invariant
here. Load #1 is dominated by invariant_start, but not post-dominated by
the invariant_end. Load#2, in turn is dominated by invariant_start, but
also not post-dominated by invariant_end. There is no guarantee of
consistent optimizations here (as inner blocks are merged into outer blocks
etc...) So, we should not be optimizing these.

If however, there were no invariant_end at all, then we can safely assume
the range open and thus consider the loads invariant.

>
>>
>>>
>>> Maybe we need a dedicated CFG walk to collect the necessary information?
>>>
>> I'm curious. What would this do, and how cheaply, that we wouldn't do
with postdom or the dom alternatives?

>
>> Yea. We check for the same kinds of things in a number of other places
>> without post dominators. It might be useful to mirror the logic on them.
>>
>> However, maybe there are easier ways. Thinking about this, for GVN in
>> particular, I think I see really easy ways to make this fast and reasonably
>> precise... Check my reasoning:
>>
>> 1) The start must dominate the load.
>> 2) Therefore, if an end exists on a path from the start to the load, the
>> start must dominate the end.
>> 3) GVN never introduces ends or changes the dominance relationship of
>> starts and ends.
>> 4) We scan the function finding all of these intrinsics ahead of time
>> already.
>> 5) We can build a map from domtree node (or basic block essentially) to
>> the list of invariant end intrinsics in that block.
>> 6) We can walk the dom tree from the start to the load, and check each
>> end in each block
>>
>> I feel like I'm missing a case... do you see the hole in this plan?
>>
>> I think a lot of this is already handled in InvariantInfo.

> This works if there's only one start. If there are multiple starts, then
>> we need only the set to dominate, not any individual one.
>>
>> invariant_start         invariant_start
>>           |                             |
>>           |________________ |
>>                          |
>>                        load
>>                          |
>>                         ret
>>
>
These inner invariant_start should have no effect on the outer load.
Invariant_ends are defined in terms of invariant_starts, and
Invariant_ranges do not extend beyond the scope on invariant_starts. So, I
think this case is not relevant (at least for now).

>
>> At least in theory, this is fine too. The "right" way to solve this might
>> be to set it up as a lattice/dataflow problem on the CFG, and then iterate
>> until convergence (which should happen in two iterations, aside perhaps
>> from pathological cases). The tricky part here, I suspect, is actually the
>> representation (you can't actually build a large map covering (all pointers
>> x all loads), and then prune it). Aside from that, you mark all live-in
>> pointers to all blocks as invariant, except for the entry block, and at the
>> instructions that define the pointer value, and then iterate the system,
>> accounting for the invariant_start/end intrinsics, and you should find the
>> maximal fixed point. The result would be a map of which pointers are
>> invariant at which loads (which, as mentioned, you'd need to represent
>> intelligently). Doing this might actually be faster than locally walking
>> the CFG many times anyway.
>>
>> Here's another test case:
>>
>> invariant_start
>> ____|
>> |      |
>> |   load
>> |      |
>> |  invariant_end
>> |____|
>>        |
>>      ret
>>
>> Here, we can't treat the load as invariant, because there's a path from
>> the invariant_start passing through the invariant_end via the loop backedge.
>>
> This sounds like is a case for nested invariant regions in the doc (?)...

>
>>
>> It'll be a bit expensive because of the linear scan if there are a very
>> large number of invariants. Maybe that's OK? Hard for me to tell...
>>
>> I am not exactly how this makes a difference, performance-wise. But then
again, any sound implementation of queriedFromInvariantRange() would do...

> GVN already scans the entire (reachable) function body (and, by an order
>> of magnitude, is dominated by the cost of AA queries). I doubt we'll
>> notice. That having been said, we can avoid the linear scan if we use a
>> registration approach for these intrinsics, just as we did with
>> @llvm.assume (the AssumptionCache).
>>
>
The invariant-range-analysis pass and other (InvariantInfo-related passes)
are doing this.

>
>> Another issue worth considering:
>>
>>                        entry
>>       ___________|_____
>>       |                            |
>> invariant_start            invariant_start
>>       |                            |
>>     load                        load
>>       |                             |
>>      ...                           ...
>>
>> Generally, we'd PRE this load (adding it to the entry block). Either the
>> invariant_start will block this (which is bad), or we'll want to also PRE
>> the invariant_starts to avoid PRE killing the invariant info.
>>
>
I don't see how this is possible. Invariant intrinsics are ignored by basic
AA except for when invariant range analysis is computed.

>
>>
>>  -Hal
>>
>>
>>
>>>  -Hal
>>>
>>> --
>>> Hal Finkel
>>> Assistant Computational Scientist
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>>
>>
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160125/9c934a88/attachment.html>