[PATCH] D15124: Use @llvm.invariant.start/end intrinsics to extend the meaning of basic AA's pointsToConstantMemory(), for GVN-based load elimination purposes [Local objects only]

Larisse Voufo via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 27 06:39:10 PST 2016


On Wed, Jan 27, 2016 at 4:14 AM, Hal Finkel <hfinkel at anl.gov> wrote:

>
> ------------------------------
>
> *From: *"Larisse Voufo" <lvoufo at gmail.com>
> *To: *"Hal Finkel" <hfinkel at anl.gov>
> *Cc: *"Chandler Carruth" <chandlerc at gmail.com>,
> reviews+D15124+public+ddba332c127856cb at reviews.llvm.org, "llvm-commits" <
> llvm-commits at lists.llvm.org>
> *Sent: *Monday, January 25, 2016 5:47:42 PM
>
> *Subject: *Re: [PATCH] D15124: Use @llvm.invariant.start/end intrinsics
> to extend the meaning of basic AA's pointsToConstantMemory(), for GVN-based
> load elimination purposes [Local objects only]
>
> I have put together a short document to (hopefully) help keep this
> conversation concise:
>
>
> https://docs.google.com/document/d/1R-gINRdpxzLy82EZK_ymnVNyOPO-tzW5ksNcHuRvxBs/edit?usp=sharing
>
> Let's see what you think of it. Comments (on the doc) welcome.
>
> I'll reply to all the other comments (not addressed in the doc) below
>
> Thanks,
> -- Larisse.
>
>
> On Fri, Jan 15, 2016 at 9:55 AM, Larisse Voufo <lvoufo at gmail.com> wrote:
>
>> Disclaimer: It's a bit hard for me to follow the different "legs" in this
>> conversation. So I'm likely to miss some important background info. But
>> I'll try my best, starting with this one question below.
>>
>> On Wed, Jan 13, 2016 at 3:17 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>>
>>>
>>> ------------------------------
>>>
>>> *From: *"Chandler Carruth" <chandlerc at gmail.com>
>>> *To: *"Hal Finkel" <hfinkel at anl.gov>, "Chandler Carruth" <
>>> chandlerc at gmail.com>
>>> *Cc: *reviews+D15124+public+ddba332c127856cb at reviews.llvm.org, "Larisse
>>> Voufo" <lvoufo at gmail.com>, "llvm-commits" <llvm-commits at lists.llvm.org>
>>> *Sent: *Wednesday, January 13, 2016 4:27:13 AM
>>> *Subject: *Re: [PATCH] D15124: Use @llvm.invariant.start/end intrinsics
>>> to extend the meaning of basic AA's pointsToConstantMemory(), for GVN-based
>>> load elimination purposes [Local objects only]
>>>
>>> On Wed, Jan 13, 2016 at 2:00 AM Hal Finkel via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>>
>>>> Actually, can you please remind me why we want post-dominance
>>>> information here at all?
>>>
>>>
>>> FWIW, I also would like to see a clear explanation of this -- I haven't
>>> yet.
>>>
>> It all boils down to solving the following problem:
>>
>> *Given all the tracked invariant_start and invariant_end calls in a given
>> InvariantInfo instance, implement the following function:*
>>
>>   /// \brief Performs invariant range analysis for a given instruction,
>>   /// based on a given allocated memory address:
>>   /// If the instruction accesses the given address, then this checks
>>   /// whether there is an invariant range (over the address) that the
>>   /// instruction belongs to.
>>   /// NOTE: Range analysis must be enabled for this computation to go
>>   /// through.
>>   *bool queriedFromInvariantRange(const Instruction *I, const Value *Addr);*
>>
>> The test cases in the patch show different combinations of load
>> instructions and (nested) invariant ranges where some loads can be merged
>> together while others can't.
>>
>> When all the instructions share the same block, that's easy. One can
>> simply check that:
>> * the invariant_start dominates the instruction, and that
>> * the instruction dominates the invariant_end (see the sketch below).
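>>
>> (Not from the patch -- just a minimal C++ sketch of that same-block check,
>> assuming DT is the usual DominatorTree and that Start/End are the
>> already-matched invariant_start/invariant_end calls for the address; within
>> a single block, "dominates" reduces to program order.)
>>
>>   #include "llvm/IR/Dominators.h"
>>   #include "llvm/IR/IntrinsicInst.h"
>>   #include <cassert>
>>   using namespace llvm;
>>
>>   // Same-block case only: both the invariant_start and the invariant_end
>>   // live in I's block, so the two dominance queries are just order checks.
>>   static bool inSameBlockInvariantRange(const Instruction *I,
>>                                         const IntrinsicInst *Start,
>>                                         const IntrinsicInst *End,
>>                                         const DominatorTree &DT) {
>>     assert(I->getParent() == Start->getParent() &&
>>            I->getParent() == End->getParent() && "expected a single block");
>>     return DT.dominates(Start, I) && DT.dominates(I, End);
>>   }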
>>
>> The real challenge is when the instructions are nested in complex ways,
>> some of which have been highlighted here, and we try to generalize the
>> solution to all possible scenarios. Unless we limit the cases that we care
>> about, I find it hard to conceive of a way to get around computing (once)
>> and reusing both dominance and post-dominance info. In practice, I do not
>> see it getting all that expensive if we keep the effects localized, as
>> they are in this patch.
>>
>>
>>>
>>>> For the invariant_start -> load, it seems like dominance information is
>>>> what we want (all paths to the load need to pass through the
>>>> invariant_start, or some invariant_start, should there be multiple ones
>>>> for the same pointer).
>>>
>>> Correct.
>
>>>> Regarding invariant_end, we need to make sure that no paths from the
>>>> invariant_start to the load pass through an invariant_end.
>>>
>>> This is a bit tricky (see attached doc).
>
>>>> For this, it seems conservatively correct to check that the load
>>>> dominates the invariant_end (or all invariant_ends, should there be
>>>> multiple),
>>>
>>> This would not handle nested invariant regions consistently (see doc).
>
>>>> and that any containing loop has backedges that branch to blocks
>>>> dominating the invariant_start(s). We could use post-dom information for
>>>> this check, but that's just a different approximation.
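>>>>
>>>> (Not from the patch -- a rough sketch of that conservative check, assuming
>>>> Ends holds every tracked invariant_end for the pointer and DT/LI are the
>>>> standard DominatorTree/LoopInfo analyses. Since a loop's backedges always
>>>> branch to its header, the loop condition is checked on the header.)
>>>>
>>>>   #include "llvm/ADT/ArrayRef.h"
>>>>   #include "llvm/Analysis/LoopInfo.h"
>>>>   #include "llvm/IR/Dominators.h"
>>>>   #include "llvm/IR/IntrinsicInst.h"
>>>>   using namespace llvm;
>>>>
>>>>   static bool conservativelyInvariantAt(const Instruction *Load,
>>>>                                         const IntrinsicInst *Start,
>>>>                                         ArrayRef<const IntrinsicInst *> Ends,
>>>>                                         const DominatorTree &DT,
>>>>                                         const LoopInfo &LI) {
>>>>     // All paths to the load must pass through the invariant_start.
>>>>     if (!DT.dominates(Start, Load))
>>>>       return false;
>>>>     // The load must dominate every tracked invariant_end.
>>>>     for (const IntrinsicInst *End : Ends)
>>>>       if (!DT.dominates(Load, End))
>>>>         return false;
>>>>     // Every loop containing the load must re-enter through a block that
>>>>     // dominates the invariant_start (its header, which backedges target).
>>>>     for (const Loop *L = LI.getLoopFor(Load->getParent()); L;
>>>>          L = L->getParentLoop())
>>>>       if (!DT.dominates(L->getHeader(), Start->getParent()))
>>>>         return false;
>>>>     return true;
>>>>   }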
>>>>
> Post-dom is the only way to handle general cases consistently, and it
> does not cost anything more than using dominance analysis (if done right).
>
>
> I agree that post-dominance is not more expensive than dominance if done
> right. It would be really nice to make that true for us in practice.
> However, we need to be careful here because this is not really a
> post-dominance problem. As I mentioned in the other reply, just using
> post-dominance here will cause us to miss the early-exit case.
>

Could you remind me which early-exit case you are talking about? (Side note --
we need a better mechanism for managing these threads... :])

>
>
>
>>> Sure.
>>>
>>> I think that your statement is really hitting on the critical point here
>>> though: the issue is whether the invariant ends between the start and the
>>> load, nothing else. As you demonstrate, postdom actually gets the wrong
>>> answer:
>>>
>>>
>>>>  invariant_start
>>>>      /     \
>>>>      |    load #1
>>>>      |     |
>>>>      \    /
>>>>       \  /
>>>>        |
>>>>       load #2
>>>>      /   \
>>>>     /     \
>>>>    ret     \
>>>>            invariant_end
>>>>              \
>>>>               \
>>>>               ret
>>>>
>>>> So in this case (assuming my attempt at ASCII-art is successful), both
>>>> loads here can be considered to be invariant (both are dominated by the
>>>> invariant_start, and there are no paths from the invariant_start to either
>>>> load passing through an invariant_end). However, while it is true that load
>>>> #2 is post-dominated by the invariant_end w.r.t. one of the exits, it is
>>>> not w.r.t. all exits. load #1 here does not even post-dominate the
>>>> invariant_start, and that's fine. However, load #1 does not dominate the
>>>> invariant_end either.
>>>>
>>>
>>> This is a great example. Larisse, this should really be a good test case
>>> for this patch.
>>>
> If we are conservative about where we allow invariant ranges to be open
> (cf. doc), neither load #1 nor load #2 should be considered invariant
> here. Load #1 is dominated by the invariant_start but not post-dominated by
> the invariant_end. Load #2, in turn, is dominated by the invariant_start but
> is also not post-dominated by the invariant_end. There is no guarantee of
> consistent optimization here (as inner blocks are merged into outer blocks,
> etc.), so we should not be optimizing these.
>
> If, however, there were no invariant_end at all, then we could safely assume
> the range is open and thus consider the loads invariant.
>
>>
>>>
>>>>
>>>> Maybe we need a dedicated CFG walk to collect the necessary information?
>>>>
> I'm curious: what would this do, and how cheaply, that we couldn't do
> with postdom or the dominance alternatives?
>
>
> It would check the necessary condition directly. It would walk all paths
> from the load to the exit, stopping when finding blocks with an
> invariant_end, previously-visited blocks, exits, or when some search limit
> is reached. This is obviously not the ideal solution; solving the data-flow
> problem would be better.
>
>
>
>>> Yea. We check for the same kinds of things in a number of other places
>>> without post-dominators. It might be useful to mirror the logic used there.
>>>
>>> However, maybe there are easier ways. Thinking about this, for GVN in
>>> particular, I think I see really easy ways to make this fast and reasonably
>>> precise... Check my reasoning:
>>>
>>> 1) The start must dominate the load.
>>> 2) Therefore, if an end exists on a path from the start to the load, the
>>> start must dominate the end.
>>> 3) GVN never introduces ends or changes the dominance relationship of
>>> starts and ends.
>>> 4) We scan the function finding all of these intrinsics ahead of time
>>> already.
>>> 5) We can build a map from domtree node (or basic block essentially) to
>>> the list of invariant end intrinsics in that block.
>>> 6) We can walk the dom tree from the start to the load, and check each
>>> end in each block
>>>
>>> I feel like I'm missing a case... do you see the hole in this plan?
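>>>
>>> (Not from the patch -- a rough C++ sketch of steps 5) and 6) above,
>>> assuming EndsInBlock is the per-block map from step 5). It matches an end
>>> to its start through the end's first operand, which per the LangRef is the
>>> value returned by the corresponding invariant_start; the ordering of an
>>> end relative to the start/load inside their own blocks is glossed over.)
>>>
>>>   #include "llvm/ADT/DenseMap.h"
>>>   #include "llvm/ADT/SmallVector.h"
>>>   #include "llvm/IR/Dominators.h"
>>>   #include "llvm/IR/Instructions.h"
>>>   #include "llvm/IR/IntrinsicInst.h"
>>>   using namespace llvm;
>>>
>>>   typedef DenseMap<const BasicBlock *,
>>>                    SmallVector<const IntrinsicInst *, 2>> EndsInBlockMap;
>>>
>>>   // Step 6: walk the dom tree between the start's block and the load's
>>>   // block (bottom-up via immediate dominators, which visits the same
>>>   // path), checking the invariant_ends recorded for each block visited.
>>>   static bool endOnDomTreePath(IntrinsicInst *Start, LoadInst *Load,
>>>                                const DominatorTree &DT,
>>>                                const EndsInBlockMap &EndsInBlock) {
>>>     for (DomTreeNode *N = DT.getNode(Load->getParent()); N;
>>>          N = N->getIDom()) {
>>>       const BasicBlock *BB = N->getBlock();
>>>       EndsInBlockMap::const_iterator It = EndsInBlock.find(BB);
>>>       if (It != EndsInBlock.end())
>>>         for (const IntrinsicInst *End : It->second)
>>>           if (End->getArgOperand(0) == Start) // matching start descriptor
>>>             return true;                      // an end lies on the path
>>>       if (BB == Start->getParent())
>>>         break;
>>>     }
>>>     return false;
>>>   }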
>>>
>>> I think a lot of this is already handled in InvariantInfo.
>
>
>>> This works if there's only one start. If there are multiple starts, then
>>> we need only the set to dominate, not any individual one.
>>>
>>> invariant_start     invariant_start
>>>        |                   |
>>>        +---------+---------+
>>>                  |
>>>                load
>>>                  |
>>>                 ret
>>>
>>
> These inner invariant_starts should have no effect on the outer load.
> invariant_ends are defined in terms of invariant_starts, and invariant
> ranges do not extend beyond the scope of their invariant_starts. So I
> think this case is not relevant (at least for now).
>
>
>>
>>> At least in theory, this is fine too. The "right" way to solve this
>>> might be to set it up as a lattice/dataflow problem on the CFG, and then
>>> iterate until convergence (which should happen in two iterations, aside
>>> perhaps from pathological cases). The tricky part here, I suspect, is
>>> actually the representation (you can't actually build a large map covering
>>> (all pointers x all loads), and then prune it). Aside from that, you mark
>>> all live-in pointers to all blocks as invariant, except for the entry
>>> block, and at the instructions that define the pointer value, and then
>>> iterate the system, accounting for the invariant_start/end intrinsics, and
>>> you should find the maximal fixed point. The result would be a map of which
>>> pointers are invariant at which loads (which, as mentioned, you'd need to
>>> represent intelligently). Doing this might actually be faster than locally
>>> walking the CFG many times anyway.
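>>>
>>> (Not from the patch -- a rough single-pointer sketch of that dataflow
>>> formulation, just to make the lattice concrete. The entry is pessimistic,
>>> everything else starts optimistic, the meet is AND over predecessors, and
>>> the transfer function flips the bit at invariant_start/invariant_end calls
>>> on Addr. Matching the intrinsics' pointer operand to Addr by plain
>>> equality is a simplification; the real thing would go through
>>> InvariantInfo/AA. To answer the query at a particular load, you would
>>> replay the transfer function from the top of the load's block.)
>>>
>>>   #include "llvm/ADT/DenseMap.h"
>>>   #include "llvm/IR/CFG.h"
>>>   #include "llvm/IR/Function.h"
>>>   #include "llvm/IR/IntrinsicInst.h"
>>>   using namespace llvm;
>>>
>>>   // In[B] == true means Addr is known invariant on entry to B along every
>>>   // path; iterating to a fixed point yields the maximal such solution.
>>>   static DenseMap<const BasicBlock *, bool>
>>>   computeInvariantIn(const Function &F, const Value *Addr) {
>>>     DenseMap<const BasicBlock *, bool> In, Out;
>>>     for (const BasicBlock &BB : F)
>>>       In[&BB] = Out[&BB] = true;          // optimistic initialization
>>>     In[&F.getEntryBlock()] = false;       // nothing is invariant on entry
>>>     bool Changed = true;
>>>     while (Changed) {
>>>       Changed = false;
>>>       for (const BasicBlock &BB : F) {
>>>         bool NewIn = &BB != &F.getEntryBlock();
>>>         for (const BasicBlock *Pred : predecessors(&BB))
>>>           NewIn &= Out[Pred];             // meet: AND over predecessors
>>>         bool NewOut = NewIn;
>>>         for (const Instruction &I : BB)
>>>           if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(&I)) {
>>>             if (II->getIntrinsicID() == Intrinsic::invariant_start &&
>>>                 II->getArgOperand(1) == Addr)
>>>               NewOut = true;              // range opens
>>>             else if (II->getIntrinsicID() == Intrinsic::invariant_end &&
>>>                      II->getArgOperand(2) == Addr)
>>>               NewOut = false;             // range closes
>>>           }
>>>         if (NewIn != In[&BB] || NewOut != Out[&BB]) {
>>>           In[&BB] = NewIn;
>>>           Out[&BB] = NewOut;
>>>           Changed = true;
>>>         }
>>>       }
>>>     }
>>>     return In;
>>>   }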
>>>
>>> Here's another test case:
>>>
>>> invariant_start
>>>        |
>>>      load <------+
>>>        |         |
>>> invariant_end ---+
>>>        |
>>>       ret
>>>
>>> Here, we can't treat the load as invariant, because there's a path from
>>> the invariant_start passing through the invariant_end via the loop backedge.
>>>
>> This sounds like a case for the nested invariant regions in the doc (?)...
>
>>
>>>
>>> It'll be a bit expensive because of the linear scan if there are a very
>>> large number of invariants. Maybe that's OK? Hard for me to tell...
>>>
> I am not exactly sure how this makes a difference, performance-wise. But then
> again, any sound implementation of queriedFromInvariantRange() would do...
>
>
>> GVN already scans the entire (reachable) function body (and, by an order
>>> of magnitude, is dominated by the cost of AA queries). I doubt we'll
>>> notice. That having been said, we can avoid the linear scan if we use a
>>> registration approach for these intrinsics, just as we did with
>>> @llvm.assume (the AssumptionCache).
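>>>
>>> (Not from the patch -- roughly the shape an AssumptionCache-style registry
>>> for these intrinsics could take; the names are made up, and the patch's
>>> InvariantInfo presumably differs. The point is just that the function is
>>> scanned once and transforms register new intrinsics instead of forcing a
>>> rescan, like AssumptionCache::registerAssumption() does for @llvm.assume.)
>>>
>>>   #include "llvm/ADT/SmallVector.h"
>>>   #include "llvm/IR/Function.h"
>>>   #include "llvm/IR/InstIterator.h"
>>>   #include "llvm/IR/IntrinsicInst.h"
>>>   using namespace llvm;
>>>
>>>   struct InvariantIntrinsicCache {
>>>     SmallVector<IntrinsicInst *, 8> Starts, Ends;
>>>
>>>     // One linear scan per function; queries afterwards are cheap.
>>>     void scan(Function &F) {
>>>       for (Instruction &I : instructions(F))
>>>         if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(&I))
>>>           registerIntrinsic(II);
>>>     }
>>>
>>>     // Transforms that create new invariant intrinsics call this instead
>>>     // of invalidating the cache. (A real cache would likely hold value
>>>     // handles in case intrinsics are deleted.)
>>>     void registerIntrinsic(IntrinsicInst *II) {
>>>       if (II->getIntrinsicID() == Intrinsic::invariant_start)
>>>         Starts.push_back(II);
>>>       else if (II->getIntrinsicID() == Intrinsic::invariant_end)
>>>         Ends.push_back(II);
>>>     }
>>>   };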
>>>
>>
> The invariant-range-analysis pass and the other InvariantInfo-related
> passes already do this.
>
>
>>
>>> Another issue worth considering:
>>>
>>>              entry
>>>             /     \
>>> invariant_start   invariant_start
>>>        |                  |
>>>      load               load
>>>        |                  |
>>>       ...                ...
>>>
>>> Generally, we'd PRE this load (adding it to the entry block). Either the
>>> invariant_start will block this (which is bad), or we'll want to also PRE
>>> the invariant_starts to avoid PRE killing the invariant info.
>>>
>>
> I don't see how this is possible. Invariant intrinsics are ignored by
> basic AA except when invariant range analysis is computed.
>
>
> You don't see how what is possible? The fact that the intrinsics are
> ignored by BasicAA is precisely why we'd want to PRE the intrinsics along
> with the loads.
>

I am scanning through the implementation of GVN::performPRE(), and it is
not clear to me how having an invariant_start before a load prevents the
load from being PRE'd. Besides, have you looked at the comparative stats
<https://docs.google.com/spreadsheets/d/19W1167l9QFrXMXccX4-cValmC5EtlFfUbT7Tr3xaoJs/edit?usp=sharing#gid=1063925865>
I linked earlier? We actually get a *10% increase in the number of loads
PRE'd*.




>
>
>  -Hal
>
>
>>>
>>>  -Hal
>>>
>>>
>>>
>>>>  -Hal
>>>>
>>>> --
>>>> Hal Finkel
>>>> Assistant Computational Scientist
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>>>
>>>
>>>
>>>
>>> --
>>> Hal Finkel
>>> Assistant Computational Scientist
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>>
>>
>>
>
>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>