[LLVMdev] Separating loop nests based on profile information?

Daniel Berlin dberlin at dberlin.org
Mon Jan 12 13:11:53 PST 2015


On Mon, Jan 12, 2015 at 1:08 PM, Philip Reames <listmail at philipreames.com>
wrote:

>
> On 01/11/2015 10:35 PM, Daniel Berlin wrote:
>
>
>>  Looking at what LLVM does, the failing on the PRE side is that our
>> PRE/GVN models are not strong enough to handle this. I'm not sure at all
>> why we think anything else is necessary.
>>
>
>  By this I mean we don't actually perform real PRE for loads. It's a
> known deficiency, and one that does not require any duplicate heuristics
> to solve.
> Right now we perform some availability-based removal that happens to be
> willing to do an insertion here or there to try to move a load in some
> cases (it will only do a single insertion).  It's also not based on real
> availability, but on finding the nearest dependencies of loads and
> squinting at them funny (which is not even really the same as finding
> where a load stops being available).
>
>  It's also not real PRE, just some code that is willing to insert one
> load to move a load.
>
> Er, I think we have a terminology problem here.  From what I can gather,
> our current implementation *is* an implementation of classic PRE.
>

Scalar PRE, yes; load PRE, no.

What our load PRE does is not the same as computing availability.
It finds the nearest dependencies (be they load/load, store/load, or
clobber) for a load by walking backwards.
That is not availability.

Availability is a forward problem, not a backwards one :)

It tries to transform the data it gets from memdep into something like
availability info, but you *will not get the same results* in some cases.
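
To make the forward/backward distinction concrete, here is a minimal sketch
(purely illustrative; the Block type, the bitset encoding, and the gen/kill
summaries are all made up and have nothing to do with GVN's or memdep's
actual representation) of load availability as a classic forward dataflow
problem:

    #include <bitset>
    #include <vector>

    // Availability of up to 64 tracked memory locations, computed forward:
    //   AVAIL_IN(b)  = intersection of AVAIL_OUT(p) over predecessors p
    //   AVAIL_OUT(b) = (AVAIL_IN(b) - Kill(b)) union Gen(b)
    struct Block {
      std::vector<int> preds;  // predecessor indices; block 0 is the entry
      std::bitset<64> gen;     // locations loaded or stored in this block
      std::bitset<64> kill;    // locations clobbered in this block
    };

    std::vector<std::bitset<64>>
    computeAvailOut(const std::vector<Block> &cfg) {
      std::vector<std::bitset<64>> out(cfg.size());
      if (cfg.empty()) return out;
      for (auto &s : out) s.set();           // optimistic initialization
      out[0] = cfg[0].gen;                   // entry: only what it generates
      bool changed = true;
      while (changed) {                      // iterate to a fixed point
        changed = false;
        for (size_t b = 1; b < cfg.size(); ++b) {
          std::bitset<64> in; in.set();
          for (int p : cfg[b].preds)
            in &= out[p];                    // must hold on *every* path in
          std::bitset<64> next = (in & ~cfg[b].kill) | cfg[b].gen;
          if (next != out[b]) { out[b] = next; changed = true; }
        }
      }
      return out;
    }

Anticipability, which PRE also needs, is the symmetric problem solved
backwards from the uses; neither is the same thing as "walk back from this
load and report the nearest dependency".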

>  It's more or less directly organized around the classic "available" and
> "anticipated" concepts from the literature.  The *implementation* has some
> limitations, but the conceptual framework is there*.
>

I agree the anticipation calculation is somewhat there, but even that is
not a standard PRE formulation.



> One of the key invariants in classic PRE is that you do not change program
> behaviour on any path.
>
Well, it's more "do not increase the number of computations on any path",
but even that gets violated by most standard PREs, because they do LICM out
of loops.


> As a result, moving (or at most duplicating) loads is pretty much all that
> happens.
>

Sure.



>
> * It took me a few hours of squinting at the code to recognize that weird
> backwards walk as being an implementation of "anticipated", but it is.  The
> "available" part comes from MemoryDependenceAnalysis and the forward
> dataflow algorithm based on those inputs.
>

MemoryDependenceAnalysis is a backwards walker; it does not compute
availability of a load.
It computes the nearest dependencies.  We then try to take this and turn it
into the result of a forward availability problem.


>
> If you want to start inserting loads on paths which didn't previously have
> them - for example the loop preheader in your example in the previous email
> - this is not classic PRE as I'm familiar with it.  It can be very
> profitable and worthwhile, but is not classic PRE.   I'm not sure what this
> variant is known as in the literature.
>

This is definitely what standard SSA-based PRE algorithms do. Every single
one I am familiar with will move code out of loops like this, be it scalars
or loads, regardless of the fact that it does, in fact, increase the number
of possible computations by one if the loop does not execute.
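
For instance (a made-up example, not the one from your earlier mail), every
such algorithm will happily turn the first function below into the second:

    // Before: the loop body loads *p on every iteration.
    int before(int *p, const int *a, int n) {
      int sum = 0;
      for (int i = 0; i < n; i++)
        sum += *p + a[i];
      return sum;
    }

    // After PRE: the load is placed in the preheader.  If n == 0, the
    // original program never loads *p, so this version does one load the
    // original did not -- the "one extra computation" above.  (Assumes
    // nothing in the loop can store to *p and that *p is safe to speculate.)
    int after(int *p, const int *a, int n) {
      int hoisted = *p;
      int sum = 0;
      for (int i = 0; i < n; i++)
        sum += hoisted + a[i];
      return sum;
    }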

>
> (If I'm being overly pedantic here, or just flat out wrong, please say
> so.  I'm still exploring related material and am trying to categorize
> things in my own head.)
>
>
I do not believe you will find an SSA-based PRE that will not do the above
optimization by default.


> I think we are actually in reasonable agreement that we do need to move beyond
> 'classic PRE' as I called it above.  If you have specific suggestions,
> positive or negative examples, or other thoughts to share, I'd very much
> value the input.
>


>  If you want this case to work, what you need is a real Load PRE
> implementation, not one based on simple memdep based load moving.
>
>
>  Trying to hack this one up with profile guided code duplication, etc, to
> get it to kinda sorta work seems to me to be a bad idea.
>
> Not sure I really agree here, but I'm open to pushing as far as we can
> with static heuristics first.  I suspect that's likely to be more stable
> w.r.t. optimization effect and applicable to more cases.
>

So, LLVM has been through three PRE implementations.
It had an implementation of e-path PRE and an implementation of GVNPRE (that
only worked on scalar code, I believe) a long, long time ago, and then the
current one.

The current one is used because it's fast and catches a lot of cases.
The others were eliminated because they were essentially experimental
research-quality implementations, and were slow :)

Because the current PRE is not value-based, and it really has no idea about
memory, you will run into limitations on loads pretty quickly (particularly
loads indexed by things like induction variables).
It already tries to work around some of these by doing some actual value
analysis of the load (and doing forwarding), but this only works in basic
cases.
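
As a made-up example of the induction-variable case: the a[i + 1] loaded in
one iteration is the a[i] of the next, a redundancy that a nearest-dependency
walk has no real way to see, but that a value-based PRE (with phi
translation) at least has a chance of catching:

    // Each iteration reloads a[i] even though the previous iteration
    // already loaded that same location as a[i + 1]; a value-based PRE can
    // carry the value in a register across the back edge.
    int sumPairs(const int *a, int n) {
      int sum = 0;
      for (int i = 0; i + 1 < n; i++)
        sum += a[i] + a[i + 1];
      return sum;
    }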

GVNPRE can be implemented and made a lot faster than it was (in GCC, it's
actually one of the fastest high-level optimization passes) with very
careful engineering, and it would take care of all of these cases.
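
As a small, made-up scalar illustration of what "value based" buys you --
the redundancy is detected on values rather than on spelling:

    // t on the taken path holds the value of a + b under a different name.
    // A value-based PRE (GVNPRE) gives c + b and a + b the same value
    // number, sees the partial redundancy, inserts a + b on the other edge,
    // and reuses the result through a phi; a purely lexical PRE looking for
    // the expression "a + b" never notices.  (Copy propagation would clean
    // up this particular toy case; the point is only where the redundancy
    // is detected.)
    int f(int a, int b, int cond) {
      int c = a;           // c and a get the same value number
      int t = 0;
      if (cond)
        t = c + b;         // computes the value of a + b
      return t + (a + b);  // partially redundant by value
    }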

I'm also fairly positive you could take any good anticipation-based PRE
algorithm you can find, and I could make it value-based and still fast.

But in the end, pushing the current one seems like a bad idea to me: it
wasn't designed for what you are trying to do (get good PRE results and do
good load PRE), and we know of better ways to do that.

All pushing it will do is push it out of the sweet spot it was designed for.

If you want to get into profile-guided placement or speculative PRE, it
involves solving min-cut problems on very large graphs in most cases.
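
To give a feel for that formulation (this is a toy sketch, not the
construction from either reference below: the graph, the weights, and the
source/sink choice are all made up), edge capacities are profile
frequencies, and the min cut between the "value not yet available" side and
the "value gets recomputed" side is the cheapest set of edges on which to
place insertions:

    #include <algorithm>
    #include <climits>
    #include <cstdio>
    #include <queue>
    #include <vector>

    struct MaxFlow {                        // plain Edmonds-Karp max-flow
      struct Edge { int to, rev; long long cap; };
      std::vector<std::vector<Edge>> g;
      explicit MaxFlow(int n) : g(n) {}
      void addEdge(int u, int v, long long cap) {
        g[u].push_back({v, (int)g[v].size(), cap});
        g[v].push_back({u, (int)g[u].size() - 1, 0});  // residual edge
      }
      long long run(int s, int t) {
        long long flow = 0;
        for (;;) {
          std::vector<int> prevNode(g.size(), -1), prevEdge(g.size(), -1);
          std::queue<int> q;
          q.push(s);
          prevNode[s] = s;
          while (!q.empty() && prevNode[t] == -1) {    // BFS for a path
            int u = q.front(); q.pop();
            for (int i = 0; i < (int)g[u].size(); ++i)
              if (g[u][i].cap > 0 && prevNode[g[u][i].to] == -1) {
                prevNode[g[u][i].to] = u;
                prevEdge[g[u][i].to] = i;
                q.push(g[u][i].to);
              }
          }
          if (prevNode[t] == -1) return flow;          // no augmenting path
          long long push = LLONG_MAX;
          for (int v = t; v != s; v = prevNode[v])
            push = std::min(push, g[prevNode[v]][prevEdge[v]].cap);
          for (int v = t; v != s; v = prevNode[v]) {
            Edge &e = g[prevNode[v]][prevEdge[v]];
            e.cap -= push;
            g[v][e.rev].cap += push;
          }
          flow += push;
        }
      }
    };

    int main() {
      // 0 = entry (value unavailable), 3 = block that recomputes the value.
      // Capacities are hypothetical edge-profile counts.  The cut prefers
      // the cold edge 0->1 (10) to the hot edge 1->3 (1000): insertions
      // land on cold edges, hot paths stay clean.
      MaxFlow mf(4);
      mf.addEdge(0, 1, 10);     // cold edge into a hot region
      mf.addEdge(1, 3, 1000);   // hot edge inside that region
      mf.addEdge(0, 2, 500);
      mf.addEdge(2, 3, 500);
      std::printf("min-cut cost: %lld\n", mf.run(0, 3));  // prints 510
      return 0;
    }

The point of the profile data is exactly that: it tells you which edges are
cheap enough to pay for an insertion on.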

The only implementation I'm aware of that is relatively sane is described in
http://webdocs.cs.ualberta.ca/~amaral/cascon/CDP05/slides/CDP05-horspool.pdf
and
https://dspace.library.uvic.ca/bitstream/handle/1828/292/Pereira_0336403_Thesis-Final_21Dec2007_.pdf?sequence=1

(This approach does not require solving min-cut problems.)



In the end though, you are doing the work, so you get to decide what you
want to do :)