[LLVMdev] Separating loop nests based on profile information?

Mon Jan 12 10:08:49 PST 2015

On 01/11/2015 10:35 PM, Daniel Berlin wrote:
>
>
>     Looking at what LLVM does, the failing on the PRE side is that our
>     PRE/GVN models are not strong enough to handle this. I'm not sure
>     at all why we think anything else is necessary.
>
>
> By this i mean we don't actually perform real PRE for loads. It's a 
> known deficiency and one that does not require any duplicate 
> heuristics to solve.
> Right now we perform some availability based removal that happens be 
> willing to do an insertion here or there to try to move a load in some 
> cases (it will only do a single insertion).  It's not also based on 
> real availability, but on finding nearest dependencies of loads and 
> squinting at them (which is not even really the same as finding where 
> a load stops being available) funny.
>
> It's also not real PRE, just some code that is willing to insert one 
> load to move a load.
Er, I think we have a terminology problem here.  From what I can gather, 
our current implementation *is* an implementation of classic PRE.  It's 
more or less directly organized around the classic "available" and 
"anticipated" concepts from the literature.  The *implementation* has 
some limitations, but the conceptual framework is there*.  One of the 
key invariants in classic PRE is that you do not change program 
behaviour on any path.  As a result, moving (or at most duplicating) 
loads is pretty much all that happens.

* It took me a few hours of squinting at the code to recognize that 
weird backwards walk as being an implementation of "anctipated", but it 
is.  The "available" part comes from MemoryDependenceAnalysis and the 
forward dataflow algorithm based on those inputs.

If you want to start inserting loads on paths which didn't previously 
have them - for example the loop preheader in your example in the 
previous email - this is not classic PRE as I'm familiar with it.  It 
can be very profitable and worthwhile, but is not classic PRE.   I'm not 
sure what this variant is known as in the literature.

(If I'm being overly pedantic here, or just flat out wrong, please say 
so.  I'm still exploring related material and am trying to categorize 
things in my own head.)

I think we actually in reasonable agreement that we do need to move 
beyond 'classic PRE' as I called it above.  If you have specific 
suggestions, positive or negative examples, or other thoughts to share, 
I'd very much value the input.
>
> If you want this case to work, what you need is a real Load PRE 
> implementation, not one based on simple memdep based load moving.
>
> Trying to hack this one up with profile guided code duplication, etc, 
> to get it to kinda sorta work seems to me to be a bad idea.
Not sure I really agree here, but I'm open to pushing as far as we can 
with static heuristics first.  I suspect that's likely to be more stable 
w.r.t. optimization effect and applicable to more cases.
>
>     It's certainly not requiring special code duplication heuristics, etc.
>
>     So either you are thinking of a case that is different from the
>     above, or I am seriously confused :)
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150112/4240fd75/attachment.html>