[LLVMdev] Separating loop nests based on profile information?

Mon Jan 12 10:35:58 PST 2015

> On Jan 12, 2015, at 10:08 AM, Philip Reames <listmail at philipreames.com> wrote:
> 
> 
> On 01/11/2015 10:35 PM, Daniel Berlin wrote:
>> 
>> Looking at what LLVM does, the failing on the PRE side is that our PRE/GVN models are not strong enough to handle this. I'm not sure at all why we think anything else is necessary. 
>> 
>> By this i mean we don't actually perform real PRE for loads. It's a known deficiency and one that does not require any duplicate heuristics to solve.
>> Right now we perform some availability based removal that happens be willing to do an insertion here or there to try to move a load in some cases (it will only do a single insertion).  It's not also based on real availability, but on finding nearest dependencies of loads and squinting at them (which is not even really the same as finding where a load stops being available) funny.
>> 
>> It's also not real PRE, just some code that is willing to insert one load to move a load.
> Er, I think we have a terminology problem here.  From what I can gather, our current implementation *is* an implementation of classic PRE.  It's more or less directly organized around the classic "available" and "anticipated" concepts from the literature.  The *implementation* has some limitations, but the conceptual framework is there*.  One of the key invariants in classic PRE is that you do not change program behaviour on any path.  As a result, moving (or at most duplicating) loads is pretty much all that happens.
> 
> * It took me a few hours of squinting at the code to recognize that weird backwards walk as being an implementation of "anctipated", but it is. 

Did you add some comments? It may save a few hours of squinting to someone else in the future :)

Mehdi

> The "available" part comes from MemoryDependenceAnalysis and the forward dataflow algorithm based on those inputs.
> 
> If you want to start inserting loads on paths which didn't previously have them - for example the loop preheader in your example in the previous email - this is not classic PRE as I'm familiar with it.  It can be very profitable and worthwhile, but is not classic PRE.   I'm not sure what this variant is known as in the literature.  
> 
> (If I'm being overly pedantic here, or just flat out wrong, please say so.  I'm still exploring related material and am trying to categorize things in my own head.)
> 
> I think we actually in reasonable agreement that we do need to move beyond 'classic PRE' as I called it above.  If you have specific suggestions, positive or negative examples, or other thoughts to share, I'd very much value the input.  
>> 
>> If you want this case to work, what you need is a real Load PRE implementation, not one based on simple memdep based load moving.
>> 
>> Trying to hack this one up with profile guided code duplication, etc, to get it to kinda sorta work seems to me to be a bad idea.
> Not sure I really agree here, but I'm open to pushing as far as we can with static heuristics first.  I suspect that's likely to be more stable w.r.t. optimization effect and applicable to more cases.  
>>   
>> It's certainly not requiring special code duplication heuristics, etc.
>> 
>> So either you are thinking of a case that is different from the above, or I am seriously confused :)
>> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150112/97022d24/attachment.html>