[LLVMdev] Separating loop nests based on profile information?
Philip Reames
listmail at philipreames.com
Mon Jan 12 10:08:49 PST 2015
On 01/11/2015 10:35 PM, Daniel Berlin wrote:
>
>
> Looking at what LLVM does, the failing on the PRE side is that our
> PRE/GVN models are not strong enough to handle this. I'm not sure
> at all why we think anything else is necessary.
>
>
> By this i mean we don't actually perform real PRE for loads. It's a
> known deficiency and one that does not require any duplicate
> heuristics to solve.
> Right now we perform some availability based removal that happens be
> willing to do an insertion here or there to try to move a load in some
> cases (it will only do a single insertion). It's not also based on
> real availability, but on finding nearest dependencies of loads and
> squinting at them (which is not even really the same as finding where
> a load stops being available) funny.
>
> It's also not real PRE, just some code that is willing to insert one
> load to move a load.
Er, I think we have a terminology problem here. From what I can gather,
our current implementation *is* an implementation of classic PRE. It's
more or less directly organized around the classic "available" and
"anticipated" concepts from the literature. The *implementation* has
some limitations, but the conceptual framework is there*. One of the
key invariants in classic PRE is that you do not change program
behaviour on any path. As a result, moving (or at most duplicating)
loads is pretty much all that happens.
* It took me a few hours of squinting at the code to recognize that
weird backwards walk as being an implementation of "anctipated", but it
is. The "available" part comes from MemoryDependenceAnalysis and the
forward dataflow algorithm based on those inputs.
If you want to start inserting loads on paths which didn't previously
have them - for example the loop preheader in your example in the
previous email - this is not classic PRE as I'm familiar with it. It
can be very profitable and worthwhile, but is not classic PRE. I'm not
sure what this variant is known as in the literature.
(If I'm being overly pedantic here, or just flat out wrong, please say
so. I'm still exploring related material and am trying to categorize
things in my own head.)
I think we actually in reasonable agreement that we do need to move
beyond 'classic PRE' as I called it above. If you have specific
suggestions, positive or negative examples, or other thoughts to share,
I'd very much value the input.
>
> If you want this case to work, what you need is a real Load PRE
> implementation, not one based on simple memdep based load moving.
>
> Trying to hack this one up with profile guided code duplication, etc,
> to get it to kinda sorta work seems to me to be a bad idea.
Not sure I really agree here, but I'm open to pushing as far as we can
with static heuristics first. I suspect that's likely to be more stable
w.r.t. optimization effect and applicable to more cases.
>
> It's certainly not requiring special code duplication heuristics, etc.
>
> So either you are thinking of a case that is different from the
> above, or I am seriously confused :)
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150112/4240fd75/attachment.html>
More information about the llvm-dev
mailing list