[LLVMdev] [RFC] Defining Infinite Loops
Hal Finkel
hfinkel at anl.gov
Tue Jul 21 18:51:21 PDT 2015
----- Original Message -----
> From: "Philip Reames" <listmail at philipreames.com>
> To: "Chandler Carruth" <chandlerc at google.com>, "Hal Finkel"
> <hfinkel at anl.gov>
> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>
> Sent: Friday, July 17, 2015 7:06:33 PM
> Subject: Re: [LLVMdev] [RFC] Defining Infinite Loops
> On 07/17/2015 02:03 AM, Chandler Carruth wrote:
> > On Thu, Jul 16, 2015 at 11:08 AM Hal Finkel <hfinkel at anl.gov> wrote:
> > > ----- Original Message -----
> > > > From: "Chandler Carruth" <chandlerc at google.com>
> > > > To: "Hal Finkel" <hfinkel at anl.gov>
> > > > Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>
> > > > Sent: Thursday, July 16, 2015 2:33:21 AM
> > > > Subject: Re: [LLVMdev] [RFC] Defining Infinite Loops
> > > >
> > > > On Thu, Jul 16, 2015 at 12:27 AM Hal Finkel <hfinkel at anl.gov> wrote:
> > > > ----- Original Message -----
> > > > > From: "Chandler Carruth" <chandlerc at google.com>
> > > > > To: "Hal Finkel" <hfinkel at anl.gov>, "LLVM Dev" <llvmdev at cs.uiuc.edu>
> > > > > Sent: Thursday, July 16, 2015 1:00:05 AM
> > > > > Subject: Re: [LLVMdev] [RFC] Defining Infinite Loops
> > > > >
> > > > > FWIW, I'm very much in favor of having a firm and clear answer to
> > > > > these questions.
> > > > >
> > > > > I also agree that it is an absolute requirement that LLVM have
> > > > > *some* mechanism for supporting both languages with defined
> > > > > behavior for infinite loops and a language requirement that all
> > > > > loops terminate.
> > > > >
> > > > > However, I'd like to float an alternative approach. I've not spent
> > > > > a lot of time thinking about it, so I'm not sure it's actually
> > > > > better. I'm wondering if you've already thought about it.
> > > > >
> > > > > What if we have an @llvm.noop.sideeffect() or some such which
> > > > > doesn't read or write memory in any way, but which a frontend can
> > > > > place inside a loop body to mark that its execution (perhaps
> > > > > infinitely) is in-and-of-itself a side effect of the program. We
> > > > > could then teach loop unrolling or the few other things that would
> > > > > care to collapse multiple ones into a single one, and not count
> > > > > them towards anything.
> > > > >
> > > > > I know an intrinsic is kind of awkward for this, but it seems like
> > > > > the least bad way we have to annotate a loop in a fully generic
> > > > > way. I'd somewhat like this to be a property of the *loop* and not
> > > > > of the function. And it really needs to be truly generic, handling
> > > > > unnatural CFGs and even recursion-loops. My only idea for how to
> > > > > accomplish that is an intrinsic to mark the dynamic path which, if
> > > > > executed infinitely, can't be eliminated.
> > > >
> > > > My largest concern is that the frontend would need to add these
> > > > things all over the place, not just before the loop backedges. For
> > > > one thing, if the language has gotos, where should they be inserted?
> > > >
> > > > The target of every goto.
> > > >
> > > > For computed goto, every label whose address is taken.
> > > >
> > > > This at least doesn't seem that bad to me.
> > > >
> > > > Before every branch will be conservatively correct, but I'm worried
> > > > that will unnecessarily block optimizations. They'd also be needed
> > > > at the entry to every function.
> > > >
> > > > Only external, address-taken, or internal-and-recursively-called
> > > > functions. All of these already have some blockers to optimization,
> > > > so this part I'm not worried about.
> > > >
> > > > On the other hand, maybe if we added an optimization that removed
> > > > these things along any control-flow path that already had any other
> > > > side effect, it might not be too bad?
> > > >
> > > > Absolutely; much like lower-expect, we'd need to make sure that easy
> > > > cases were folded quickly in the optimizer so this didn't get out of
> > > > hand.
> > > > >
> > > > > As for why I'm particularly interested in this being a property of
> > > > > the loop, consider if you were to have a mixture of Java and C++
> > > > > code, all compiled to LLVM. How do you inline between them?
> > > >
> > > > You add the attribute to the caller.
> > > >
> > > > This has the really unfortunate side effect of pessimizing code
> > > > during cross-language optimizations.
> > > >
> > > > FWIW, I suspect I might care a lot about this particular case
> > > > (because I believe that Fortran has defined behavior for infinite
> > > > loops).
> > > >
> > > > Yea, you could argue that C does too, which is one reason why I'm so
> > > > interested in this being done really well even in an LTO situation.
> > > >
> > > > I think it would be really useful to not have this cross between
> > > > adjacent loops after inlining when they come from different source
> > > > languages, and it would be nice for it to not apply to nested loops
> > > > when those nested loops were inlined from a language without this
> > > > guarantee.
> > > >
> > > > But I'm still not convinced that the noise of the intrinsic is
> > > > *definitely* worth it. I come from the background of the C++ rules'
> > > > rationale, and so I naturally see the languages that define this as
> > > > giving up optimizations and so wanting to localize the impact of
> > > > that... Not sure that's actually the right perspective, though. ;]
> > >
> > > I'm leaning toward agreeing with you, primarily because I think it
> > > will more naturally fit into the optimizer than the attribute. We
> > > need to check loops for side effects anyway (if we otherwise default
> > > to C++-like rules), and so this intrinsic will do the right thing
> > > without any special logic.
> >
> > FWIW, this is why I first started thinking along these lines. I'm
> > increasingly thinking that if this approach works it will make the
> > implementation of testing for this more natural in the optimizer.
> > Simple things like instruction predicates will "just work", etc.
> >
> > I'm really interested in the downsides. You mentioned a few potential
> > ones, but seem to be happy with my responses. I wonder, are there
> > others?
>
> I'm really not a fan of this approach. I think it could be made to
> work, but I suspect we'd run into a lot of quality of implementation
> issues if we went down this route.
> We'd have to teach many places in the optimizer to merge or split
> such calls. Consider simple things like tail commoning, if-then-else
> hoisting, or the like. In each of these, we'd need code to recognize
> having the intrinsic on one path, but not the other, and then still
> perform the optimization. Think the various debug intrinsics, but
> worse.
> I worry about the interactions with memory aliasing and hoisting
> rules. We don't currently have the notion of a non-memory side
> effect in LLVM. To prevent this intrinsic from being DCE'd we'd
> likely need to mark it as writing to some known location.
We'd do the same thing we did with @llvm.assume, for which we have all of the same problems (including those mentioned above, only worse). For @llvm.assume we tagged it as writing to an unknown location, but taught BasicAA that it does not alias with any particular location.
> Given the number of places in the optimizer which give up when
> encountering any write (EarlyCSE for one), that would be problematic
> for optimization effectiveness. The other approach would be to teach
> LLVM about non-memory side effects. I think this is solvable, but a
> large investment.
EarlyCSE has specific code to deal with @llvm.assume as well (four lines of code, but code nevertheless).
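To make the shape of that trick concrete, here is a toy model (hypothetical names, not LLVM's actual API) of the two-pronged @llvm.assume treatment described above: the call is conservatively marked as writing to unknown memory, so generic dead-code elimination keeps it, while the alias query is special-cased so that it clobbers no particular location, so loads can still be forwarded across it.

```python
# Toy model (hypothetical names, not LLVM code) of the @llvm.assume
# treatment described above. Two pieces work together: a conservative
# "writes unknown memory" flag keeps the call alive through DCE, and a
# special-cased alias query says it clobbers no particular location.

class Inst:
    def __init__(self, opcode, writes_memory):
        self.opcode = opcode
        self.writes_memory = writes_memory

def is_trivially_dead(inst):
    # Generic DCE rule: anything that may write memory must be kept.
    return not inst.writes_memory

def may_clobber(inst, location):
    # BasicAA-style special case: an "assume"-like instruction is known
    # not to alias any particular location, despite its write flag.
    if inst.opcode == "assume":
        return False
    return inst.writes_memory

assume = Inst("assume", writes_memory=True)  # survives DCE
store = Inst("store", writes_memory=True)    # a real clobber
```

A pass that gives up on any write (as EarlyCSE does) needs exactly this kind of small special case to see through the "assume"-like call.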
> In practice, most of the contributors to LLVM care about C++. I worry
> we'd end up in a situation where languages which need infinite loops
> would become second class citizens and that a large number of
> optimizations wouldn't apply to them in practice. This is by far my
> biggest worry.
> Now, I'm certainly biased, but I'd much rather see a scheme where a
> quality of implementation issue affects the performance of C++.
> These are far more likely to be actually fixed. :)
> Earlier in this thread, the idea of using metadata on loops was
> mentioned. Chandler's point about generic handling for recursive
> loops is a good one, but in general, a metadata annotation of
> finiteness seems like a better starting point.
> What if we introduced a piece of branch (really, control transfer
> instruction) metadata (not loop) called "productive" (to steal
> Sanjoy's term) whose semantics are such that it can be assumed to
> only execute a finite number of times between productive actions
> (volatile, synchronization, I/O, etc.). We then tag *all* branches
> emitted by Clang with this metadata. This gives us the benefit of
> the loop metadata in that a single tagged backedge branch implies a
> productive loop, but allows productive recursive functions to be
> converted into productive loops in a natural way.
I don't see how this helps us with the recursive functions case. If I have this:

  void foo() {
    bar();
    foo();
  }

where do I put the metadata? Maybe we could tag the 'ret' as well?
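For what it's worth, the recursion-to-loop direction can be sketched concretely. A hedged, terminating analogue (the foo()/bar() example above never terminates, so this uses a countdown instead): once tail-recursion elimination turns the self-call into a backedge, a "productive" tag on that control transfer (or on the 'ret') would land on an ordinary loop backedge.

```python
# Toy illustration (not LLVM code): a self-recursive function and the
# loop that tail-recursion elimination would turn it into. A
# "productive" tag on the recursive tail call's control transfer (or
# the 'ret') ends up on the single loop backedge after the transform.

def count_rec(n, acc=0):
    # Tail-recursive form: the transfer back into count_rec is where a
    # productive marker would live.
    if n == 0:
        return acc
    return count_rec(n - 1, acc + 1)

def count_loop(n):
    # Equivalent loop form after tail-recursion elimination: the marker
    # now sits on the backedge of the while loop.
    acc = 0
    while n != 0:
        n, acc = n - 1, acc + 1
    return acc
```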
> The function attribute "productive" now has an easy inference rule:
> if all branches in the function are productive and all callees are
> productive, then so is the function. This would seem to allow us to
> perform DSE, LICM, and related optimizations without trouble.
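That inference rule lends itself to a simple fixpoint computation over the call graph; a minimal model (hypothetical representation, assuming per-function branch tags and a callee map, not LLVM code) might look like:

```python
# Toy model of the inference rule quoted above: a function is
# "productive" if every branch in its body is tagged productive and
# every callee is productive. Recursive call graphs make this a
# fixpoint: start optimistically, then strip the attribute until
# nothing changes.

def infer_productive(all_branches_tagged, callees):
    """all_branches_tagged: maps function name -> True if every branch
    in its body carries the productive tag.
    callees: maps function name -> set of functions it calls."""
    # Optimistic start: believe every locally tagged function.
    productive = {fn for fn, tagged in all_branches_tagged.items() if tagged}
    changed = True
    while changed:
        changed = False
        for fn in list(productive):
            # Drop any function with a possibly-unproductive callee.
            if not callees.get(fn, set()) <= productive:
                productive.remove(fn)
                changed = True
    return productive
```

Note that the optimistic starting point is what lets a self-recursive function whose branches are all tagged remain productive, matching the point about productive recursive functions above.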
> Inlining now has a reasonable definition where you can inline between
> languages w/the semantics of each language preserved.
> One cool thing here is that the set of operations which are
> "productive" could actually be encoded in the metadata. This could
> potentially support other languages than C++ w.r.t. the "no infinite
> loops except when" type rules.
> Thoughts?
I was at first afraid of the number of places that create branch instructions, and updating them all to preserve the metadata when appropriate. However:
$ grep -r 'BranchInst::Create\|CreateBr' lib/Transforms | wc -l
99
$ grep -r 'BranchInst::Create\|CreateBr' lib/Transforms -l | wc -l
33
So there are only 99 places in 33 files that create branches in lib/Transforms (and I find the numerical coincidence amusing). This seems potentially doable.
Also, if I'm right and we'd need them on 'ret' as well:
$ grep -r 'ReturnInst::Create\|CreateRet' lib/Transforms | wc -l
33
$ grep -r 'ReturnInst::Create\|CreateRet' lib/Transforms -l | wc -l
11
And now I'm even more amused.
Chandler, what do you think?
-Hal
> Philip
> p.s. The current implementation of readonly is correct w.r.t. C++
> rules only because we designate all atomic, volatile, or fencing
> operations as read/write. One thing we might end up with out of this
> discussion is the possibility of talking about functions which are
> readonly, but involved in synchronization. That seems like it might
> be useful for optimizing concurrent C++ programs in interesting
> ways. It would require a separate notion of a synchronization side
> effect independent of read/write though. This seems related, but
> slightly orthogonal to the assumptions about finiteness of loops.
I agree; we do need some notion of side effects beyond those based on memory dependencies. This, however, is a (hopefully) separate project.
Thanks again,
Hal
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory