[LLVMdev] [RFC] Defining Infinite Loops
Hal Finkel
hfinkel at anl.gov
Tue Jul 21 18:51:21 PDT 2015
----- Original Message -----
> From: "Philip Reames" <listmail at philipreames.com>
> To: "Chandler Carruth" <chandlerc at google.com>, "Hal Finkel"
> <hfinkel at anl.gov>
> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>
> Sent: Friday, July 17, 2015 7:06:33 PM
> Subject: Re: [LLVMdev] [RFC] Defining Infinite Loops
> On 07/17/2015 02:03 AM, Chandler Carruth wrote:
> > On Thu, Jul 16, 2015 at 11:08 AM Hal Finkel <hfinkel at anl.gov> wrote:
> > > ----- Original Message -----
> > > > From: "Chandler Carruth" <chandlerc at google.com>
> > > > To: "Hal Finkel" <hfinkel at anl.gov>
> > > > Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>
> > > > Sent: Thursday, July 16, 2015 2:33:21 AM
> > > > Subject: Re: [LLVMdev] [RFC] Defining Infinite Loops
> > > >
> > > > On Thu, Jul 16, 2015 at 12:27 AM Hal Finkel <hfinkel at anl.gov> wrote:
> > > > ----- Original Message -----
> > > > > From: "Chandler Carruth" <chandlerc at google.com>
> > > > > To: "Hal Finkel" <hfinkel at anl.gov>, "LLVM Dev" <llvmdev at cs.uiuc.edu>
> > > > > Sent: Thursday, July 16, 2015 1:00:05 AM
> > > > > Subject: Re: [LLVMdev] [RFC] Defining Infinite Loops
> > > > >
> > > > > FWIW, I'm very much in favor of having a firm and clear answer to
> > > > > these questions.
> > > > >
> > > > > I also agree that it is an absolute requirement that LLVM have
> > > > > *some* mechanism for supporting both languages with defined
> > > > > behavior for infinite loops and a language requirement that all
> > > > > loops terminate.
> > > > >
> > > > > However, I'd like to float an alternative approach. I've not spent
> > > > > a lot of time thinking about it, so I'm not sure it's actually
> > > > > better. I'm wondering if you've already thought about it.
> > > > >
> > > > > What if we have an @llvm.noop.sideeffect() or some such which
> > > > > doesn't read or write memory in any way, but which a frontend can
> > > > > place inside a loop body to mark that its execution (perhaps
> > > > > infinitely) is in-and-of-itself a side effect of the program. We
> > > > > could then teach loop unrolling or the few other things that would
> > > > > care to collapse multiple ones into a single one, and not count
> > > > > them towards anything.
> > > > >
> > > > > I know an intrinsic is kind of awkward for this, but it seems like
> > > > > the least bad way we have to annotate a loop in a fully generic
> > > > > way. I'd somewhat like this to be a property of the *loop* and not
> > > > > of the function. And it really needs to be truly generic, handling
> > > > > unnatural CFGs and even recursion-loops. My only idea for how to
> > > > > accomplish that is an intrinsic to mark the dynamic path which, if
> > > > > executed infinitely, can't be eliminated.
> > > >
> > > > My largest concern is that the frontend would need to add these
> > > > things all over the place, not just before the loop backedges. For
> > > > one thing, if the language has gotos, where should they be inserted?
> > > >
> > > > The target of every goto.
> > > >
> > > > For computed goto, every label whose address is taken.
> > > >
> > > > This at least doesn't seem that bad to me.
> > > >
> > > > Before every branch will be conservatively correct, but I'm worried
> > > > that will unnecessarily block optimizations. They'd also be needed
> > > > at the entry to every function.
> > > >
> > > > Only external, address-taken, or internal-and-recursively-called
> > > > functions. All of these already have some blockers to optimization,
> > > > so this part I'm not worried about.
> > > >
> > > > On the other hand, maybe if we added an optimization that removed
> > > > these things along any control-flow path that already had any other
> > > > side effect, it might not be too bad?
> > > >
> > > > Absolutely; much like lower-expect, we'd need to make sure that easy
> > > > cases were folded quickly in the optimizer so this didn't get out of
> > > > hand.
> > > > >
> > > > > As for why I'm particularly interested in this being a property of
> > > > > the loop, consider if you were to have a mixture of Java and C++
> > > > > code, all compiled to LLVM. How do you inline between them?
> > > >
> > > > You add the attribute to the caller.
> > > >
> > > > This has the really unfortunate side effect of pessimizing code
> > > > during cross-language optimizations.
> > > >
> > > > FWIW, I suspect I might care a lot about this particular case
> > > > (because I believe that Fortran has defined behavior for infinite
> > > > loops).
> > > >
> > > > Yea, you could argue that C does too, which is one reason why I'm so
> > > > interested in this being done really well even in an LTO situation.
> > > >
> > > > I think it would be really useful to not have this cross between
> > > > adjacent loops after inlining when they come from different source
> > > > languages, and it would be nice for it to not apply to nested loops
> > > > when those nested loops were inlined from a language without this
> > > > guarantee.
> > > >
> > > > But I'm still not convinced that the noise of the intrinsic is
> > > > *definitely* worth it. I come from the background of the C++ rules'
> > > > rationale, and so I naturally see the languages that define this as
> > > > giving up optimizations and so wanting to localize the impact of
> > > > that... Not sure that's actually the right perspective, though. ;]
> > >
> > > I'm leaning toward agreeing with you, primarily because I think it
> > > will more naturally fit into the optimizer than the attribute. We
> > > need to check loops for side effects anyway (if we otherwise default
> > > to C++-like rules), and so this intrinsic will do the right thing
> > > without any special logic.
> >
> > FWIW, this is why I first started thinking along these lines. I'm
> > increasingly thinking that if this approach works it will make the
> > implementation of testing for this more natural in the optimizer.
> > Simple things like instruction predicates will "just work", etc.
> >
> > I'm really interested in the downsides. You mentioned a few potential
> > ones, but seem to be happy with my responses. I wonder, are there
> > others?
>
> I'm really not a fan of this approach. I think it could be made to
> work, but I suspect we'd run into a lot of quality of implementation
> issues if we went down this route.
> We'd have to teach many places in the optimizer to merge or split
> such calls. Consider simple things like tail commoning, if-then-else
> hoisting, or the like. In each of these, we'd need code to recognize
> having the intrinsic on one path, but not the other, and then still
> perform the optimization. Think the various debug intrinsics, but
> worse.
> I worry about the interactions with memory aliasing and hoisting
> rules. We don't currently have the notion of a non-memory side
> effect in LLVM. To prevent this intrinsic from being DCE'd we'd
> likely need to mark it as writing to some known location.
We'd do the same thing we did with @llvm.assume, for which we have all of the same problems (including those mentioned above, only worse). For @llvm.assume we tagged it as writing to an unknown location, but taught BasicAA that it does not alias with any particular location.
> Given the number of places in the optimizer which give up when
> encountering any write (EarlyCSE for one), that would be problematic
> for optimization effectiveness. The other approach would be to teach
> LLVM about non-memory side effects. I think this is solvable, but a
> large investment.
EarlyCSE has specific code to deal with @llvm.assume as well (four lines of code, but code nevertheless).
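To make the shape of that trick concrete, here is a toy model (hypothetical names, not LLVM's actual API) of the two-pronged @llvm.assume treatment described above: the call is conservatively marked as writing to unknown memory, so generic dead-code elimination keeps it, while the alias query is special-cased so that it clobbers no particular location, so loads can still be forwarded across it.

```python
# Toy model (hypothetical names, not LLVM code) of the @llvm.assume
# treatment described above. Two pieces work together: a conservative
# "writes unknown memory" flag keeps the call alive through DCE, and a
# special-cased alias query says it clobbers no particular location.

class Inst:
    def __init__(self, opcode, writes_memory):
        self.opcode = opcode
        self.writes_memory = writes_memory

def is_trivially_dead(inst):
    # Generic DCE rule: anything that may write memory must be kept.
    return not inst.writes_memory

def may_clobber(inst, location):
    # BasicAA-style special case: an "assume"-like instruction is known
    # not to alias any particular location, despite its write flag.
    if inst.opcode == "assume":
        return False
    return inst.writes_memory

assume = Inst("assume", writes_memory=True)  # survives DCE
store = Inst("store", writes_memory=True)    # a real clobber
```

A pass that gives up on any write (as EarlyCSE does) needs exactly this kind of small special case to see through the "assume"-like call.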
> In practice, most of the contributors to LLVM care about C++. I worry
> we'd end up in a situation where languages which need infinite loops
> would become second class citizens and that a large number of
> optimizations wouldn't apply to them in practice. This is by far my
> biggest worry.
> Now, I'm certainly biased, but I'd much rather see a scheme where a
> quality of implementation issue affects the performance of C++.
> These are far more likely to be actually fixed. :)
> Earlier in this thread, the idea of using metadata on loops was
> mentioned. Chandler's point about generic handling for recursive
> loops is a good one, but in general, a metadata annotation of
> finiteness seems like a better starting point.
> What if we introduced a piece of branch (really, control transfer
> instruction) metadata (not loop) called "productive" (to steal
> Sanjoy's term) whose semantics are such that it can be assumed to
> only execute a finite number of times between productive actions
> (volatile, synchronization, I/O, etc.). We then tag *all* branches
> emitted by Clang with this metadata. This gives us the benefit of
> the loop metadata in that a single tagged backedge branch implies a
> productive loop, but allows productive recursive functions to be
> converted into productive loops in a natural way.
I don't see how this helps us with the recursive functions case. If I have this:

  void foo() {
    bar();
    foo();
  }

where do I put the metadata? Maybe we could tag the 'ret' as well?
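For what it's worth, the recursion-to-loop direction can be sketched concretely. A hedged, terminating analogue (the foo()/bar() example above never terminates, so this uses a countdown instead): once tail-recursion elimination turns the self-call into a backedge, a "productive" tag on that control transfer (or on the 'ret') would land on an ordinary loop backedge.

```python
# Toy illustration (not LLVM code): a self-recursive function and the
# loop that tail-recursion elimination would turn it into. A
# "productive" tag on the recursive tail call's control transfer (or
# the 'ret') ends up on the single loop backedge after the transform.

def count_rec(n, acc=0):
    # Tail-recursive form: the transfer back into count_rec is where a
    # productive marker would live.
    if n == 0:
        return acc
    return count_rec(n - 1, acc + 1)

def count_loop(n):
    # Equivalent loop form after tail-recursion elimination: the marker
    # now sits on the backedge of the while loop.
    acc = 0
    while n != 0:
        n, acc = n - 1, acc + 1
    return acc
```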
> The function attribute "productive" now has an easy inference rule:
> if all branches in the function are productive and all callees are
> productive, then so is the function. This would seem to allow us to
> perform DSE, LICM, and related optimizations without trouble.
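That inference rule lends itself to a simple fixpoint computation over the call graph; a minimal model (hypothetical representation, assuming per-function branch tags and a callee map, not LLVM code) might look like:

```python
# Toy model of the inference rule quoted above: a function is
# "productive" if every branch in its body is tagged productive and
# every callee is productive. Recursive call graphs make this a
# fixpoint: start optimistically, then strip the attribute until
# nothing changes.

def infer_productive(all_branches_tagged, callees):
    """all_branches_tagged: maps function name -> True if every branch
    in its body carries the productive tag.
    callees: maps function name -> set of functions it calls."""
    # Optimistic start: believe every locally tagged function.
    productive = {fn for fn, tagged in all_branches_tagged.items() if tagged}
    changed = True
    while changed:
        changed = False
        for fn in list(productive):
            # Drop any function with a possibly-unproductive callee.
            if not callees.get(fn, set()) <= productive:
                productive.remove(fn)
                changed = True
    return productive
```

Note that the optimistic starting point is what lets a self-recursive function whose branches are all tagged remain productive, matching the point about productive recursive functions above.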
> Inlining now has a reasonable definition where you can inline between
> languages w/the semantics of each language preserved.
> One cool thing here is that the set of operations which are
> "productive" could actually be encoded in the metadata. This could
> potentially support other languages than C++ w.r.t. the "no infinite
> loops except when" type rules.
> Thoughts?
I was at first afraid of the number of places that create branch instructions, and updating them all to preserve the metadata when appropriate. However:
$ grep -r 'BranchInst::Create\|CreateBr' lib/Transforms | wc -l
99
$ grep -r 'BranchInst::Create\|CreateBr' lib/Transforms -l | wc -l
33
So there are only 99 places in 33 files that create branches in lib/Transforms (and I find the numerical coincidence amusing). This seems potentially doable.
Also, if I'm right and we'd need them on 'ret' as well:
$ grep -r 'ReturnInst::Create\|CreateRet' lib/Transforms | wc -l
33
$ grep -r 'ReturnInst::Create\|CreateRet' lib/Transforms -l | wc -l
11
And now I'm even more amused.
Chandler, what do you think?
-Hal
> Philip
> p.s. The current implementation of readonly is correct w.r.t. C++
> rules only because we designate all atomic, volatile, or fencing
> operations as read/write. One thing we might end up with out of this
> discussion is the possibility of talking about functions which are
> readonly, but involved in synchronization. That seems like it might
> be useful for optimizing concurrent C++ programs in interesting
> ways. It would require a separate notion of a synchronization side
> effect independent of read/write though. This seems related, but
> slightly orthogonal to the assumptions about finiteness of loops.
I agree; we do need some notion of side effects beyond those based on memory dependencies. This, however, is a (hopefully) separate project.
Thanks again,
Hal
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory