<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<br>
<div class="moz-cite-prefix">On 07/17/2015 02:03 AM, Chandler
Carruth wrote:<br>
</div>
<blockquote
cite="mid:CAGCO0Kj9_ufibPrOetLr6-5ev0_DAFTnK8uA+6tmadx3y9qp6Q@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr">On Thu, Jul 16, 2015 at 11:08 AM Hal Finkel
<<a moz-do-not-send="true" href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">-----
Original Message -----<br>
> From: "Chandler Carruth" <<a moz-do-not-send="true"
href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>><br>
> To: "Hal Finkel" <<a moz-do-not-send="true"
href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>><br>
> Cc: "LLVM Dev" <<a moz-do-not-send="true"
href="mailto:llvmdev@cs.uiuc.edu" target="_blank">llvmdev@cs.uiuc.edu</a>><br>
> Sent: Thursday, July 16, 2015 2:33:21 AM<br>
> Subject: Re: [LLVMdev] [RFC] Defining Infinite Loops<br>
><br>
><br>
><br>
><br>
> On Thu, Jul 16, 2015 at 12:27 AM Hal Finkel < <a
moz-do-not-send="true" href="mailto:hfinkel@anl.gov"
target="_blank">hfinkel@anl.gov</a> ><br>
> wrote:<br>
><br>
><br>
> ----- Original Message -----<br>
> > From: "Chandler Carruth" < <a
moz-do-not-send="true" href="mailto:chandlerc@google.com"
target="_blank">chandlerc@google.com</a> ><br>
> > To: "Hal Finkel" < <a moz-do-not-send="true"
href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>
>, "LLVM Dev" <<br>
> > <a moz-do-not-send="true"
href="mailto:llvmdev@cs.uiuc.edu" target="_blank">llvmdev@cs.uiuc.edu</a>
><br>
> > Sent: Thursday, July 16, 2015 1:00:05 AM<br>
> > Subject: Re: [LLVMdev] [RFC] Defining Infinite
Loops<br>
> ><br>
> ><br>
> > FWIW, I'm very much in favor of having a firm and
clear answer to<br>
> > these questions.<br>
> ><br>
> > I also agree that it is an absolute requirement
that LLVM have<br>
> > *some*<br>
> > mechanism for supporting both languages with
defined behavior for<br>
> > infinite loops and a language requirement that all
loops terminate.<br>
> ><br>
> ><br>
> > However, I'd like to float an alternative
approach. I've not spent<br>
> > a<br>
> > lot of time thinking about it, so I'm not sure it's
actually better.<br>
> > I'm wondering if you've already thought about it.<br>
> ><br>
> ><br>
> > What if we have an @llvm.noop.sideeffect() or some
such which<br>
> > doesn't<br>
> > read or write memory in any way, but which a
frontend can place<br>
> > inside a loop body to mark that its execution
(perhaps infinitely)<br>
> > is in-and-of-itself a side effect of the program.
We could then<br>
> > teach loop unrolling or the few other things that
would care to<br>
> > collapse multiple ones into a single one, and not
count them<br>
> > towards<br>
> > anything.<br>
> ><br>
> ><br>
> > I know an intrinsic is kind of awkward for this,
but it seems like<br>
> > the least bad way we have to annotate a loop in a
fully generic<br>
> > way.<br>
> > I'd somewhat like this to be a property of the
*loop* and not of<br>
> > the<br>
> > function. And it really needs to be truly generic,
handling<br>
> > unnatural CFGs and even recursion-loops. My only
idea for how to<br>
> > accomplish that is an intrinsic to mark the
dynamic path which if<br>
> > executed infinitely can't be eliminated.<br>
><br>
> My largest concern is that the frontend would need to
add these<br>
> things all over the place, not just before the loop
backedges. For<br>
> one thing, if the language has gotos, where should they
be inserted?<br>
><br>
><br>
> The target of every goto.<br>
><br>
><br>
> For computed goto, every label whose address is taken.<br>
><br>
><br>
> This at least doesn't seem that bad to me.<br>
><br>
><br>
> Before every branch will be conservatively correct, but
I'm worried<br>
> that will unnecessarily block optimizations. They'd
also be needed<br>
> at the entry to every function.<br>
><br>
><br>
> Only external, address taken, or
internal-and-recursively-called<br>
> functions. All of which we already have some blockers
to<br>
> optimization, so this part I'm not worried about.<br>
><br>
><br>
> On the other hand, maybe if we added an optimization
that removed<br>
> these things along any control-flow path that already
had any other<br>
> side effect, it might not be too bad?<br>
><br>
><br>
><br>
> Absolutely, much like lower-expect, we'd need to make
sure that easy<br>
> cases were folded quickly in the optimizer so this
didn't get out of<br>
> hand.<br>
><br>
><br>
><br>
> ><br>
> ><br>
> > As for why I'm particularly interested in this
being a property of<br>
> > the loop, consider if you were to have a mixture
of Java and C++<br>
> > code, all compiled to LLVM. How do you inline
between them?<br>
> ><br>
><br>
> You add the attribute to the caller.<br>
><br>
><br>
> This has the really unfortunate side effect of
pessimizing code<br>
> during cross language optimizations.<br>
><br>
><br>
> FWIW, I suspect I might care a lot about this
particular case<br>
> (because I believe that Fortran has defined behavior
for infinite<br>
> loops).<br>
><br>
><br>
><br>
> Yea, you could argue that C does too, which is one
reason why I'm so<br>
> interested in this being done really well even in an
LTO situation.<br>
><br>
><br>
> I think it would be really useful to not have this
cross between<br>
> adjacent loops after inlining when they come from
different source<br>
> languages, and it would be nice for it to not apply to
nested loops<br>
> when those nested loops were inlined from a language
without this<br>
> guarantee.<br>
><br>
><br>
> But I'm still not convinced that the noise of the
intrinsic is<br>
> *definitely* worth it. I come from the background of
the C++ rules'<br>
> rationale, and so I naturally see the languages that
define this as<br>
> giving up optimizations and so wanting to localize the
impact of<br>
> that... Not sure that's actually the right perspective
though. ;]<br>
><br>
<br>
I'm leaning toward agreeing with you, primarily because I
think it will more naturally fit into the optimizer than the
attribute. We need to check loops for side effects anyway
(if we otherwise default to C++-like rules), and so this
intrinsic will do the right thing without any special logic.<br>
</blockquote>
<div><br>
</div>
<div>FWIW, this is why I first started thinking along these
lines. I'm increasingly thinking that if this approach works
it will make the implementation of testing for this more
natural in the optimizer. Simple things like instruction
predicates will "just work", etc.</div>
<div><br>
</div>
<div>I'm really interested in the downsides. You mentioned a
few potential ones, but seem to be happy with my responses.
I wonder, are there others?</div>
</div>
</div>
</blockquote>
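For concreteness, the intrinsic proposal quoted above would amount to
something like the following (a hypothetical sketch;
@llvm.noop.sideeffect does not exist in LLVM today):<br>
<pre>
; Hypothetical: the frontend drops a call to a no-op intrinsic into
; the loop body so that executing the loop is itself a side effect,
; preventing the loop from being deleted even though it touches no
; memory.
declare void @llvm.noop.sideeffect()

define void @spin() {
entry:
  br label %loop
loop:
  call void @llvm.noop.sideeffect()
  br label %loop
}
</pre>
<br>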
I'm really not a fan of this approach. I think it could be made to
work, but I suspect we'd run into a lot of quality of implementation
issues if we went down this route. <br>
<br>
We'd have to teach many places in the optimizer to merge or split
such calls. Consider simple things like tail commoning,
if-then-else hoisting, and the like. In each of these, we'd need
code to recognize the intrinsic appearing on one path but not the
other, and then still perform the optimization. Think of the
various debug intrinsics, but worse. <br>
<br>
I worry about the interactions with memory aliasing and hoisting
rules. We don't currently have the notion of a non-memory side
effect in LLVM. To prevent this intrinsic from being DCE'd we'd
likely need to mark it as writing to some known location. Given the
number of places in the optimizer which give up when encountering
any write (EarlyCSE for one), that would be problematic for
optimization effectiveness. The other approach would be to teach
LLVM about non-memory side effects. I think this is solvable, but a
large investment. <br>
<br>
In practice, most of the contributors to LLVM care about C++. I
worry we'd end up in a situation where languages which need infinite
loops would become second-class citizens and that a large number of
optimizations wouldn't apply to them in practice. This is by far my
biggest worry. <br>
<br>
Now, I'm certainly biased, but I'd much rather see a scheme where a
quality-of-implementation issue affects the performance of C++.
Those issues are far more likely to actually be fixed. :)<br>
<br>
Earlier in this thread, the idea of using metadata on loops was
mentioned. Chandler's point about generic handling for recursive
loops is a good one, but in general, a metadata annotation of
finiteness seems like a better starting point.<br>
<br>
What if we introduced a piece of branch (really, control-transfer
instruction) metadata (not loop metadata) called "productive" (to
steal Sanjoy's term), whose semantics are that the branch can be
assumed to execute only a finite number of times between productive
actions (volatile accesses, synchronization, I/O, etc.)? We then tag
*all* branches emitted by Clang with this metadata. This gives us
the benefit of the loop metadata, in that a single tagged backedge
branch implies a productive loop, but also allows productive
recursive functions to be converted into productive loops in a
natural way. <br>
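As a sketch (hypothetical syntax; no "productive" metadata kind
exists today), a tagged loop might look like:<br>
<pre>
; Every branch Clang emits would carry the metadata; the tagged
; backedge below implies the loop executes only a finite number of
; times between productive actions.
define void @count(i32 %n) {
entry:
  br label %header, !productive !0
header:
  %i = phi i32 [ 0, %entry ], [ %next, %header ]
  %next = add nsw i32 %i, 1
  %cmp = icmp slt i32 %next, %n
  br i1 %cmp, label %header, label %exit, !productive !0
exit:
  ret void
}
!0 = !{}
</pre>
<br>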
<br>
The function attribute "productive" now has an easy inference rule:
if all branches in a function are productive and all of its callees
are productive, so is the function. This would seem to allow us to
perform DSE, LICM, and related optimizations without trouble. <br>
<br>
Inlining now has a reasonable definition where you can inline
between languages w/the semantics of each language preserved. <br>
<br>
One cool thing here is that the set of operations which count as
"productive" could actually be encoded in the metadata. This could
potentially support languages other than C++ w.r.t. their own "no
infinite loops except when" style rules. <br>
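For example (again, hypothetical syntax), the metadata operands
could spell out which actions the source language treats as
productive; a frontend for a language with different termination
rules would simply emit a different set:<br>
<pre>
  br i1 %cmp, label %header, label %exit, !productive !0

!0 = !{!"volatile", !"atomic", !"io"}
</pre>
<br>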
<br>
Thoughts?<br>
<br>
Philip<br>
<br>
p.s. The current implementation of readonly is correct w.r.t. C++
rules only because we designate all atomic, volatile, or fencing
operations as read/write. One thing that might come out of this
discussion is the ability to talk about functions which are
readonly but involved in synchronization. That seems like it might
be useful for optimizing concurrent C++ programs in interesting
ways. It would require a separate notion of a synchronization side
effect, independent of read/write, though. This seems related, but
slightly orthogonal, to the assumptions about finiteness of loops. <br>
</body>
</html>