<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: arial,helvetica,sans-serif; font-size: 10pt; color: #000000'><br><hr id="zwchr"><blockquote id="DWT9585" style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Philip Reames" <listmail@philipreames.com><br><b>To: </b>"Chandler Carruth" <chandlerc@google.com>, "Hal Finkel" <hfinkel@anl.gov><br><b>Cc: </b>"LLVM Dev" <llvmdev@cs.uiuc.edu><br><b>Sent: </b>Friday, July 17, 2015 7:06:33 PM<br><b>Subject: </b>Re: [LLVMdev] [RFC] Defining Infinite Loops<br><br>

    <br>

    <br>

    <div class="moz-cite-prefix">On 07/17/2015 02:03 AM, Chandler

      Carruth wrote:<br>

    </div>

    <blockquote cite="mid:CAGCO0Kj9_ufibPrOetLr6-5ev0_DAFTnK8uA+6tmadx3y9qp6Q@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote">

          <div dir="ltr">On Thu, Jul 16, 2015 at 11:08 AM Hal Finkel

            <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>>

            wrote:<br>

          </div>

          <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">-----

            Original Message -----<br>

            > From: "Chandler Carruth" <<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>><br>

            > To: "Hal Finkel" <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>><br>

            > Cc: "LLVM Dev" <<a href="mailto:llvmdev@cs.uiuc.edu" target="_blank">llvmdev@cs.uiuc.edu</a>><br>

            > Sent: Thursday, July 16, 2015 2:33:21 AM<br>

            > Subject: Re: [LLVMdev] [RFC] Defining Infinite Loops<br>

            ><br>

            ><br>

            ><br>

            ><br>

            > On Thu, Jul 16, 2015 at 12:27 AM Hal Finkel < <a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a> ><br>

            > wrote:<br>

            ><br>

            ><br>

            > <hr id="zwchr"><br>

            > > From: "Chandler Carruth" < <a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a> ><br>

            > > To: "Hal Finkel" < <a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>

            >, "LLVM Dev" <<br>

            > > <a href="mailto:llvmdev@cs.uiuc.edu" target="_blank">llvmdev@cs.uiuc.edu</a>

            ><br>

            > > Sent: Thursday, July 16, 2015 1:00:05 AM<br>

            > > Subject: Re: [LLVMdev] [RFC] Defining Infinite

            Loops<br>

            > ><br>

            > ><br>

            > > FWIW, I'm very much in favor of having a firm and

            clear answer to<br>

            > > these questions.<br>

            > ><br>

            > > I also agree that it is an absolute requirement

            that LLVM have<br>

            > > *some*<br>

            > > mechanism for supporting both languages with

            defined behavior for<br>

            > > infinite loops and a language requirement that all

            loops terminate.<br>

            > ><br>

            > ><br>

            > > However, I'd like to float an alternative

            approach. I've not spent<br>

            > > a<br>

            > > lot of time thinking about it, so I'm not sure its

            actually better.<br>

            > > I'm wondering if you've already thought about it.<br>

            > ><br>

            > ><br>

            > > What if we have an @llvm.noop.sideeffect() or some

            such which<br>

            > > doesn't<br>

            > > read or write memory in any way, but which a

            frontend can place<br>

            > > inside a loop body to mark that its execution

            (perhaps infinitely)<br>

            > > is in-and-of-itself a side effect of the program.

            We could then<br>

            > > teach loop unrolling or the few other things that

            would care to<br>

            > > collapse multiple ones into a single one, and not

            count them<br>

            > > towards<br>

            > > anything.<br>

            > ><br>

            > ><br>

            > > I know an intrinsic is kind of awkward for this,

            but it seems like<br>

            > > the least bad way we have to annotate a loop in a

            fully generic<br>

            > > way.<br>

            > > I'd somewhat like this to be a property of the

            *loop* and not of<br>

            > > the<br>

            > > function. And it really needs to be truly generic,

            handling<br>

            > > unnatural CFGs and even recursion-loops. My only

            idea for how to<br>

            > > accomplish that is an intrinsic to mark the

            dynamic path which if<br>

            > > executed infinitely can't be eliminated.<br>

            ><br>

            > My largest concern is that the frontend would need to

            add these<br>

            > things all over the place, not just before the loop

            backedges. For<br>

            > one thing, if the language has gotos, where should they

            be inserted?<br>

            ><br>

            ><br>

            > The target of every goto.<br>

            ><br>

            ><br>

            > For computed goto, very label whose address is taken.<br>

            ><br>

            ><br>

            > This at least doesn't seem that bad to me.<br>

            ><br>

            ><br>

            > Before every branch will be conservatively correct, but

            I'm worried<br>

            > that will unnecessarily block optimizations. They'd

            also be needed<br>

            > at the entry to every function.<br>

            ><br>

            ><br>

            > Only external, address taken, or

            internal-and-recursively-called<br>

            > functions. All of which we already have some blockers

            to<br>

            > optimization, so this part i'm not worried about.<br>

            ><br>

            ><br>

            > On the other hand, maybe if we added an optimization

            that removed<br>

            > these things along any control-flow path that already

            had any other<br>

            > side effect, it might not be too bad?<br>

            ><br>

            ><br>

            ><br>

            > Absolutely, much like lower-expect, we'd need to make

            sure that easy<br>

            > cases were folded quickly in the optimizer so this

            didn't get out of<br>

            > hand.<br>

            ><br>

            ><br>

            ><br>

            > ><br>

            > ><br>

            > > As for why I'm particularly interested in this

            being a property of<br>

            > > the loop, consider if you were to have a mixture

            of Java and C++<br>

            > > code, all compiled to LLVM. How do you inline

            between them?<br>

            > ><br>

            ><br>

            > You add the attribute to the caller.<br>

            ><br>

            ><br>

            > This has the really unfortunate side effect of

            pessimizing code<br>

            > during cross language optimizations.<br>

            ><br>

            ><br>

            > FWIW, I suspect I might care a lot about this

            particular case<br>

            > (because I believe that Fortran has defined behavior

            for infinite<br>

            > loops).<br>

            ><br>

            ><br>

            ><br>

            > Yea, you could argue that C does too, which is one

            reason why I'm so<br>

            > interested in this being done really well even in an

            LTO situation.<br>

            ><br>

            ><br>

            > I think it would be really useful to not have this

            cross between<br>

            > adjacent loops after inlining when they come from

            different source<br>

            > languages, and it would be nice for it to not apply to

            nested loops<br>

            > when those nested loops were inlined from a language

            without this<br>

            > guarantee.<br>

            ><br>

            ><br>

            > But I'm still not convinced that the noise of the

            intrinsic is<br>

            > *definitely* worth it. I come from the background of

            the C++ rules'<br>

            > rationale, and so I naturally see the languages that

            define this as<br>

            > giving up optimizations and so wanting to localize the

            impact of<br>

            > that... Not sure that's actually the right perspective

            though. ;]<br>

            ><br>

            <br>

            I'm leaning toward agreeing with you, primarily because I

            think it will more-naturally fit into the optimizer than the

            attribute. We need to check loops for side effects anyway

            (if we otherwise default to C++-like rules), and so this

            intrinsic will do the right thing without any special logic.<br>

          </blockquote>

          <div><br>

          </div>

          <div>FWIW, this is why I first started thinking along these

            lines. I'm increasingly thinking that if this approach works

            it will make the implementation of testing for this more

            natural in the optimizer. Simple things like instruction

            predicates will "just work", etc.</div>

          <div><br>

          </div>

          <div>I'm really interested in the downsides. You mentioned a

            few potential ones, but seem to be happy with my responses.

            I wonder, are there others?</div>

        </div>

      </div>

    </blockquote>

    I'm really not a fan of this approach.  I think it could be made to

    work, but I suspect we'd run into a lot of quality of implementation

    issues if we went down this route.  <br>

    <br>

    We'd have to teach many places in the optimizer to merge or split

    such calls.  Consider simple things like tail commoning,

    if-then-else hoisting, or the like.  In each of these, we'd need

    code to recognize having the intrinsic on one path, but not the

    other, and then still perform the optimization.  Think the various

    debug intrinsics, but worse.  <br>

    <br>

    I worry about the interactions with memory aliasing and hoisting

    rules.  We don't currently have the notion of a non-memory side

    effect in LLVM.  To prevent this intrinsic from being DCE'd we'd

    likely need to mark it as writing to some known location. </blockquote><br>We'd do the same thing we did with @llvm.assume, for which we have all of the same problems (including those mentioned above, only worse). For @llvm.assume we tag it as writing to an unknown location, but taught BasicAA that is does not alias with any particular location.<br><br><blockquote id="DWT9604" style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"> Given the

    number of places in the optimizer which give up when encountering

    any write (EarlyCSE for one), that would be problematic for

    optimization effectiveness.  The other approach would be to teach

    LLVM about non-memory side effects.  I think this is solvable, but a

    large investment. <br></blockquote>EarlyCSE has specific code to deal with @llvm.assume as well (four lines of code, but code nevertheless).<br><blockquote id="DWT9679" style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;">

    <br>

    In practice, most of the contributors to LLVM care about C++.  I

    worry we'd end up in a situation where languages which need infinite

    loops would become second class citizens and that a large number of

    optimizations wouldn't apply to them in practice.  This is by far my

    biggest worry.  <br>

    <br>

    Now, I'm certainly biased, but I'd much rather see a scheme where a

    quality of implementation issue effects the performance of C++. 

    These are far more likely to be actually fixed.  :)<br>

    <br>

    Earlier in this thread, the idea of using metadata on loops was

    mentioned.  Chandler's point about generic handling for recursive

    loops is a good one, but in general, a metadata annotation of

    finiteness seems like a better starting point.<br>

    <br>

    What if we introduced a piece of branch (really, control transfer

    instruction) metadata (not loop) called "productive" (to steal

    Sanjoy's term) whose semantics are such that it can be assumed to

    only execute a finite number of times between productive actions

    (volatile, synchronization, io, etc..).  We then tag *all* branches

    emitted by Clang with this metadata. This gives us the benefit of

    the loop metadata in that a single tagged backedge branch implies a

    productive loop, but allows productive recursive functions to be

    converted into productive loops in a natural way.  <br></blockquote><br>I don't see how this helps us with the recursive functions case. If I have this:<br><br>void foo() {<br>  bar();<br>  foo();<br>}<br><br>where do I put the metadata? Maybe we could also tag the 'ret' as well?<br><br><blockquote id="DWT9680" style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;">

    <br>

    The function attribute "productive" now has an easy inference rule

    in that if all branches in the function are productive and all

    callees are productive, so is the function.  This would seem to

    allow us to perform DSE, LICM, and related optimizations without

    trouble.  <br>

    <br>

    Inlining now has a reasonable definition where you can inline

    between languages w/the semantics of each language preserved.  <br>

    <br>

    One cool thing here is that the set of operations which are

    "productive" could actually be encoded in the metadata.  This could

    potentially support other languages than C++ w.r.t. the "no infinite

    loops except when" type rules.  <br>

    <br>

    Thoughts?<br></blockquote><br>I was at first afraid of the number of places that create branch instructions, and updating them all to preserve the metadata when appropriate. However:<br><br>$ grep -r 'BranchInst::Create\|CreateBr' lib/Transforms | wc -l<br>99<br>$ grep -r 'BranchInst::Create\|CreateBr' lib/Transforms -l | wc -l<br>33<br><br>And so there are only 99 places in 33 files that create branches in lib/Transforms (and I find the numerical coincidence amusing). So this seems potentially doable.<br><br>Also, if I'm right and we'd need them on 'ret' as well:<br><br>$ grep -r 'ReturnInst::Create\|CreateRet' lib/Transforms | wc -l<br>33<br>$ grep -r 'ReturnInst::Create\|CreateRet' lib/Transforms -l | wc -l<br>11<br><br>And now I'm even more amused.<br><br>Chandler, what do you think?<br><br> -Hal<br><br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;">

    <br>

    Philip<br>

    <br>

    p.s. The current implementation of readonly is correct w.r.t. C++

    rules only because we designate all atomic, volatile, or fencing

    operations as read/write.  One thing we might end up with out of

    this discussion is the possibility of talking about functions which

    are readonly, but involved in synchronization.  That seems like it

    might be useful for optimizing concurrent C++ programs in

    interesting ways.  It would require a separate notion of a

    synchronization side effect independent of read/write though.  This

    seems related, but slightly orthogonal to the assumptions about

    finiteness of loops.    <br>

  </blockquote>I agree; we do need some notion of side effects other than based on memory dependencies. This, however, is a (hopefully) separate project.<br><br>Thanks again,<br>Hal<br><br><br>-- <br><div><span name="x"></span>Hal Finkel<br>Assistant Computational Scientist<br>Leadership Computing Facility<br>Argonne National Laboratory<span name="x"></span><br></div></div></body></html>