[PATCH] D60074: [Attributor] Deduce "no-recurse" function attribute

Thu Jun 6 15:40:02 PDT 2019

jdoerfert added a comment.

Thanks for the extensive comment!

In D60074#1532046 <https://reviews.llvm.org/D60074#1532046>, @nicholas wrote:

> In D60074#1531819 <https://reviews.llvm.org/D60074#1531819>, @jdoerfert wrote:
>
> > In D60074#1531809 <https://reviews.llvm.org/D60074#1531809>, @efriedma wrote:
> >
> > > Is it actually correct to derive norecurse like this?
> > >
> > > Suppose I have two functions:
> > >
> > >   void f(int x) { if (x) { g(); } }
> > >   void g() { f(0); }
> > >
> > >
> > > As far as I can tell, it would be correct to mark g() norecurse.  And if g() is marked norecurse, we'll mark f() norecurse, which is wrong.
> >
> >
> > We should/do not mark `g` as `norecurse` in this example. At least not if `f` is externally visible.
>
>
> Could you explain a little further? I don't see how changing whether `f` is externally visible changes whether it's OK to mark `g` with `norecurse`. Here's my thinking. If `f` is externally visible that means it can be called from external code, and we have to make the worst-case assumption about the call stack, but in this case we know that `g` can't be on the call stack since `g` doesn't call any external functions. Is that right?

I should have explained this better from the beginning. At the moment, for better or worse, we derive `norecurse` only if the function does not call a `recursive` function. I did not want to change this behavior in this patch but I'm willing to investigate if we should. Now, in the example above, `f` is recursive as it calls `g` which calls `f`. And `g` is recursive as it calls `f` which was recursive. My internal comment was following this logic: If `f` was internal, we could (conceptually) eliminate the call of `g` after IPSCCP and thereby making both functions non-recursive.

> I think Eli's question can be considered in two parts. The first is about what LangRef permits, not what the Attributor pass will deduce. Is it legal to mark `g` with `norecurse`? If it is, some other user of LLVM, perhaps a user of LLVM as a library, may mark it `norecurse` themselves which would be correct/truthful/not a bug yet.

Regarding the example, the way I read the LangRef `g` could be `norecurse` but `f` not. Now I'm happy to make that deduction (in a subsequent patch?) but I'll have to check if it breaks existing assumptions.

> The second part of Eli's question, is that if the first part is right and `g` has been annotated with `norecurse` before the Attributor runs, won't it mark `f` as `norecurse`, and that would be an incorrect deduction on the Attributor's part?

As I mentioned above, the Attributor preserves the current behavior (minus the bug we have) and it would not annotate either function as `norecurse`.

>>> Granted, this is being sort of pedantic with the definition in LangRef, which says the function "does not call itself".  It might make sense to redefine norecurse to mean the function doesn't call "any function currently on the stack", instead of just itself.
> 
> The LangRef says "down any possible call path" which might be an attempt to make `g` not legal to be marked with `norecurse`. I think the wording is unclear, is it a "possible path" to have `g` -> `f` -> `g` or not? Statically both `g` -> `f` and `f` -> `g` calls are present, but we can also statically prove that the `g` -> `f` -> `g` route combination is impossible. I think LangRef's wording should focus on forbidding the function from occurring more than once on the call stack, and remove the wording about possible paths. The net result of that change would permit marking of `g` as `norecurse`.
> 
> I'm not sure the proposed "does not call any function currently on the stack" is right either. Consider a call stack of A -> B -> C -> B. If this is the call stack that is generated every time (there is no other caller of B or C and this happens for all calls to A) it would be true that A and C are `norecurse` and B is not. With the proposed definition, `c` could not be marked `norecurse`. I haven't considered whether it's useful for optimization to think of C as being `norecurse`.

I think the LangRef allows `g` to be `norecurse` right now.

> If the above scenario were real, it would require that B has a branch that determines whether a call is made, splitting the function into a part that may recurse and a part that may not. Some day, we may want to pursue `norecurse` at the instruction level with the semantic of "after this instruction is executed, it is not executed again by any direct or indirect callee until the return of the instruction parent's function on the call stack" (awkward wording so as to not forbid loops within the function). Offhand I think this would also work to represent the result of inlining a `norecurse` callee into a non-norecurse caller. (Again, I haven't considered whether this information is useful for any optimization.)

This is something we might want to discuss separately ;). Though, I'd be happy to look into it if we have a use case :)

>> What we did before and with this patch is derive `norecurse` only if no call to a recursive function is contained. So in contrast to other attributes we actually do not look at the optimistic/assumed state of called functions.
> 
> I think the right fix is to extend it to derive `norecurse` only if no call to a recursive function or a function in the same SCC (even if marked `norecurse`) is contained. That would allow the attributor to avoid marking `f` as `norecurse` even if `g` were marked `norecurse`.

If we derive what the LangRef says right now, the best solution is `norecurse` for `g` and nothing for `f`. I don't think we need to change the wording. (Maybe I'm missing something here.)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60074/new/

https://reviews.llvm.org/D60074