[PATCH] D82085: [TRE] allow TRE for non-capturing calls.
Alexey Lapshin via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Jun 24 08:04:22 PDT 2020
avl marked an inline comment as done.
avl added inline comments.
================
Comment at: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp:825
+ // The local stack holds all alloca instructions and all byval arguments.
+ AllocaDerivedValueTracker Tracker;
+ for (Argument &Arg : F.args()) {
----------------
efriedma wrote:
> avl wrote:
> > efriedma wrote:
> > > avl wrote:
> > > > efriedma wrote:
> > > > > avl wrote:
> > > > > > efriedma wrote:
> > > > > > > avl wrote:
> > > > > > > > efriedma wrote:
> > > > > > > > > avl wrote:
> > > > > > > > > > efriedma wrote:
> > > > > > > > > > > avl wrote:
> > > > > > > > > > > > efriedma wrote:
> > > > > > > > > > > > > Do you have to redo the AllocaDerivedValueTracker analysis? Is it not enough that the call you're trying to TRE is marked "tail"?
> > > > > > > > > > > > >Do you have to redo the AllocaDerivedValueTracker analysis?
> > > > > > > > > > > >
> > > > > > > > > > > > AllocaDerivedValueTracker analysis(done in markTails) could be reused here.
> > > > > > > > > > > > But marking, done in markTails(), looks like separate tasks. i.e. it is better
> > > > > > > > > > > > to make TRE not depending on markTails(). There is a review for this - https://reviews.llvm.org/D60031
> > > > > > > > > > > > Thus such separation looks useful(To not reuse result of markTails but have it computed inplace).
> > > > > > > > > > > >
> > > > > > > > > > > > > Is it not enough that the call you're trying to TRE is marked "tail"?
> > > > > > > > > > > >
> > > > > > > > > > > > It is not enough that call which is subject to TRE is marked "Tail".
> > > > > > > > > > > > It also should be checked that other calls does not capture pointer to local stack:
> > > > > > > > > > > >
> > > > > > > > > > > > ```
> > > > > > > > > > > > // do not do TRE if any pointer to local stack has escaped.
> > > > > > > > > > > > if (!Tracker.EscapePoints.empty())
> > > > > > > > > > > > return false;
> > > > > > > > > > > >
> > > > > > > > > > > > ```
> > > > > > > > > > > >
> > > > > > > > > > > > It is not enough that call which is subject to TRE is marked "Tail". It also should be checked that other calls does not capture pointer to local stack:
> > > > > > > > > > >
> > > > > > > > > > > If there's an escaped pointer to the local stack, we wouldn't infer "tail" in the first place, would we?
> > > > > > > > > > If function receives pointer to alloca then it would not be marked with "Tail". Then we do not have a possibility to understand whether this function receives pointer to alloca but does not capture it:
> > > > > > > > > >
> > > > > > > > > > ```
> > > > > > > > > > void test(int recurseCount)
> > > > > > > > > > {
> > > > > > > > > > if (recurseCount == 0) return;
> > > > > > > > > > int temp = 10;
> > > > > > > > > > globalIncrement(&temp);
> > > > > > > > > > test(recurseCount - 1);
> > > > > > > > > > }
> > > > > > > > > > ```
> > > > > > > > > >
> > > > > > > > > > test - marked with Tail.
> > > > > > > > > > globalIncrement - not marked with Tail. But TRE could be done since it does not capture pointer. But if it will capture the pointer then we could not do TRE. So we need to check !Tracker.EscapePoints.empty().
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > test - marked with Tail.
> > > > > > > > >
> > > > > > > > > For the given code, TRE won't mark the recursive call "tail". That transform isn't legal: the recursive call could access the caller's version of "temp".
> > > > > > > > >For the given code, TRE won't mark the recursive call "tail". That transform isn't legal: the recursive call could access the caller's version of "temp".
> > > > > > > >
> > > > > > > > it looks like recursive call could NOT access the caller's version of "temp":
> > > > > > > >
> > > > > > > > ```
> > > > > > > > test(recurseCount - 1);
> > > > > > > > ```
> > > > > > > >
> > > > > > > > Caller`s version of temp is accessed by non-recursive call:
> > > > > > > >
> > > > > > > > ```
> > > > > > > > globalIncrement(&temp);
> > > > > > > > ```
> > > > > > > >
> > > > > > > > If globalIncrement does not capture the "&temp" then TRE looks to be legal for that case.
> > > > > > > >
> > > > > > > > globalIncrement() would not be marked with "Tail". test() would be marked with Tail.
> > > > > > > >
> > > > > > > > Thus the pre-requisite for TRE would be: tail-recursive call must not receive pointer to local stack(Tail) and non-recursive calls must not capture the pointer to local stack.
> > > > > > > Can you give a complete IR example where we infer "tail", but TRE is illegal?
> > > > > > >
> > > > > > > Can you give a complete IR example, we we don't infer "tail", but we still do the TRE transform here?
> > > > > > >Can you give a complete IR example where we infer "tail", but TRE is illegal?
> > > > > >
> > > > > > there is no such example. Currently all cases where we infer "tail" would be valid for TRE.
> > > > > >
> > > > > > >Can you give a complete IR example, we we don't infer "tail", but we still do the TRE transform here?
> > > > > >
> > > > > > For the following example current code base would not infer "tail" for _Z15globalIncrementPKi and as the result would not do TRE for _Z4testi.
> > > > > > This patch changes this behavior: so that if _Z15globalIncrementPKi is not marked with "tail" and does not capture its pointer argument - TRE would be allowed for _Z4testi.
> > > > > >
> > > > > >
> > > > > > ```
> > > > > > @count = dso_local local_unnamed_addr global i32 0, align 4
> > > > > >
> > > > > > ; Function Attrs: nofree noinline norecurse nounwind uwtable
> > > > > > define dso_local void @_Z15globalIncrementPKi(i32* nocapture readonly %param) local_unnamed_addr #0 {
> > > > > > entry:
> > > > > > %0 = load i32, i32* %param, align 4
> > > > > > %1 = load i32, i32* @count, align 4
> > > > > > %add = add nsw i32 %1, %0
> > > > > > store i32 %add, i32* @count, align 4
> > > > > > ret void
> > > > > > }
> > > > > >
> > > > > > ; Function Attrs: nounwind uwtable
> > > > > > define dso_local void @_Z4testi(i32 %recurseCount) local_unnamed_addr #1 {
> > > > > > entry:
> > > > > > %temp = alloca i32, align 4
> > > > > > %cmp = icmp eq i32 %recurseCount, 0
> > > > > > br i1 %cmp, label %return, label %if.end
> > > > > >
> > > > > > if.end: ; preds = %entry
> > > > > > %0 = bitcast i32* %temp to i8*
> > > > > > call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #6
> > > > > > store i32 10, i32* %temp, align 4
> > > > > > call void @_Z15globalIncrementPKi(i32* nonnull %temp)
> > > > > > %sub = add nsw i32 %recurseCount, -1
> > > > > > call void @_Z4testi(i32 %sub)
> > > > > > call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #6
> > > > > > br label %return
> > > > > >
> > > > > > return: ; preds = %entry, %if.end
> > > > > > ret void
> > > > > > }
> > > > > >
> > > > > > ; Function Attrs: argmemonly nounwind willreturn
> > > > > > declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2
> > > > > >
> > > > > > ; Function Attrs: argmemonly nounwind willreturn
> > > > > > declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2
> > > > > >
> > > > > > attributes #0 = { nofree noinline norecurse nounwind uwtable }
> > > > > > attributes #1 = { nounwind uwtable }
> > > > > > attributes #2 = { argmemonly nounwind willreturn }
> > > > > >
> > > > > > ```
> > > > > In your example, we don't infer "tail" for globalIncrement... but we do infer it for the recursive call to test(). I'm suggesting you could just check for "tail" on test(), instead of using AllocaDerivedValueTracker.
> > > > Checking only test() is not enough. There additionally should be checked globalIncrement().
> > > >
> > > > It is correct that while checking test() there could be checked "Tail" flag. Which does not require using AllocaDerivedValueTracker.
> > > >
> > > > But while checking globalIncrement() there should be checked whether some alloca value escaped or not. "Tail" flag could not be used for that. AllocaDerivedValueTracker allow to do such check:
> > > >
> > > > Tracker.EscapePoints.empty()
> > > >
> > > > If we would not do check for globalIncrement then it is not valid to do TRE. Thus it seems we need to check globalIncrement for escaping pointer and we need to use AllocaDerivedValueTracker for that.
> > > > Checking only test() is not enough. There additionally should be checked globalIncrement().
> > >
> > > >> Can you give a complete IR example where we infer "tail", but TRE is illegal?
> > > > there is no such example. Currently all cases where we infer "tail" would be valid for TRE.
> > >
> > > I'm not sure how to reconcile this. Are you saying we could infer "tail" in some case where TRE is illegal, but don't right now? Or are you saying that you plan to extend TRE to handle cases where we can't infer "tail" on the recursive call?
> > >Or are you saying that you plan to extend TRE to handle cases where we can't infer "tail" on the recursive call?
> >
> > I think not exactly. more precise would probably be : "I am saying that plan to extend TRE to handle cases where we can't infer "tail" on the NON-recursive NON-last call"
> >
> > globalIncrement() is non-recursive non-last call in above example. But we need to check whether it captures argument pointer or not to decide whether it is OK to do TRE.
> >
> > To make things clear - I am suggesting instead of current pre-requisite for TRE :
> >
> > "All call sites are marked with Tail"
> >
> > to make following:
> >
> > "Recursive last calls are marked with "Tail", non-recursive non-last calls are proved to not capture alloca".
> >
> > For the above example it means : the requirement for test() should stay the same(should be marked with Tail). The requirement for globalIncrement() should be "does not capture alloca".
> >
> "Recursive last calls are marked with 'tail'" implies "non-recursive non-last calls are proved to not capture alloca".
I see, thank you for explanations.
There is a test which makes me think that above rule is not always correct:
Transforms/TailCallElim/basic.ll:@test1
```
; PR615. Make sure that we do not move the alloca so that it interferes with the tail call.
define i32 @test1() {
; CHECK: i32 @test1()
; CHECK-NEXT: alloca
%A = alloca i32 ; <i32*> [#uses=2]
store i32 5, i32* %A
call void @use(i32* %A)
; CHECK: tail call i32 @test1
%X = tail call i32 @test1() ; <i32> [#uses=1]
ret i32 %X
}
```
I removed usages of AllocaDerivedValueTracker and corrected the test1 from Transforms/TailCallElim/basic.ll.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D82085/new/
https://reviews.llvm.org/D82085
More information about the cfe-commits
mailing list