[PATCH] D85614: [TRE] Reland: allow TRE for non-capturing calls.
Alexey Lapshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 7 11:57:29 PDT 2020
avl added inline comments.
================
Comment at: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp:600
+ CI->getArgOperand(OpndIdx),
+ /*SrcAlign*/ Alignment, Size);
+
----------------
efriedma wrote:
> This is the right direction, but I'm not sure it does the right thing in general.
>
> In particular, it's possible that `CI->getArgOperand(OpndIdx)` points to one of the allocas created by createTempForByValOperand(); if it does, then this copy might be clobbering data you need for subsequent copies. Again, consider the `void dostuff(A arg, Arg b) { dostuff(b, a); }` case.
>
> This is why the general procedure I outlined has an extra step: we copy the temporary allocas to the original byval arguments, and use the byval arguments as the operands to the PHI. That way, `CI->getArgOperand(OpndIdx)` never points to one of the allocas allocated by createTempForByValOperand.
> This is the right direction, but I'm not sure it does the right thing in general.
>
It looks like "void dostuff(A arg, Arg b) { dostuff(b, a); }" test case is working correctly:
$ cat test.cpp
#include <stdio.h>
typedef struct A { long long x[10] = {0}; } A;
A global;
void dostuff(A a, A b, int i) { if (i==10) return; a.x[5]++; printf("%lld %lld\n", a.x[5], b.x[5]); dostuff(b, a, i+1); }
__attribute((optnone)) int main() { dostuff(global, global, 0); }
$ clang++ test.cpp -O3
$ ./a.out
1 0
1 1
2 1
2 2
3 2
3 3
4 3
4 4
5 4
5 5
> In particular, it's possible that CI->getArgOperand(OpndIdx) points to one of the allocas created by createTempForByValOperand(); if it does, then this copy might be clobbering data you need for subsequent copies. Again, consider the void dostuff(A arg, Arg b) { dostuff(b, a); } case.
>
> This is why the general procedure I outlined has an extra step: we copy the temporary allocas to the original byval arguments, and use the byval arguments as the operands to the PHI. That way, CI->getArgOperand(OpndIdx) never points to one of the allocas allocated by createTempForByValOperand.
If I am not mistaken the CI->getArgOperand(OpndIdx) could not point to one of the allocas created by createTempForByValOperand(). There is not setArgOperand() which set operand to newly created temps by createTempForByValOperand(). Thus CI->getArgOperand(OpndIdx) always point to the alloca created for recursive call to dostuff().
For the ""void dostuff(A arg, Arg b) { dostuff(b, a); }" test case:
before tre:
```
%2 = bitcast %struct.A* %agg.tmp to i8*
%3 = bitcast %struct.A* %b to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %2, i8* nonnull align 8 dereferenceable(80) %3, i64 80, i1 false), !tbaa.struct !6
^^^^^^^^^^^^^^^^ copy to %agg.tmp which is first byval argument
%4 = bitcast %struct.A* %agg.tmp5 to i8*
%5 = bitcast %struct.A* %a to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %4, i8* nonnull align 8 dereferenceable(80) %5, i64 80, i1 false), !tbaa.struct !6
^^^^^^^^^^^^^^^^ copy to %agg.tmp5 which is second byval argument
%add = add nsw i32 %i, 1
call void @_Z7dostuff1AS_i(%struct.A* nonnull byval(%struct.A) align 8 %agg.tmp, %struct.A* nonnull byval(%struct.A) align 8 %agg.tmp5, i32 %add)
```
after tre:
```
tailrecurse: ; preds = %if.end, %entry
%a.tr = phi %struct.A* [ %a, %entry ], [ %agg.tmp7, %if.end ]
%b.tr = phi %struct.A* [ %b, %entry ], [ %agg.tmp58, %if.end ]
if.end:
...
%2 = bitcast %struct.A* %agg.tmp to i8*
%3 = bitcast %struct.A* %b.tr to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %2, i8* nonnull align 8 dereferenceable(80) %3, i64 80, i1 false), !tbaa.struct !6
^^^^^^^^^^^^^^^^ copy to %agg.tmp which is first byval argument
%4 = bitcast %struct.A* %agg.tmp5 to i8*
%5 = bitcast %struct.A* %a.tr to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %4, i8* nonnull align 8 dereferenceable(80) %5, i64 80, i1 false), !tbaa.struct !6
^^^^^^^^^^^^^^^^ copy to %agg.tmp5 which is second byval argument
%add = add nsw i32 %i.tr, 1
%6 = bitcast %struct.A* %agg.tmp7 to i8*
%7 = bitcast %struct.A* %agg.tmp to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %6, i8* align 8 %7, i64 80, i1 false)
^^^^^^^^^^^^^^^^ copy to the agg.tmp7 which is used as input for next iteration
%8 = bitcast %struct.A* %agg.tmp58 to i8*
%9 = bitcast %struct.A* %agg.tmp5 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %8, i8* align 8 %9, i64 80, i1 false)
^^^^^^^^^^^^^^^^ copy to the agg.tmp58 which is used as input for next iteration
br label %tailrecurse
```
Probably the only difference with your description is that this patch creates additional temporarily variables(agg.tmp7/agg.tmp58) while it could use locations of a/b parameters instead of newly created temps.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D85614/new/
https://reviews.llvm.org/D85614
More information about the llvm-commits
mailing list