[PATCH] D85614: [TRE] Reland: allow TRE for non-capturing calls.

Alexey Lapshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 7 11:57:29 PDT 2020


avl added inline comments.


================
Comment at: llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp:600
+                       CI->getArgOperand(OpndIdx),
+                       /*SrcAlign*/ Alignment, Size);
+
----------------
efriedma wrote:
> This is the right direction, but I'm not sure it does the right thing in general.
> 
> In particular, it's possible that `CI->getArgOperand(OpndIdx)` points to one of the allocas created by createTempForByValOperand(); if it does, then this copy might be clobbering data you need for subsequent copies.  Again, consider the `void dostuff(A arg, Arg b) { dostuff(b, a); }` case.
> 
> This is why the general procedure I outlined has an extra step: we copy the temporary allocas to the original byval arguments, and use the byval arguments as the operands to the PHI.  That way, `CI->getArgOperand(OpndIdx)` never points to one of the allocas allocated by createTempForByValOperand.
> This is the right direction, but I'm not sure it does the right thing in general.
> 

It looks like "void dostuff(A arg, Arg b) { dostuff(b, a); }" test case is working correctly:

$ cat test.cpp
#include <stdio.h>
typedef struct A { long long x[10] = {0}; } A;
A global;
void dostuff(A a, A b, int i) { if (i==10) return; a.x[5]++; printf("%lld %lld\n", a.x[5], b.x[5]); dostuff(b, a, i+1); }
__attribute((optnone)) int main() { dostuff(global, global, 0); }
$ clang++ test.cpp -O3 
$ ./a.out
1 0
1 1
2 1
2 2
3 2
3 3
4 3
4 4
5 4
5 5

> In particular, it's possible that CI->getArgOperand(OpndIdx) points to one of the allocas created by createTempForByValOperand(); if it does, then this copy might be clobbering data you need for subsequent copies. Again, consider the void dostuff(A arg, Arg b) { dostuff(b, a); } case.
> 
> This is why the general procedure I outlined has an extra step: we copy the temporary allocas to the original byval arguments, and use the byval arguments as the operands to the PHI. That way, CI->getArgOperand(OpndIdx) never points to one of the allocas allocated by createTempForByValOperand.

If I am not mistaken the CI->getArgOperand(OpndIdx) could not point to one of the allocas created by createTempForByValOperand(). There is not setArgOperand() which set operand to newly created temps by createTempForByValOperand(). Thus CI->getArgOperand(OpndIdx) always point to the alloca created for recursive call to dostuff(). 

For the ""void dostuff(A arg, Arg b) { dostuff(b, a); }" test case:


before tre:

```

  %2 = bitcast %struct.A* %agg.tmp to i8*
  %3 = bitcast %struct.A* %b to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %2, i8* nonnull align 8 dereferenceable(80) %3, i64 80, i1 false), !tbaa.struct !6
  ^^^^^^^^^^^^^^^^ copy to %agg.tmp which is first byval argument
  %4 = bitcast %struct.A* %agg.tmp5 to i8*
  %5 = bitcast %struct.A* %a to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %4, i8* nonnull align 8 dereferenceable(80) %5, i64 80, i1 false), !tbaa.struct !6
  ^^^^^^^^^^^^^^^^ copy to %agg.tmp5 which is second byval argument
  %add = add nsw i32 %i, 1
  call void @_Z7dostuff1AS_i(%struct.A* nonnull byval(%struct.A) align 8 %agg.tmp, %struct.A* nonnull byval(%struct.A) align 8 %agg.tmp5, i32 %add)
```

after tre:

```

tailrecurse:                                      ; preds = %if.end, %entry
  %a.tr = phi %struct.A* [ %a, %entry ], [ %agg.tmp7, %if.end ]
  %b.tr = phi %struct.A* [ %b, %entry ], [ %agg.tmp58, %if.end ]

if.end:
...
  %2 = bitcast %struct.A* %agg.tmp to i8*
  %3 = bitcast %struct.A* %b.tr to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %2, i8* nonnull align 8 dereferenceable(80) %3, i64 80, i1 false), !tbaa.struct !6
  ^^^^^^^^^^^^^^^^ copy to %agg.tmp which is first byval argument
  %4 = bitcast %struct.A* %agg.tmp5 to i8*
  %5 = bitcast %struct.A* %a.tr to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %4, i8* nonnull align 8 dereferenceable(80) %5, i64 80, i1 false), !tbaa.struct !6
  ^^^^^^^^^^^^^^^^ copy to %agg.tmp5 which is second byval argument
  %add = add nsw i32 %i.tr, 1
  %6 = bitcast %struct.A* %agg.tmp7 to i8*
  %7 = bitcast %struct.A* %agg.tmp to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %6, i8* align 8 %7, i64 80, i1 false)
  ^^^^^^^^^^^^^^^^ copy to the agg.tmp7 which is used as input for next iteration
  %8 = bitcast %struct.A* %agg.tmp58 to i8*
  %9 = bitcast %struct.A* %agg.tmp5 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %8, i8* align 8 %9, i64 80, i1 false)
  ^^^^^^^^^^^^^^^^ copy to the agg.tmp58 which is used as input for next iteration
  br label %tailrecurse

```

Probably the only difference with your description is that this patch creates additional temporarily variables(agg.tmp7/agg.tmp58) while it could use locations of a/b parameters instead of newly created temps. 






Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85614/new/

https://reviews.llvm.org/D85614



More information about the llvm-commits mailing list