[PATCH] D68633: fix debug info affects output when opt inline

Fri Oct 11 01:00:00 PDT 2019

bjope added a comment.

In D68633#1705291 <https://reviews.llvm.org/D68633#1705291>, @yechunliang wrote:

> In D68633#1699421 <https://reviews.llvm.org/D68633#1699421>, @bjope wrote:
>
> > The code is written in a way that it skips any instruction, but moves contiguous blocks of allocas in one splice (not sure exactly why, is that really faster?).
>
>
> I also could not understand why the scan continues over the block of allocas after the first non-use_empty alloca instruction. The first commit, which gives some of the reasoning, is here:  https://github.com/llvm/llvm-project/commit/6f8865bf9
>
> > Maybe the difference is that the check for AI->useEmpty() only is done for the first alloca in a sequence of alloca instructions? Or can't we just remove the loop at line 1847 (only moving one alloca at a time).
>
> With this example test case, the second alloca is use_empty and is inserted into the caller together with the first alloca (!use_empty). But if there is a dbg instruction between the first and second alloca instructions, the continued scan breaks off.
>  Then, with the debug instruction, the program returns to the outer for() loop and handles the second alloca as use_empty (because it has no uses like "xxx.sroa_cast = bitcast %rec1198* %volatileloadslot to i8*") and calls eraseFromParent.
>  This differs from inlining without dbg instructions, which does not erase the second alloca instruction.

So the root cause is rather that we treat an alloca that is immediately preceded by another alloca differently from one that is preceded by another kind of instruction. This also happens with other instructions in between, and is not specific to dbg intrinsics (it could be interesting to add a test case where you replace the dbg intrinsics with something else).
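
To make the asymmetry concrete, here is a toy model of the loop as described in this thread. It is only a sketch: Inst, Result and hoistAllocas are hypothetical names, nothing here uses LLVM types, and it assumes the outer loop checks use_empty only on the alloca that starts a batch while the inner loop takes every directly following alloca.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction in the inlined entry block.
struct Inst {
    std::string name;
    bool isAlloca;
    bool hasUses;
};

struct Result {
    std::vector<std::string> moved;   // allocas spliced into the caller
    std::vector<std::string> erased;  // allocas deleted as dead
};

// Assumed shape of the loop under discussion: the use_empty check is applied
// only to the first alloca of a batch; the inner loop does not look at uses.
Result hoistAllocas(const std::vector<Inst> &block) {
    Result r;
    std::size_t i = 0;
    while (i < block.size()) {
        const Inst &first = block[i++];
        if (!first.isAlloca)
            continue;                        // skip non-alloca instructions
        if (!first.hasUses) {
            r.erased.push_back(first.name);  // dead alloca: erase it
            continue;
        }
        r.moved.push_back(first.name);
        // Inner loop: extend the batch over directly following allocas.
        while (i < block.size() && block[i].isAlloca)
            r.moved.push_back(block[i++].name);
    }
    return r;
}
```

Feeding it {a (used), b (unused)} moves both allocas, while {a (used), some non-alloca, b (unused)} moves a and erases b. The non-alloca in the middle does not have to be a dbg intrinsic for the two results to differ.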

So I think that the solution might be based on one of these ideas:

1. Remove the check for use_empty in the outer loop.
2. Add a check for !use_empty in the inner loop.
3. Remove the inner loop (i.e. only splice one alloca at a time).

Alternative 3 would be the simplest.

If there really is a speedup from doing fewer splices, then alternative 2 still moves consecutive !use_empty allocas in batches. The idea with alternative 2 is to split batches on allocas that have no uses (as they are handled in the outer loop).

Alternative 1 might work assuming that allocas with no uses are cleaned up somewhere else. But I think this alternative is the least interesting one.
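
A rough sketch of how alternatives 2 and 3 would behave, again as a toy model rather than a patch: Inst, Outcome, Strategy and hoist are hypothetical names, and the splice counter only stands in for the number of splice calls the real loop would make.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction in the inlined entry block.
struct Inst {
    std::string name;
    bool isAlloca;
    bool hasUses;
};

struct Outcome {
    std::vector<std::string> moved;   // allocas spliced into the caller
    std::vector<std::string> erased;  // allocas deleted as dead
    int splices = 0;                  // number of splice operations performed
};

enum class Strategy {
    BatchLiveOnly,  // alternative 2: inner loop also requires !use_empty
    OneAtATime      // alternative 3: no inner loop at all
};

Outcome hoist(const std::vector<Inst> &block, Strategy s) {
    Outcome out;
    std::size_t i = 0;
    while (i < block.size()) {
        const Inst &first = block[i++];
        if (!first.isAlloca)
            continue;                          // skip non-alloca instructions
        if (!first.hasUses) {
            out.erased.push_back(first.name);  // dead alloca: erase it
            continue;
        }
        out.moved.push_back(first.name);
        if (s == Strategy::BatchLiveOnly) {
            // Stop the batch at a dead alloca so the outer loop handles it.
            while (i < block.size() && block[i].isAlloca && block[i].hasUses)
                out.moved.push_back(block[i++].name);
        }
        ++out.splices;                         // one splice per batch
    }
    return out;
}
```

In this model both strategies erase an unused alloca whether or not another instruction precedes it, so the result no longer depends on what sits between the allocas. For {a (used), c (used), b (unused)}, BatchLiveOnly needs one splice and OneAtATime needs two, which is the only difference between them here.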

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68633/new/

https://reviews.llvm.org/D68633