[PATCH] D64595: [Debuginfo][SROA] Need to handle dbg.value in SROA pass.

Tue Aug 20 13:57:16 PDT 2019

avl marked an inline comment as done.
avl added inline comments.

================
Comment at: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp:3163-3164
     if (DII->getParent() == SrcBlock) {
-      // dbg.value is in the same basic block as the sunk inst, see if we can
-      // salvage it. Clone a new copy of the instruction: on success we need
-      // both salvaged and unsalvaged copies.
-      SmallVector<DbgVariableIntrinsic *, 1> TmpUser{
-          cast<DbgVariableIntrinsic>(DII->clone())};
-
-      if (!salvageDebugInfoForDbgValues(*I, TmpUser)) {
-        // We are unable to salvage: sink the cloned dbg.value, and mark the
-        // original as undef, terminating any earlier variable location.
-        LLVM_DEBUG(dbgs() << "SINK: " << *DII << '\n');
-        TmpUser[0]->insertBefore(&*InsertPos);
-        Value *Undef = UndefValue::get(I->getType());
-        DII->setOperand(0, MetadataAsValue::get(DII->getContext(),
-                                                ValueAsMetadata::get(Undef)));
+      if (DII->getIntrinsicID() == Intrinsic::dbg_declare &&
+          !isa<AllocaInst>(I)) {
+        // There should be only one dbg.declare instruction. Thus we could not
----------------
aprantl wrote:
> avl wrote:
> > aprantl wrote:
> > > avl wrote:
> > > > rnk wrote:
> > > > > aprantl wrote:
> > > > > > jmorse wrote:
> > > > > > > Interesting -- are there circumstances where this happens (dbg.declare of something that isn't an alloca)? I suspect it would be IR that didn't make sense, we might be better off disallowing it in the IR Verifier instead.
> > > > > > I must admit I'm confused how this patch relates to the example in the review description. What is the instruction coming in here in the example?
> > > > > clang emits dbg.declare of things that are not allocas from time to time. Usually it is for C++ classes that, when passed by value at the source level, are instead passed indirectly by pointer as required by the ABI. Or, for POD structs passed using byval.
> > > > > 
> > > > > Also, after inlining, these arguments may resolve to pointers to non-alloca locations, as in `this->field = getSRetThing();`.
> > > > > 
> > > > That happens with this patch and llvm/test/DebugInfo/X86/sroa-after-inlining2.ll testcase if InstructionCombining.cpp:3163-3173 piece deleted.
> > > > 
> > > > opt sroa-after-inlining2.ll -sroa -instcombine -inline -instcombine -sroa -S -o -
> > > > 
> > > > before second instcombine pass there is following IR : 
> > > > 
> > > > 
> > > > ```
> > > > entry:
> > > >   %result = alloca i64, align 8
> > > >   %tmpcast = bitcast i64* %result to %struct.S1*     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > > >   %0 = bitcast i64* %result to i8*, !dbg !24
> > > >   call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
> > > >   call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<
> > > > ```
> > > > 
> > > > after second instcombine pass it transformed into : 
> > > > 
> > > > 
> > > > ```
> > > > entry:
> > > >   %result = alloca i64, align 8
> > > >   %0 = bitcast i64* %result to i8*, !dbg !24
> > > >   call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
> > > >   call void @llvm.dbg.declare(metadata i64* %result, metadata !12, metadata !DIExpression()), !dbg !25
> > > > 
> > > > 
> > > > if.end:                                           ; preds = %entry
> > > >   %tmpcast = bitcast i64* %result to %struct.S1*   <<<<<<<<<<<<<<<<<<<<<<<<<
> > > >   call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<
> > > > ```
> > > > 
> > > > instcombine moves bitcast instruction and clones dbg.declare.
> > > (Looks like the `!isa<AllocaInst>(I)` allows you to detect the bitcast-of-alloc case.)
> > > 
> > > I'm not quite convinced by this example, but maybe there is something I'm missing, let me know!
> > > 
> > > The semantics of dbg.declare(alloca, var) are that the alloca is the definitive storage for var. There is no need to insert duplicate dbg.declares later in the code because a dbg.declare is valid throughout the function until the lifetime of alloca ends or var goes out of scope.
> > > If we need more fine-grained tracking of locations we have dbg.value and dbg.addr at our disposal, which are valid until the next dbg.value/addr for the same (overlapping fragment of the) variable. LowerDbgDeclare is the pass that switches between these two modes by deleting the dbg.declare and inserting a dbg.value at every load from the alloca.
> > > 
> > > I suspect that the disconnect between my mental model and what this patch is about comes from that LowerDbgDeclare is good at describing the SSA values loaded from memory, but does a terrible job describing the values stored into alloca, as it is neither inserting a dbg.addr for store instuctions describing the alloca, nor are we inserting dbg.addr when the address is passed to function that takes a pointer aliasing with the variable.
> > > 
> > > To reiterate: I think it is a bug to have more than one dbg.declare describing the same variable fragment in the same alloca (see SourceLevelDebugging.rst, llvm.dbg.declare). If we need to duplicate something pointing to an alloca, it should be a dbg.value or dbg.addr. (Similarly, I also think it is a bug to have a dbg.declare and a dbg.value describing the same variable.)
> > > 
> > > For the quoted example, a single dbg.declare would be sufficient, assuming there are no other dbg.values for this variable. If there are dbg.values present outside of the quoted region, the dbg.declares should be dbg.value(%result, !DIExpression(DW_OP_deref)) instead.
> > @aprantl Hi, I am sorry, quite a lot of text later. There are two problems here: 
> > 
> > 1. Original problem with incorrect dbg.value if SROA optimization called twice.
> > 
> > 2. Problem with cloned dbg.declare because of additional bitcast instruction. 
> >    That problem becomes visible if lowering for structures avoided.
> > 
> > //=============================================================================
> > For the point 1:
> > 
> > SROA and mem2reg conversion do not work with dbg.value. They work with only dbg.declare. 
> > As described in your model: LowerDbgDeclare is the pass that switches from dbg.declare into dbg.value mode. 
> > But SROA and mem2reg are not ready for that mode. This leads to incorrect(not-patched) dbg.value after second SROA.
> > 
> > IR before first instcombine for non-patched version : 
> > 
> > opt llvm/test/DebugInfo/X86/sroa-after-inlining2.ll -sroa -S -o -
> > 
> > 
> > ```
> > entry:
> >   %result = alloca %struct.S1, align 4
> >   %0 = bitcast %struct.S1* %result to i8*, !dbg !24
> >   call void @llvm.lifetime.start.p0i8(i64 8, i8* %0), !dbg !24
> >   call void @llvm.dbg.declare(metadata %struct.S1* %result, metadata !12, metadata !DIExpression()), !dbg !25    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> >   %call = call i64 @_Z3foov(), !dbg !26
> >   %1 = bitcast %struct.S1* %result to i64*, !dbg !26
> >   store i64 %call, i64* %1, align 4, !dbg !26
> >   %call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* %result), !dbg !27
> >   br i1 %call1, label %if.then, label %if.end, !dbg !29
> > ```
> > 
> > IR after first instcombine :
> > 
> > opt llvm/test/DebugInfo/X86/sroa-after-inlining2.ll -sroa -instcombine -S -o -
> > 
> > 
> > ```
> > entry:
> >   %result = alloca i64, align 8
> >   %tmpcast = bitcast i64* %result to %struct.S1*
> >   %0 = bitcast i64* %result to i8*, !dbg !24
> >   call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
> >   %call = call i64 @_Z3foov(), !dbg !25
> >   store i64 %call, i64* %result, align 8, !dbg !25
> >   call void @llvm.dbg.value(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !26 <<<<<<<<<<<<
> >   %call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* nonnull %tmpcast) #3, !dbg !27
> >   br i1 %call1, label %if.then, label %if.end, !dbg !29
> > ```
> > 
> > Note dbg.declare for the result variable(!12) converted into dbg.value.
> > 
> > IR after second SROA :
> > 
> > opt llvm/test/DebugInfo/X86/sroa-after-inlining2.ll -sroa -instcombine -inline -instcombine -sroa -S -o -
> > 
> > 
> > ```
> > define dso_local i32 @_Z3barv() #0 !dbg !7 {
> > entry:
> >   %call = call i64 @_Z3foov(), !dbg !24
> >   %result.sroa.0.0.extract.trunc = trunc i64 %call to i32, !dbg !24
> >   %result.sroa.6.0.extract.shift = lshr i64 %call, 32, !dbg !24
> >   %result.sroa.6.0.extract.trunc = trunc i64 %result.sroa.6.0.extract.shift to i32, !dbg !24
> >   call void @llvm.dbg.value(metadata i64* undef, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !25   <<<<<<<<<<
> >   call void @llvm.dbg.value(metadata i64* undef, metadata !26, metadata !DIExpression()), !dbg !30
> >   %cmp.i = icmp eq i32 %result.sroa.0.0.extract.trunc, 0, !dbg !33
> >   br i1 %cmp.i, label %if.then, label %if.end, !dbg !34
> > ```
> > 
> > 
> > Note dbg.value with undef memory location for result variable. Instead of setting to undef value it should be set into SROA variable : 
> > 
> > 
> > ```
> > entry:
> >   %call = call i64 @_Z3foov(), !dbg !24
> >   %result.sroa.0.0.extract.trunc = trunc i64 %call to i32, !dbg !24
> >   call void @llvm.dbg.value(metadata i32 %result.sroa.0.0.extract.trunc, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !25   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> >   %result.sroa.6.0.extract.shift = lshr i64 %call, 32, !dbg !24
> >   %result.sroa.6.0.extract.trunc = trunc i64 %result.sroa.6.0.extract.shift to i32, !dbg !24
> >   call void @llvm.dbg.value(metadata i32 %result.sroa.6.0.extract.trunc, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32)), !dbg !25  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> >   call void @llvm.dbg.value(metadata i64* undef, metadata !26, metadata !DIExpression()), !dbg !30
> >   %cmp.i = icmp eq i32 %result.sroa.0.0.extract.trunc, 0, !dbg !33
> >   br i1 %cmp.i, label %if.then, label %if.end, !dbg !34
> > ```
> >  
> > That problem could be fixed in following ways : 
> > 
> > a) teach SROA and mem2reg to work with dbg.value. So that following :
> > 
> > call void @llvm.dbg.value(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression(DW_OP_deref))
> > 
> > converted into : 
> > 
> >   call void @llvm.dbg.value(metadata i32 %result.sroa.0.0.extract.trunc, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !25   
> >   call void @llvm.dbg.value(metadata i32 %result.sroa.6.0.extract.trunc, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32)), !dbg !25
> > 
> > b) do not do dbg.declare lowering for structures. 
> > 
> > Last version of the patch implements not doing dbg.declare lowering for structures which was suggested by @jmorse . 
> > 
> > LowerDbgDeclare has that comment [0]: 
> > 
> > 
> > ```
> >     // If this is an alloca for a scalar variable, insert a dbg.value
> >     // at each load and store to the alloca and erase the dbg.declare.
> >     // The dbg.values allow tracking a variable even if it is not
> >     // stored on the stack, while the dbg.declare can only describe
> >     // the stack slot (and at a lexical-scope granularity). Later
> >     // passes will attempt to elide the stack slot. 
> > ```
> > 
> > Structures would either be converted into scalars(by SROA) and then they could be lowered and able to be allocated not on the stack(thus need dbg.value), 
> > either they do not need to be lowered and are correctly described by dbg.declare.
> > So it looks OK to avoid lowering of dbg.declare for structures.
> > 
> > //=============================================================================
> > For the point 2:  
> >   
> > If lowering of dbg.declare for structures avoided then there generated two dbg.declare`s for the same variable which is incorrect :
> > 
> > before second instcombine pass there is following IR :
> > 
> > opt sroa-after-inlining2.ll -sroa -instcombine -inline -S -o -
> > 
> > ```
> > entry:
> >   %result = alloca i64, align 8
> >   %tmpcast = bitcast i64* %result to %struct.S1*     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> >   %0 = bitcast i64* %result to i8*, !dbg !24
> >   call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
> >   call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<
> > 
> > ```
> > 
> > after second instcombine: 
> > 
> > opt sroa-after-inlining2.ll -sroa -instcombine -inline -instcombine  -S -o -
> > 
> > ```
> > entry:
> >   %result = alloca i64, align 8
> >   %0 = bitcast i64* %result to i8*, !dbg !24
> >   call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
> >   call void @llvm.dbg.declare(metadata i64* %result, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<<<
> > 
> > 
> > if.end:                                           ; preds = %entry
> >   %tmpcast = bitcast i64* %result to %struct.S1*   <<<<<<<<<<<<<<<<<<<<<<<<<
> >   call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<
> > 
> > ```
> > That second dbg.declare created in instcombine when it moves instructions  [1]. In this case the bitcast instruction was moved down. Since moved instruction is not an alloca then no need to copy dbg.declare to avoid duplication.  
> > So it seems proper fix would be to not clone dbg.declare if moved instruction is not an alloca.
> > i.e. to achieve that state :" single dbg.declare would be sufficient" the cloning of dbg.declare should be avoided.
> > 
> > //=============================================================================
> > 3. generating dbg.addr in LowerDbgDeclare.
> > 
> >    yes, I think it would be good to teach LowerDbgDeclare to generate dbg.addr. But it seems to me now that it is not relevant for this concrete case. 
> > 
> > All above in short : 
> > 
> > 1. avoiding  dbg.declare lowering for structures solves original problem of the patch. It allows to generate correct  dbg.values after SROA pass.
> > 2. avoiding duplication of dbg.declare in instcombiner while moving instructions prevents several dbg.declare instrs generation.
> > 3. Improving LowerDbgDeclare to generate dbg.addr intrinsics looks like necessary thing. But not related to the problem of this patch directly.
> > 
> > 
> > [0] https://github.com/llvm/llvm-project/blob/llvmorg-9.0.0-rc2/llvm/lib/Transforms/Utils/Local.cpp#L1419
> > [1] https://github.com/llvm/llvm-project/blob/llvmorg-9.0.0-rc2/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp#L3162
> ```
> entry:
>   %result = alloca i64, align 8
>   %tmpcast = bitcast i64* %result to %struct.S1*
>   %0 = bitcast i64* %result to i8*, !dbg !24
>   call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
>   %call = call i64 @_Z3foov(), !dbg !25
>   store i64 %call, i64* %result, align 8, !dbg !25
>   call void @llvm.dbg.value(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !26 <<<<<<<<<<<<
>   %call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* nonnull %tmpcast) #3, !dbg !27
>   br i1 %call1, label %if.then, label %if.end, !dbg !29
> ```
> 
> Who is deleting the dbg.declare and inserting the dbg.value? Does instcombine call LowerDbgDeclare, or is this implemented manually somewhere?
> 
> > IR after second SROA :
> 
> and the problem ehere is that SROA only knows/cares about allocas/dbg.declares and thus doesn't update the dbg.value. So by cloning the dbg.declare in this patch, you make it work. I also image that if you ran SROA a third time, it might run into the same issue again.
> 
> Would it be feasible to teach SROA to also update dbg.values instead, or is this a loosing battle? I would imagine that when SROA is rewriting 
> 
> ```
> store i64 %call, i64* %result, align 8, !dbg !25
> call void @llvm.dbg.value(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !26
> ```
> 
> into `%result.sroa.6.0.extract` (this intermediate step isn't shown in your example, but I'm sure it must exist in the detailed SROA log) we could update the dbg.value to point to %result.sroa.6.0.extract and attach a DW_OP_fragment, like we do for dbg.declares. Then salvageDebugInfo should automatically update the throughout the remaining transformations. (A reference to an alloca and a bitcast of an alloca inside a dbg.value can be treated as equivalent.)
> 
> Have you investigated this option? I think this would be a much cleaner and more general solution that should give even better results than this patch.
> 
>>entry:
>> %result = alloca i64, align 8
>> %tmpcast = bitcast i64* %result to %struct.S1*
>> %0 = bitcast i64* %result to i8*, !dbg !24
>>call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
>>%call = call i64 @_Z3foov(), !dbg !25
>>store i64 %call, i64* %result, align 8, !dbg !25
>>call void @llvm.dbg.value(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !26 <<<<<<<<<<<<
>>  %call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* nonnull %tmpcast) #3, !dbg !27
>>  br i1 %call1, label %if.then, label %if.end, !dbg !29

>Who is deleting the dbg.declare and inserting the dbg.value? Does instcombine call LowerDbgDeclare, or is this implemented manually >somewhere?

yes, it does, instcombine calls LowerDbgDeclare. LowerDbgDeclare deletes dbg.declare and inserts dbg.value.

>>    IR after second SROA :

>and the problem ehere is that SROA only knows/cares about allocas/dbg.declares and thus doesn't update the dbg.value. So by cloning the >dbg.declare in this patch, you make it work. I also image that if you ran SROA a third time, it might run into the same issue again.

Not by cloning dbg.declare. This patch makes SROA(which only knows/cares about allocas/dbg.declares) to work by avoiding lowering dbg.declare.  instcombine calls LowerDbgDeclare but it does nothing for structures.

No problem with calling SROA third time. Structures will always be described by dbg.declare(LowerDbgDeclare does nothing for structures in this patch). If structure would be converted by previous SROA into an scalar then this scalar would not be subject for next SROA.
Thus it is OK to call SROA several times : all input structures would be described by dbg.declare. There would not be situation similar to current moment - when structure could be described by dbg.declare and by dbg.value depending on previous passes of SROA.

>Would it be feasible to teach SROA to also update dbg.values instead, or is this a loosing battle? I would imagine that when SROA is rewriting
>...
>Have you investigated this option? I think this would be a much cleaner and more general solution that should give even better results than this patch.

First implementation of this patch was a kind of such solution(not exactly this, but it in that direction).
Now I think that avoiding dbg.declare lowering for structures is a better option. 
Here are  the arguments :

1. Implementation becomes more complex without any additional benefit. It would be necessary to implement two similar processing for different representations : mem2reg for dbg.declare and patching dbg.value. To recognize proper dbg.value it would be necessary to parse DIExpression for dbg.value(For most cases it could be just check for DW_OP_deref and structure type, but in general - there could be much more complex expressions) . 

2. No reason to lowering structures. LowerDbgDeclare handles scalars since they could be allocated on registers and then could not have stack location. If structure fits in register then it would be converted into a scalar first and then it could be lowered. Otherwise no need to lower structure.

3. arrays already done in that way - https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Utils/Local.cpp#L1406

4. as far as I understand current design decision for dbg.addr: if object located on the stack then it should be described by dbg.declare or dbg.addr. Otherwise dbg.value should be used. dbg.value pointing to an alloca should be avoided.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64595/new/

https://reviews.llvm.org/D64595