[PATCH] D64595: [Debuginfo][SROA] Need to handle dbg.value in SROA pass.

Tue Aug 20 06:52:08 PDT 2019

avl marked an inline comment as done.
avl added inline comments.

================
Comment at: llvm/lib/Transforms/InstCombine/InstructionCombining.cpp:3163-3164
     if (DII->getParent() == SrcBlock) {
-      // dbg.value is in the same basic block as the sunk inst, see if we can
-      // salvage it. Clone a new copy of the instruction: on success we need
-      // both salvaged and unsalvaged copies.
-      SmallVector<DbgVariableIntrinsic *, 1> TmpUser{
-          cast<DbgVariableIntrinsic>(DII->clone())};
-
-      if (!salvageDebugInfoForDbgValues(*I, TmpUser)) {
-        // We are unable to salvage: sink the cloned dbg.value, and mark the
-        // original as undef, terminating any earlier variable location.
-        LLVM_DEBUG(dbgs() << "SINK: " << *DII << '\n');
-        TmpUser[0]->insertBefore(&*InsertPos);
-        Value *Undef = UndefValue::get(I->getType());
-        DII->setOperand(0, MetadataAsValue::get(DII->getContext(),
-                                                ValueAsMetadata::get(Undef)));
+      if (DII->getIntrinsicID() == Intrinsic::dbg_declare &&
+          !isa<AllocaInst>(I)) {
+        // There should be only one dbg.declare instruction. Thus we could not
----------------
aprantl wrote:
> avl wrote:
> > rnk wrote:
> > > aprantl wrote:
> > > > jmorse wrote:
> > > > > Interesting -- are there circumstances where this happens (dbg.declare of something that isn't an alloca)? I suspect it would be IR that didn't make sense, we might be better off disallowing it in the IR Verifier instead.
> > > > I must admit I'm confused how this patch relates to the example in the review description. What is the instruction coming in here in the example?
> > > clang emits dbg.declare of things that are not allocas from time to time. Usually it is for C++ classes that, when passed by value at the source level, are instead passed indirectly by pointer as required by the ABI. Or, for POD structs passed using byval.
> > > 
> > > Also, after inlining, these arguments may resolve to pointers to non-alloca locations, as in `this->field = getSRetThing();`.
> > > 
> > That happens with this patch and llvm/test/DebugInfo/X86/sroa-after-inlining2.ll testcase if InstructionCombining.cpp:3163-3173 piece deleted.
> > 
> > opt sroa-after-inlining2.ll -sroa -instcombine -inline -instcombine -sroa -S -o -
> > 
> > before second instcombine pass there is following IR : 
> > 
> > 
> > ```
> > entry:
> >   %result = alloca i64, align 8
> >   %tmpcast = bitcast i64* %result to %struct.S1*     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> >   %0 = bitcast i64* %result to i8*, !dbg !24
> >   call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
> >   call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<
> > ```
> > 
> > after second instcombine pass it transformed into : 
> > 
> > 
> > ```
> > entry:
> >   %result = alloca i64, align 8
> >   %0 = bitcast i64* %result to i8*, !dbg !24
> >   call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
> >   call void @llvm.dbg.declare(metadata i64* %result, metadata !12, metadata !DIExpression()), !dbg !25
> > 
> > 
> > if.end:                                           ; preds = %entry
> >   %tmpcast = bitcast i64* %result to %struct.S1*   <<<<<<<<<<<<<<<<<<<<<<<<<
> >   call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<
> > ```
> > 
> > instcombine moves bitcast instruction and clones dbg.declare.
> (Looks like the `!isa<AllocaInst>(I)` allows you to detect the bitcast-of-alloc case.)
> 
> I'm not quite convinced by this example, but maybe there is something I'm missing, let me know!
> 
> The semantics of dbg.declare(alloca, var) are that the alloca is the definitive storage for var. There is no need to insert duplicate dbg.declares later in the code because a dbg.declare is valid throughout the function until the lifetime of alloca ends or var goes out of scope.
> If we need more fine-grained tracking of locations we have dbg.value and dbg.addr at our disposal, which are valid until the next dbg.value/addr for the same (overlapping fragment of the) variable. LowerDbgDeclare is the pass that switches between these two modes by deleting the dbg.declare and inserting a dbg.value at every load from the alloca.
> 
> I suspect that the disconnect between my mental model and what this patch is about comes from that LowerDbgDeclare is good at describing the SSA values loaded from memory, but does a terrible job describing the values stored into alloca, as it is neither inserting a dbg.addr for store instuctions describing the alloca, nor are we inserting dbg.addr when the address is passed to function that takes a pointer aliasing with the variable.
> 
> To reiterate: I think it is a bug to have more than one dbg.declare describing the same variable fragment in the same alloca (see SourceLevelDebugging.rst, llvm.dbg.declare). If we need to duplicate something pointing to an alloca, it should be a dbg.value or dbg.addr. (Similarly, I also think it is a bug to have a dbg.declare and a dbg.value describing the same variable.)
> 
> For the quoted example, a single dbg.declare would be sufficient, assuming there are no other dbg.values for this variable. If there are dbg.values present outside of the quoted region, the dbg.declares should be dbg.value(%result, !DIExpression(DW_OP_deref)) instead.
@aprantl Hi, I am sorry, quite a lot of text later. There are two problems here: 

1. Original problem with incorrect dbg.value if SROA optimization called twice.

2. Problem with cloned dbg.declare because of additional bitcast instruction. 
   That problem becomes visible if lowering for structures avoided.

//=============================================================================
For the point 1:

SROA and mem2reg conversion do not work with dbg.value. They work with only dbg.declare. 
As described in your model: LowerDbgDeclare is the pass that switches from dbg.declare into dbg.value mode. 
But SROA and mem2reg are not ready for that mode. This leads to incorrect(not-patched) dbg.value after second SROA.

IR before first instcombine for non-patched version : 

opt llvm/test/DebugInfo/X86/sroa-after-inlining2.ll -sroa -S -o -

```
entry:
  %result = alloca %struct.S1, align 4
  %0 = bitcast %struct.S1* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %0), !dbg !24
  call void @llvm.dbg.declare(metadata %struct.S1* %result, metadata !12, metadata !DIExpression()), !dbg !25    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  %call = call i64 @_Z3foov(), !dbg !26
  %1 = bitcast %struct.S1* %result to i64*, !dbg !26
  store i64 %call, i64* %1, align 4, !dbg !26
  %call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* %result), !dbg !27
  br i1 %call1, label %if.then, label %if.end, !dbg !29
```

IR after first instcombine :

opt llvm/test/DebugInfo/X86/sroa-after-inlining2.ll -sroa -instcombine -S -o -

```
entry:
  %result = alloca i64, align 8
  %tmpcast = bitcast i64* %result to %struct.S1*
  %0 = bitcast i64* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
  %call = call i64 @_Z3foov(), !dbg !25
  store i64 %call, i64* %result, align 8, !dbg !25
  call void @llvm.dbg.value(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !26 <<<<<<<<<<<<
  %call1 = call zeroext i1 @_ZN2S16IsNullEv(%struct.S1* nonnull %tmpcast) #3, !dbg !27
  br i1 %call1, label %if.then, label %if.end, !dbg !29
```

Note dbg.declare for the result variable(!12) converted into dbg.value.

IR after second SROA :

opt llvm/test/DebugInfo/X86/sroa-after-inlining2.ll -sroa -instcombine -inline -instcombine -sroa -S -o -

```
define dso_local i32 @_Z3barv() #0 !dbg !7 {
entry:
  %call = call i64 @_Z3foov(), !dbg !24
  %result.sroa.0.0.extract.trunc = trunc i64 %call to i32, !dbg !24
  %result.sroa.6.0.extract.shift = lshr i64 %call, 32, !dbg !24
  %result.sroa.6.0.extract.trunc = trunc i64 %result.sroa.6.0.extract.shift to i32, !dbg !24
  call void @llvm.dbg.value(metadata i64* undef, metadata !12, metadata !DIExpression(DW_OP_deref)), !dbg !25   <<<<<<<<<<
  call void @llvm.dbg.value(metadata i64* undef, metadata !26, metadata !DIExpression()), !dbg !30
  %cmp.i = icmp eq i32 %result.sroa.0.0.extract.trunc, 0, !dbg !33
  br i1 %cmp.i, label %if.then, label %if.end, !dbg !34
```

Note dbg.value with undef memory location for result variable. Instead of setting to undef value it should be set into SROA variable : 

```
entry:
  %call = call i64 @_Z3foov(), !dbg !24
  %result.sroa.0.0.extract.trunc = trunc i64 %call to i32, !dbg !24
  call void @llvm.dbg.value(metadata i32 %result.sroa.0.0.extract.trunc, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !25   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  %result.sroa.6.0.extract.shift = lshr i64 %call, 32, !dbg !24
  %result.sroa.6.0.extract.trunc = trunc i64 %result.sroa.6.0.extract.shift to i32, !dbg !24
  call void @llvm.dbg.value(metadata i32 %result.sroa.6.0.extract.trunc, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32)), !dbg !25  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  call void @llvm.dbg.value(metadata i64* undef, metadata !26, metadata !DIExpression()), !dbg !30
  %cmp.i = icmp eq i32 %result.sroa.0.0.extract.trunc, 0, !dbg !33
  br i1 %cmp.i, label %if.then, label %if.end, !dbg !34
```

That problem could be fixed in following ways : 

a) teach SROA and mem2reg to work with dbg.value. So that following :

call void @llvm.dbg.value(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression(DW_OP_deref))

converted into : 

  call void @llvm.dbg.value(metadata i32 %result.sroa.0.0.extract.trunc, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !25   
  call void @llvm.dbg.value(metadata i32 %result.sroa.6.0.extract.trunc, metadata !12, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32)), !dbg !25

b) do not do dbg.declare lowering for structures. 

Last version of the patch implements not doing dbg.declare lowering for structures which was suggested by @jmorse . 

LowerDbgDeclare has that comment [0]: 

```
    // If this is an alloca for a scalar variable, insert a dbg.value
    // at each load and store to the alloca and erase the dbg.declare.
    // The dbg.values allow tracking a variable even if it is not
    // stored on the stack, while the dbg.declare can only describe
    // the stack slot (and at a lexical-scope granularity). Later
    // passes will attempt to elide the stack slot. 
```

Structures would either be converted into scalars(by SROA) and then they could be lowered and able to be allocated not on the stack(thus need dbg.value), 
either they do not need to be lowered and are correctly described by dbg.declare.
So it looks OK to avoid lowering of dbg.declare for structures.

//=============================================================================
For the point 2:  

If lowering of dbg.declare for structures avoided then there generated two dbg.declare`s for the same variable which is incorrect :

before second instcombine pass there is following IR :

opt sroa-after-inlining2.ll -sroa -instcombine -inline -S -o -

```
entry:
  %result = alloca i64, align 8
  %tmpcast = bitcast i64* %result to %struct.S1*     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  %0 = bitcast i64* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
  call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<

```

after second instcombine: 

opt sroa-after-inlining2.ll -sroa -instcombine -inline -instcombine  -S -o -

```
entry:
  %result = alloca i64, align 8
  %0 = bitcast i64* %result to i8*, !dbg !24
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0), !dbg !24
  call void @llvm.dbg.declare(metadata i64* %result, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<<<

if.end:                                           ; preds = %entry
  %tmpcast = bitcast i64* %result to %struct.S1*   <<<<<<<<<<<<<<<<<<<<<<<<<
  call void @llvm.dbg.declare(metadata %struct.S1* %tmpcast, metadata !12, metadata !DIExpression()), !dbg !25   <<<<<<<<<<

```
That second dbg.declare created in instcombine when it moves instructions  [1]. In this case the bitcast instruction was moved down. Since moved instruction is not an alloca then no need to copy dbg.declare to avoid duplication.  
So it seems proper fix would be to not clone dbg.declare if moved instruction is not an alloca.
i.e. to achieve that state :" single dbg.declare would be sufficient" the cloning of dbg.declare should be avoided.

//=============================================================================
3. generating dbg.addr in LowerDbgDeclare.

   yes, I think it would be good to teach LowerDbgDeclare to generate dbg.addr. But it seems to me now that it is not relevant for this concrete case. 

All above in short : 

1. avoiding  dbg.declare lowering for structures solves original problem of the patch. It allows to generate correct  dbg.values after SROA pass.
2. avoiding duplication of dbg.declare in instcombiner while moving instructions prevents several dbg.declare instrs generation.
3. Improving LowerDbgDeclare to generate dbg.addr intrinsics looks like necessary thing. But not related to the problem of this patch directly.

[0] https://github.com/llvm/llvm-project/blob/llvmorg-9.0.0-rc2/llvm/lib/Transforms/Utils/Local.cpp#L1419
[1] https://github.com/llvm/llvm-project/blob/llvmorg-9.0.0-rc2/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp#L3162

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64595/new/

https://reviews.llvm.org/D64595