[PATCH] LowerDbgDeclare - get rid of redundant dbg.values when an alloca survives optimization

Fri Oct 18 13:46:45 PDT 2013

On Oct 18, 2013, at 13:27, David Blaikie <dblaikie at gmail.com> wrote:

> 
> 
> 
> On Fri, Oct 18, 2013 at 1:15 PM, Adrian Prantl <aprantl at apple.com> wrote:
> 
> On Oct 18, 2013, at 11:26 AM, David Blaikie <dblaikie at gmail.com> wrote:
> 
> >
> >
> >
> > On Fri, Oct 18, 2013 at 11:22 AM, Adrian Prantl <aprantl at apple.com> wrote:
> >
> > On Oct 18, 2013, at 10:35, Adrian Prantl <aprantl at apple.com> wrote:
> >
> > >
> > > On Oct 18, 2013, at 10:24, David Blaikie <dblaikie at gmail.com> wrote:
> > >
> > >>
> > >>
> > >>
> > >> On Fri, Oct 18, 2013 at 10:12 AM, Adrian Prantl <aprantl at apple.com> wrote:
> > >> Any final verdict for this change?
> > >>
> > >> I'm not really familiar with this stuff to provide a 'final verdict' as it were, but I'll ask some questions anyway...
> > >>
> > >> Is this the correct test? Is it possible that the variable might be reconstituted around the call, rather than kept live for the entire range?
> > >> If there are things that prevent the alloca from being elided, are those things enshrined in a test somewhere that we could reuse, rather than reimplementing one part of them?
> > >
> > > That’s a very good point indeed: I just discovered
> > >> bool llvm::isAllocaPromotable(const AllocaInst *AI)
> > > in mem2reg.
> >
> > Yes that seems to work fine and is a lot cleaner.
> >
> > Great. So I'm reading your comment in the code and I'm curious - you say it reduces the number of location DIEs... I guess you mean entries in the location list? (in the debug_loc section) which aren't really DIEs?
> 
> No, they are. The problem is that each dbg.value ends up in a different lexical scope and is therefore emitted as a separate location (vs. just another entry in one location list). See the testcase in the patch.
> >
> > Why is that? What kind of locations were we emitting? Should the DWARF emission code have been doing a better job coalescing these locations if the variable was in the alloca all along anyway?
> >
> I thought about this, too. Because they are in different scopes I think that would be wrong.
> 
> If the locations are contiguous and refer to the same location, they should be coalesced - are they not? (I assume not, and that it's the alloca that stays live over the BB boundary, but in each block the variable is temporarily stored in a register - which may be a different register on each block (and, yes, at the start of the loop we might see the variable in the same register in the loop entry, then the loop body but we can't coalesce those because before the first instruction of the loop we might've entered the loop from some other block where the variable wasn't stored in the same register))
>  
> 
> > Or are we dealing with a situation where the alloca exists, but the variable is temporarily in registers and without your change we described each of those register locations, but now we just describe the location as the alloca?
> >
> > Is this going to improve or hurt the debugging experience much? What happens if the variable is in a register when the user goes to print it out and instead they get the stale value that's still in the alloca?
> >
> At least with the example that I was looking at, it does't hurt. Remember that this is an alloca that cannot be lowered, so the location will have the correct value at least before and after each function call, which is incidentally also the use of the alloca.
> 
> Sure, but what about simple code like:
> 
> int i = 3;
> i = 7;
> i = f1();
> f2(&i);
> 
> Stepping over those lines it would seem problematic if the user can't print out 'i' and observe 3, 7, etc.

At least in this particular example. I don’t see a problem. At -O0, all the values will be stored into the alloca anyway.

	.loc	1 5 0 prologue_end      ## test1.c:5:0
Ltmp5:
	subq	$16, %rsp
	movl	$3, -8(%rbp)
	.loc	1 6 0                   ## test1.c:6:0
	movl	$7, -8(%rbp)
	.loc	1 7 0                   ## test1.c:7:0
	movb	$0, %al
	callq	_f1
	leaq	-8(%rbp), %rdi
	movl	%eax, -8(%rbp)
	.loc	1 8 0                   ## test1.c:8:0
	callq	_f2
	.loc	1 9 0                   ## test1.c:9:0
	movl	-4(%rbp), %eax
	addq	$16, %rsp
	popq	%rbp
	ret

at -O2, we are missing out on the 3 and the 7, but we would anyway, because this is dead code.

	.loc	1 7 0 prologue_end      ## test1.c:7:0
Ltmp5:
	callq	_f1
	movl	%eax, -4(%rbp)
	leaq	-4(%rbp), %rdi
	.loc	1 8 0                   ## test1.c:8:0
	callq	_f2
	.loc	1 9 0                   ## test1.c:9:0
	addq	$16, %rsp
	popq	%rbp
	ret

Aha. But if we extend the program like this:
int foo() {
  int i = 3;
  f3(i);
  i = 7;
  i = f1();
  f2(&i);
}

We are actually missing the 3, too.

	.cfi_def_cfa_register %rbp
	subq	$16, %rsp
	movl	$3, %edi
	.loc	1 7 0 prologue_end      ## test1.c:7:0
Ltmp5:
	callq	_f3
	xorl	%eax, %eax
	.loc	1 9 0                   ## test1.c:9:0
	callq	_f1
	movl	%eax, -4(%rbp)
	leaq	-4(%rbp), %rdi
	.loc	1 10 0                  ## test1.c:10:0
	callq	_f2
	.loc	1 11 0                  ## test1.c:11:0
	addq	$16, %rsp
	popq	%rbp
	ret

And now it gets really strange: Without my patch, the dbg.values show up in the asm output,

	subq	$16, %rsp
Ltmp5:
	##DEBUG_VALUE: foo:i <- 3+0
	movl	$3, %edi
	.loc	1 7 0 prologue_end      ## test1.c:7:0
Ltmp6:
	callq	_f3
Ltmp7:
	##DEBUG_VALUE: foo:i <- 7+0
	xorb	%al, %al
	.loc	1 9 0                   ## test1.c:9:0
	callq	_f1
Ltmp8:
	##DEBUG_VALUE: foo:i <- EAX+0
	movl	%eax, -4(%rbp)
	leaq	-4(%rbp), %rdi
	.loc	1 10 0                  ## test1.c:10:0
	callq	_f2
	.loc	1 11 0                  ## test1.c:11:0
                                        ## implicit-def: EAX
Ltmp9:
	addq	$16, %rsp
	popq	%rbp
	ret

But not in the DWARF!?

Looks like we have two problems now.

-- adrian