[LLVMdev] Compile units in debugging intrinsics / globals

Richard Smith richard.smith at antixlabs.com
Thu Apr 24 04:41:50 PDT 2008


Hi, thanks for responding. I think I did not explain my problem well. To
illustrate it further, consider these two modules which I will compile and
link together using gcc:

Module 1 consists of one source file:

main.c:
  static int a = 1;
  extern int fn1(void);

  int main (int argc, char **argv) {
    return fn1();
  }

I compile this with the command-line

gcc main.c -g -c -o main.o

Module 2 consists of three source files:

file1.c:
  #include "file2.h"
  #include "file3.h"
  int fn1(void) {
    return fn2(a);
  }

file2.h:
  static int a = 2;

file3.h:
  int fn2(int p) {
    return p * 2;
  }

I compile this with the command-line

gcc file1.c -g -c -o file1.o

Finally I link the modules

gcc main.o file1.o -o main

In the non-llvm sense, each of these two modules is a compile unit.

To see the debug records I use:

objdump -W main > objdump.gcc.txt

Looking at this file, I see two compile units as I would expect (plus the C
libraries):

Compilation Unit @ offset 0x1a1:
...
<0><1ac>: Abbrev Number: 1 (DW_TAG_compile_unit)
...
DW_AT_name        : main.c
...
<1><208>: Abbrev Number: 2 (DW_TAG_subprogram)
...
DW_AT_name        : main
...
<1><25a>: Abbrev Number: 6 (DW_TAG_variable)
DW_AT_name        : a

And

Compilation Unit @ offset 0x25b:
...
<0><266>: Abbrev Number: 1 (DW_TAG_compile_unit)
...
DW_AT_name        : file1.c
...
<1><2c3>: Abbrev Number: 2 (DW_TAG_subprogram)
...
DW_AT_name        : fn2
...
<1><2f4>: Abbrev Number: 5 (DW_TAG_subprogram)
...
DW_AT_name        : fn1
...
<1><30d>: Abbrev Number: 6 (DW_TAG_variable)
DW_AT_name        : a

The problem I have is that llvm considers a _source file_ to be a compile
unit. My code generator - a back-end I have built for llc - uses the compile
unit information in the llvm debug records, *but an llvm compile unit is
indistinguishable from a source file*. I _do_ also want to know which source
file each declaration is in, but with only this information my code generator
erroneously emits debug records for _four_ different compile units: "main.c"
contains the definitions of main and variable a, "file1.c" contains the
definition of fn1, "file2.h" contains the definition of variable a, and
"file3.h" contains the definition of fn2.

The problem with using the module information you suggested is that at the
time of code generation the linker has created a single module, and using
this technique you only get _one_ compile unit, which is also wrong. The 2.2
release seems to have this problem. If I compile my sources as follows:

llvm-gcc -c -g main.c -o main.o
llvm-gcc -c -g file1.c -o file1.o
llvm-ld -disable-opt main.o file1.o -o main
llc main.bc -f -o main.s -march=x86
gcc main.s -o main
objdump -W main > objdump.llvm.txt

I find that the debug records claim that everything is contained in a single
compile unit named "file1.c". I also note that because both compile units
contain a variable named a, llvm has emitted only one debug record for such
a variable, and no matter where I query its value when debugging, I always
get the value of the variable in main.c.

As the "standard" code generators get this wrong I suspect the answer is
"no", but what I wanted to establish was whether I could determine the
actual compile units (in the non-llvm sense) that the debug records are part
of, not simply the source files. It appears that the llvm records are
incorrect in not making a distinction between compile units and source
files, but this could be resolved if there were some way of linking the
source files (llvm compile units) together to determine the modules
(non-llvm compile units).
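
To make the request concrete, here is a minimal sketch in C of the kind of
link I have in mind. Everything in it is hypothetical and is not part of any
existing llvm interface: the struct names, the `owner` field (the index of the
primary source file of the translation unit that included a given file), and
the sample tables, which simply restate the main.c / file1.c example above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record: one per llvm "compile unit", i.e. per source file.
   `owner` is the assumed missing link: the index of the primary source file
   of the translation unit that pulled this file in. A primary file owns
   itself. */
struct file_unit {
    const char *name;
    int owner;
};

/* Hypothetical declaration record: `file` says where it was written. */
struct decl {
    const char *name;
    int file;
};

/* Sample data restating the example: two translation units, four files. */
static const struct file_unit sample_files[] = {
    { "main.c",  0 },
    { "file1.c", 1 },
    { "file2.h", 1 },
    { "file3.h", 1 },
};

static const struct decl sample_decls[] = {
    { "main", 0 }, { "a", 0 }, { "fn1", 1 }, { "a", 2 }, { "fn2", 3 },
};

/* Index of the translation unit (non-llvm compile unit) owning a decl. */
int unit_index_of(const struct file_unit *files, int nfiles,
                  const struct decl *d)
{
    assert(d->file >= 0 && d->file < nfiles);
    int owner = files[d->file].owner;
    assert(owner >= 0 && owner < nfiles);
    return owner;
}

/* Number of real translation units: files that own themselves. */
int count_translation_units(const struct file_unit *files, int nfiles)
{
    int n = 0;
    for (int i = 0; i < nfiles; i++)
        if (files[i].owner == i)
            n++;
    return n;
}

/* Print each translation unit with its declarations, keeping the source
   file of each declaration as a separate attribute. */
void print_units(const struct file_unit *files, int nfiles,
                 const struct decl *decls, int ndecls)
{
    for (int u = 0; u < nfiles; u++) {
        if (files[u].owner != u)
            continue;
        printf("compile unit \"%s\"\n", files[u].name);
        for (int i = 0; i < ndecls; i++)
            if (unit_index_of(files, nfiles, &decls[i]) == u)
                printf("  %s (declared in %s)\n",
                       decls[i].name, files[decls[i].file].name);
    }
}
```

Calling print_units(sample_files, 4, sample_decls, 5) groups main and a under
"main.c", and fn1, a and fn2 under "file1.c", while still recording that the
second a was declared in file2.h and fn2 in file3.h. That is the grouping gcc
produces above, and it needs only one extra piece of information per source
file.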

-- 
Regards,
Richard Smith
Antix Labs Ltd
400 Thames Valley Park Drive, Reading, Berkshire, RG6 1PT
Tel.: +44 (0) 118 357 0 357


-----Original Message-----
From: Duncan Sands [mailto:baldrick at free.fr] 
Sent: 24 April 2008 08:43
To: llvmdev at cs.uiuc.edu
Cc: Richard Smith
Subject: Re: [LLVMdev] Compile units in debugging intrinsics / globals

Hi,

> Suppose I have the following source:
> 
> file1:
>   #include "file2"
>   #include "file3"
>   int fn1(void) ...
> 
> file2:
>   int a;
> 
> file3:
>   int fn2(void) ... 
>  
> then fn1, along with all the base types etc appear to be in compile unit
> "file1", the variable a appears to be in compile unit "file2" (and there are
> no basic types in file2, so int is not defined), and fn2 appears to be in
> compile unit "file3". My dwarf records are therefore incorrect, appearing
> something like
> 
> TAG_compile_unit "file1"
>   TAG_subprogram "fn1" ...
>     ...
>   TAG_base_type "int" ...
> 
> TAG_compile_unit "file2"
>   TAG_variable "a" ...
> 
> TAG_compile_unit "file3"
>   TAG_subprogram "fn2" ...
>     ...
> 
> When, in fact, these compile units "file2" and "file3" are bogus and
> everything should be part of compile_unit "file1".

this is not clear to me.  Isn't it useful to know where to find the
definition of fn2 (in file3)?  I'm pretty sure this is how gcc does
things too: the debugger seems to know that some objects were defined
in header files.

> My question is: can I tell that these three (llvm) compile units are in fact
> components of the single (non-LLVM) compile unit? Or is there some other way
> I should be determining which (non-LLVM) compile unit the records are part
> of?

If you compile file1 into an LLVM module M, then by definition all debug info
in M is for the compile unit file1.  So as long as you're not doing link time
optimization, can't you just grab all debug info from M?

Ciao,

Duncan.