[Lldb-commits] [PATCH] Allow MachO JIT debugging

Greg Clayton gclayton at apple.com
Thu Jun 5 10:38:02 PDT 2014


> On Jun 5, 2014, at 9:47 AM, Keno Fischer <kfischer at college.harvard.edu> wrote:
> 
> > - What is updateSectionLoadAddress(...) doing when it checks "if (section_sp->GetFileAddress() > 0x100000)"?
> 
> LLVM allocates sections that have relocations outside of the actual symbol file and then updates the section vmaddr accordingly. This code traverses the section tree and adjusts the load address of every leaf section. The only problem is that LLVM doesn't actually relocate all sections, so we need some kind of check to determine whether a section was relocated. The current condition is a stopgap, and I'd like to come up with something more reasonable (I meant to ask about that in the initial review). I think some sort of comparison to the file size would be appropriate, but I don't know enough about Mach-O object files to know how vmaddr relates to file offset. Any ideas?

The file size can be zero for BSS sections, so it doesn't necessarily correlate with the vmsize. What is the file type of the mach-o file? The mach header has a "filetype" field. I have included a raw mach-o dump of the load commands for the "swig" executable:


% mach_o.py `which swig`
0x00000000: /usr/local/bin/swig (x86_64)
Mach Header
       magic: 0xfeedfacf MH_MAGIC_64
     cputype: 0x01000007 x86_64
  cpusubtype: 0x80000003
    filetype: 0x00000002 MH_EXECUTE
       ncmds: 0x00000012 18
  sizeofcmds: 0x000007d0
       flags: 0x00210085 MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL | MH_BINDS_TO_WEAK | MH_PIE 

                                             VMADDR             VMSIZE             FILEOFF            FILESIZE           PROTECT   
0x00000020: <0x0048> LC_SEGMENT_64           0x0000000000000000 0x0000000100000000 0x0000000000000000 0x0000000000000000 --- ---   0 0x00000000 __PAGEZERO
0x00000068: <0x02c8> LC_SEGMENT_64           0x0000000100000000 0x00000000000ed000 0x0000000000000000 0x00000000000ed000 rwx r-x   8 0x00000000 __TEXT
0x00000330: <0x02c8> LC_SEGMENT_64           0x00000001000ed000 0x0000000000009000 0x00000000000ed000 0x0000000000005000 rwx rw-   8 0x00000000 __DATA
0x000005f8: <0x0048> LC_SEGMENT_64           0x00000001000f6000 0x0000000000006000 0x00000000000f2000 0x0000000000004380 rwx r--   0 0x00000000 __LINKEDIT
0x00000640: <0x0030> LC_DYLD_INFO_ONLY       rebase_off = 0x000f2000, rebase_size = 216, bind_off = 0x000f20d8, bind_size = 400, weak_bind_off = 0x000f2268, weak_bind_size = 48, lazy_bind_off = 0x000f2298, lazy_bind_size = 1120, export_off = 0x000f26f8, export_size = 32, 
0x00000670: <0x0018> LC_SYMTAB               symoff = 0x000f30d8, nsyms = 82, stroff = 0x000f3858, strsize = 944
0x00000688: <0x0050> LC_DYSYMTAB             ilocalsym      = 0         , nlocalsym     = 1
                                             iextdefsym     = 1         , nextdefsym    = 1
                                             iundefsym      = 2         , nundefsym     = 80
                                             tocoff         = 0x00000000, ntoc          = 0
                                             modtaboff      = 0x00000000, nmodtab       = 0
                                             extrefsymoff   = 0x00000000, nextrefsyms   = 0
                                             indirectsymoff = 0x000f35f8, nindirectsyms = 152
                                             extreloff      = 0x00000000, nextrel       = 0
                                             locreloff      = 0x00000000, nlocrel       = 0
0x000006d8: <0x0020> LC_LOAD_DYLINKER        /usr/lib/dyld
0x000006f8: <0x0018> LC_UUID                 f0c6b9ae-2ab8-3305-9746-f7275e37cc94
0x00000710: <0x0010> LC_VERSION_MIN_MACOSX   
0x00000720: <0x0010> 0x0000002a              
0x00000730: <0x0018> 0x80000028              
0x00000748: <0x0030> LC_LOAD_DYLIB           0x00000002 0x00780000 0x00010000 /usr/lib/libc++.1.dylib
0x00000778: <0x0038> LC_LOAD_DYLIB           0x00000002 0x04bc0000 0x00010000 /usr/lib/libSystem.B.dylib
0x000007b0: <0x0010> LC_FUNCTION_STARTS      dataoff = 0x000f2718, datasize = 2464
0x000007c0: <0x0010> 0x00000029              
0x000007d0: <0x0010> 0x0000002b              
0x000007e0: <0x0010> LC_CODE_SIGNATURE       dataoff = 0x000f3c10, datasize = 10096


And the sections look like:

INDEX ADDRESS            SIZE               OFFSET     ALIGN      RELOFF     NRELOC     FLAGS      RESERVED1  RESERVED2  RESERVED3  NAME
===== ------------------ ------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------------------
[  1] 0x0000000100000e10 0x00000000000c1852 0x00000e10 0x00000004 0x00000000 0x00000000 0x80000400 0x00000000 0x00000000 0x00000000 __TEXT.__text
[  2] 0x00000001000c2662 0x00000000000001aa 0x000c2662 0x00000001 0x00000000 0x00000000 0x80000408 0x00000000 0x00000006 0x00000000 __TEXT.__stubs
[  3] 0x00000001000c280c 0x00000000000002d6 0x000c280c 0x00000002 0x00000000 0x00000000 0x80000400 0x00000000 0x00000000 0x00000000 __TEXT.__stub_helper
[  4] 0x00000001000c2af0 0x0000000000023288 0x000c2af0 0x00000004 0x00000000 0x00000000 0x00000002 0x00000000 0x00000000 0x00000000 __TEXT.__cstring
[  5] 0x00000001000e5d80 0x00000000000054a0 0x000e5d80 0x00000004 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __TEXT.__const
[  6] 0x00000001000eb220 0x0000000000000128 0x000eb220 0x00000004 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __TEXT.__ustring
[  7] 0x00000001000eb348 0x0000000000000ac4 0x000eb348 0x00000002 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __TEXT.__gcc_except_tab
[  8] 0x00000001000ebe0c 0x00000000000011f0 0x000ebe0c 0x00000002 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __TEXT.__unwind_info
[  9] 0x00000001000ed000 0x0000000000000040 0x000ed000 0x00000003 0x00000000 0x00000000 0x00000006 0x00000047 0x00000000 0x00000000 __DATA.__got
[ 10] 0x00000001000ed040 0x0000000000000010 0x000ed040 0x00000003 0x00000000 0x00000000 0x00000006 0x0000004f 0x00000000 0x00000000 __DATA.__nl_symbol_ptr
[ 11] 0x00000001000ed050 0x0000000000000238 0x000ed050 0x00000003 0x00000000 0x00000000 0x00000007 0x00000051 0x00000000 0x00000000 __DATA.__la_symbol_ptr
[ 12] 0x00000001000ed288 0x0000000000000030 0x000ed288 0x00000003 0x00000000 0x00000000 0x00000009 0x00000000 0x00000000 0x00000000 __DATA.__mod_init_func
[ 13] 0x00000001000ed2c0 0x0000000000001908 0x000ed2c0 0x00000004 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __DATA.__const
[ 14] 0x00000001000eebd0 0x0000000000002fd4 0x000eebd0 0x00000004 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __DATA.__data
[ 15] 0x00000001000f1bb0 0x00000000000000f8 0x00000000 0x00000004 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000 0x00000000 __DATA.__common
[ 16] 0x00000001000f1cb0 0x0000000000003860 0x00000000 0x00000004 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000 0x00000000 __DATA.__bss

Not sure if this helps you see anything?

> 
> > - Why are we preloading everything with the code: ...
> 
> Yes, you're right, that was for debugging and slipped past my cleanup.

Ah, phew!

> 
> > - Your fix to ObjectFileMachO.cpp is not correct...
> 
> The code in if (process) doesn't do anything if we don't have a linkedit_section_sp. Maybe we need to duplicate that code in an else block for linkedit_section_sp ... 

I think we need to fix the Mach-O file producer in llvm/clang so that it emits a __LINKEDIT segment. The __LINKEDIT segment holds everything that isn't in another section and isn't needed to run the code: it is just a bunch of linker bits such as the symbol table, string table, compact unwind info and more. Every byte in a mach-o file must be accounted for and must lie in some segment. The mach-o file starts with a series of load commands (as you can see in the swig dump above).

The LC_SYMTAB load command describes the symbol table:

0x00000670: <0x0018> LC_SYMTAB               symoff = 0x000f30d8, nsyms = 82, stroff = 0x000f3858, strsize = 944

This gives the file offset of the symbol table (relative to the start of the mach header) and its entry count (nsyms is a number of entries, not a byte size), plus the string table's offset and size in bytes. Both the symbol table and the string table should be in a __LINKEDIT segment.

Note that there are other load commands that point to data in the __LINKEDIT segment:

0x00000640: <0x0030> LC_DYLD_INFO_ONLY       rebase_off = 0x000f2000, rebase_size = 216, bind_off = 0x000f20d8, bind_size = 400, weak_bind_off = 0x000f2268, weak_bind_size = 48, lazy_bind_off = 0x000f2298, lazy_bind_size = 1120, export_off = 0x000f26f8, export_size = 32, 
0x00000688: <0x0050> LC_DYSYMTAB             ilocalsym      = 0         , nlocalsym     = 1
                                             iextdefsym     = 1         , nextdefsym    = 1
                                             iundefsym      = 2         , nundefsym     = 80
                                             tocoff         = 0x00000000, ntoc          = 0
                                             modtaboff      = 0x00000000, nmodtab       = 0
                                             extrefsymoff   = 0x00000000, nextrefsyms   = 0
                                             indirectsymoff = 0x000f35f8, nindirectsyms = 152
                                             extreloff      = 0x00000000, nextrel       = 0
                                             locreloff      = 0x00000000, nlocrel       = 0

0x000007b0: <0x0010> LC_FUNCTION_STARTS      dataoff = 0x000f2718, datasize = 2464
0x000007e0: <0x0010> LC_CODE_SIGNATURE       dataoff = 0x000f3c10, datasize = 10096

> Thank you for your comments. I'm learning as I'm going here.

No worries, I can definitely help out with getting this ready; a few more iterations and we should be good.
> 
> Keno
> 
> 
> 
> On Thu, Jun 5, 2014 at 12:37 PM, Greg Clayton <gclayton at apple.com> wrote:
> Can you explain a few things?:
> 
> - What is updateSectionLoadAddress(...) doing when it checks "if (section_sp->GetFileAddress() > 0x100000)"?
> - Why are we preloading everything with the code:
> 
>                 // load the symbol table right away
>                 module_sp->GetObjectFile()->GetSymtab();
> 
>                 module_sp->GetSymbolVendor()->GetNumCompileUnits();
>                 module_sp->GetSymbolVendor()->GetCompileUnitAtIndex(0);
>                 module_sp->ParseAllDebugSymbols();
> 
> It seems like we should just let it load things lazily. Parsing all debug symbols is not advised; LLDB should be allowed to parse the DWARF lazily as it needs to.
> 
> - Your fix to ObjectFileMachO.cpp is not correct. If we have a process, then we load the symbol table from memory (the code in the "if (process)"); otherwise we load it using the load commands (in the "else") and from the file itself. We don't want to always load the symbol table using the load commands, because symtab_load_command.symoff and symtab_load_command.stroff are file offsets and are not correct when a mach-o file is being read from memory.
> 
> 
> 
> > On Jun 3, 2014, at 9:46 AM, Keno Fischer <kfischer at college.harvard.edu> wrote:
> >
> > This is the LLDB side of http://reviews.llvm.org/D4005
> >
> > http://reviews.llvm.org/D4006
> >
> > Files:
> >  lib/Makefile
> >  source/Core/Section.cpp
> >  source/Plugins/JITLoader/GDB/JITLoaderGDB.cpp
> >  source/Plugins/Makefile
> >  source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
> > <D4006.10055.patch>
> > _______________________________________________
> > lldb-commits mailing list
> > lldb-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
> 
> 



