[Lldb-commits] [PATCH] Allow MachO JIT debugging

Keno Fischer kfischer at college.harvard.edu
Thu Jun 5 10:44:40 PDT 2014


I missed your question about macho file types. These are all MH_OBJECTs.
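
For reference, here is a minimal Python sketch of reading the "filetype" field from a 64-bit mach header. It is only an illustration: the function name macho_filetype is made up, the constants are the MH_OBJECT and MH_EXECUTE values from <mach-o/loader.h>, and the sample bytes are a synthetic header built from the swig dump quoted below, not a real JIT object.

```python
import struct

# Filetype constants as defined in <mach-o/loader.h>.
MH_OBJECT = 0x1
MH_EXECUTE = 0x2
MH_MAGIC_64 = 0xFEEDFACF


def macho_filetype(header_bytes):
    """Return the filetype field of a little-endian 64-bit mach header.

    Hypothetical helper for illustration; it is not part of LLDB or LLVM.
    """
    # struct mach_header_64 is eight 32-bit fields: magic, cputype,
    # cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved.
    fields = struct.unpack_from("<8I", header_bytes, 0)
    magic, filetype = fields[0], fields[3]
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit mach header")
    return filetype


# Synthetic header using the values shown in the swig dump.
swig_header = struct.pack(
    "<8I",
    0xFEEDFACF,  # magic (MH_MAGIC_64)
    0x01000007,  # cputype (x86_64)
    0x80000003,  # cpusubtype
    MH_EXECUTE,  # filetype
    18,          # ncmds
    0x000007D0,  # sizeofcmds
    0x00210085,  # flags
    0,           # reserved
)

print(hex(macho_filetype(swig_header)))
```

A JIT-produced MH_OBJECT would carry 0x1 in the same field.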


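On the relation of vmaddr and file offset: in the swig dump quoted below, every section that has file contents satisfies offset = segment fileoff + (section addr - segment vmaddr), while the zero-fill sections (__common, __bss) have offset 0. The sketch below checks that against a few rows of the dump. The helper name is hypothetical, and I have only verified this relation for that linked executable; it says nothing about what the JIT does after relocating sections.

```python
# Section type lives in the low byte of the section flags (<mach-o/loader.h>).
SECTION_TYPE = 0x000000FF
S_ZEROFILL = 0x1


def expected_file_offset(seg_vmaddr, seg_fileoff, sect_addr, sect_flags):
    """Return the file offset implied by a section's address, or None
    for zero-fill sections, which occupy no bytes in the file.

    Hypothetical helper for illustration only.
    """
    if (sect_flags & SECTION_TYPE) == S_ZEROFILL:
        return None
    return seg_fileoff + (sect_addr - seg_vmaddr)


# (segment vmaddr, segment fileoff) taken from the swig load commands.
TEXT_SEG = (0x0000000100000000, 0x0)
DATA_SEG = (0x00000001000ED000, 0xED000)

# (name, segment, section addr, section flags, offset shown in the dump)
rows = [
    ("__TEXT.__text", TEXT_SEG, 0x0000000100000E10, 0x80000400, 0x00000E10),
    ("__DATA.__got", DATA_SEG, 0x00000001000ED000, 0x00000006, 0x000ED000),
    ("__DATA.__data", DATA_SEG, 0x00000001000EEBD0, 0x00000000, 0x000EEBD0),
    ("__DATA.__bss", DATA_SEG, 0x00000001000F1CB0, 0x00000001, None),
]

for name, (vmaddr, fileoff), addr, flags, dumped in rows:
    computed = expected_file_offset(vmaddr, fileoff, addr, flags)
    print(name, "matches dump:", computed == dumped)
```
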
On Thu, Jun 5, 2014 at 1:43 PM, Keno Fischer <kfischer at college.harvard.edu>
wrote:

> The first issue is tricky, I'll have a look at your dump and play around
> with a few ideas. I'll let you know what I come up with.
>
> > I am thinking we need to fix the MachO file producer in llvm/clang to
> > make a __LINKEDIT segment.
>
> I'll see what I can do about that.
>
>
> On Thu, Jun 5, 2014 at 1:38 PM, Greg Clayton <gclayton at apple.com> wrote:
>
>>
>> > On Jun 5, 2014, at 9:47 AM, Keno Fischer <kfischer at college.harvard.edu>
>> wrote:
>> >
>> > > - What is updateSectionLoadAddress(...) doing when it checks "if
>> > > (section_sp->GetFileAddress() > 0x100000)"?
>> >
>> > LLVM allocates sections with relocations outside of the actual symbol
>> > file and then updates the section vmaddr accordingly. What this code does
>> > is basically traverse the section tree and, for every leaf section,
>> > adjust the load address accordingly. The only problem is that LLVM doesn't
>> > actually relocate all sections, so we have to have some kind of check to
>> > determine whether the section was relocated or not. The condition in there
>> > right now is a stopgap and I'd like to come up with something more
>> > reasonable (I meant to ask about that in the initial review). I think some
>> > sort of comparison to the file size would be appropriate, but I don't know
>> > enough about Mach-O object files to know about the relation of vmaddr and
>> > file offset. Any ideas?
>>
>> The file size can be zero for BSS sections, so the file size doesn't
>> necessarily correlate with the vmsize. What is the file type of the mach-o
>> file? In the mach header there is a "filetype" field. I have attached a raw
>> macho dump of the load commands for the "swig" executable:
>>
>>
>> % mach_o.py `which swig`
>> 0x00000000: /usr/local/bin/swig (x86_64)
>> Mach Header
>>        magic: 0xfeedfacf MH_MAGIC_64
>>      cputype: 0x01000007 x86_64
>>   cpusubtype: 0x80000003
>>     filetype: 0x00000002 MH_EXECUTE
>>        ncmds: 0x00000012 18
>>   sizeofcmds: 0x000007d0
>>        flags: 0x00210085 MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL | MH_BINDS_TO_WEAK | MH_PIE
>>
>>                                              VMADDR             VMSIZE             FILEOFF            FILESIZE           PROTECT
>> 0x00000020: <0x0048> LC_SEGMENT_64           0x0000000000000000 0x0000000100000000 0x0000000000000000 0x0000000000000000 --- ---   0 0x00000000 __PAGEZERO
>> 0x00000068: <0x02c8> LC_SEGMENT_64           0x0000000100000000 0x00000000000ed000 0x0000000000000000 0x00000000000ed000 rwx r-x   8 0x00000000 __TEXT
>> 0x00000330: <0x02c8> LC_SEGMENT_64           0x00000001000ed000 0x0000000000009000 0x00000000000ed000 0x0000000000005000 rwx rw-   8 0x00000000 __DATA
>> 0x000005f8: <0x0048> LC_SEGMENT_64           0x00000001000f6000 0x0000000000006000 0x00000000000f2000 0x0000000000004380 rwx r--   0 0x00000000 __LINKEDIT
>> 0x00000640: <0x0030> LC_DYLD_INFO_ONLY       rebase_off = 0x000f2000, rebase_size = 216, bind_off = 0x000f20d8, bind_size = 400, weak_bind_off = 0x000f2268, weak_bind_size = 48, lazy_bind_off = 0x000f2298, lazy_bind_size = 1120, export_off = 0x000f26f8, export_size = 32,
>> 0x00000670: <0x0018> LC_SYMTAB               symoff = 0x000f30d8, nsyms = 82, stroff = 0x000f3858, strsize = 944
>> 0x00000688: <0x0050> LC_DYSYMTAB             ilocalsym      = 0         , nlocalsym     = 1
>>                                              iextdefsym     = 1         , nextdefsym    = 1
>>                                              iundefsym      = 2         , nundefsym     = 80
>>                                              tocoff         = 0x00000000, ntoc          = 0
>>                                              modtaboff      = 0x00000000, nmodtab       = 0
>>                                              extrefsymoff   = 0x00000000, nextrefsyms   = 0
>>                                              indirectsymoff = 0x000f35f8, nindirectsyms = 152
>>                                              extreloff      = 0x00000000, nextrel       = 0
>>                                              locreloff      = 0x00000000, nlocrel       = 0
>> 0x000006d8: <0x0020> LC_LOAD_DYLINKER        /usr/lib/dyld
>> 0x000006f8: <0x0018> LC_UUID                 f0c6b9ae-2ab8-3305-9746-f7275e37cc94
>> 0x00000710: <0x0010> LC_VERSION_MIN_MACOSX
>> 0x00000720: <0x0010> 0x0000002a
>> 0x00000730: <0x0018> 0x80000028
>> 0x00000748: <0x0030> LC_LOAD_DYLIB           0x00000002 0x00780000 0x00010000 /usr/lib/libc++.1.dylib
>> 0x00000778: <0x0038> LC_LOAD_DYLIB           0x00000002 0x04bc0000 0x00010000 /usr/lib/libSystem.B.dylib
>> 0x000007b0: <0x0010> LC_FUNCTION_STARTS      dataoff = 0x000f2718, datasize = 2464
>> 0x000007c0: <0x0010> 0x00000029
>> 0x000007d0: <0x0010> 0x0000002b
>> 0x000007e0: <0x0010> LC_CODE_SIGNATURE       dataoff = 0x000f3c10, datasize = 10096
>>
>>
>> And the sections look like:
>>
>> INDEX ADDRESS            SIZE               OFFSET     ALIGN      RELOFF     NRELOC     FLAGS      RESERVED1  RESERVED2  RESERVED3  NAME
>> ===== ------------------ ------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------------------
>> [  1] 0x0000000100000e10 0x00000000000c1852 0x00000e10 0x00000004 0x00000000 0x00000000 0x80000400 0x00000000 0x00000000 0x00000000 __TEXT.__text
>> [  2] 0x00000001000c2662 0x00000000000001aa 0x000c2662 0x00000001 0x00000000 0x00000000 0x80000408 0x00000000 0x00000006 0x00000000 __TEXT.__stubs
>> [  3] 0x00000001000c280c 0x00000000000002d6 0x000c280c 0x00000002 0x00000000 0x00000000 0x80000400 0x00000000 0x00000000 0x00000000 __TEXT.__stub_helper
>> [  4] 0x00000001000c2af0 0x0000000000023288 0x000c2af0 0x00000004 0x00000000 0x00000000 0x00000002 0x00000000 0x00000000 0x00000000 __TEXT.__cstring
>> [  5] 0x00000001000e5d80 0x00000000000054a0 0x000e5d80 0x00000004 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __TEXT.__const
>> [  6] 0x00000001000eb220 0x0000000000000128 0x000eb220 0x00000004 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __TEXT.__ustring
>> [  7] 0x00000001000eb348 0x0000000000000ac4 0x000eb348 0x00000002 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __TEXT.__gcc_except_tab
>> [  8] 0x00000001000ebe0c 0x00000000000011f0 0x000ebe0c 0x00000002 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __TEXT.__unwind_info
>> [  9] 0x00000001000ed000 0x0000000000000040 0x000ed000 0x00000003 0x00000000 0x00000000 0x00000006 0x00000047 0x00000000 0x00000000 __DATA.__got
>> [ 10] 0x00000001000ed040 0x0000000000000010 0x000ed040 0x00000003 0x00000000 0x00000000 0x00000006 0x0000004f 0x00000000 0x00000000 __DATA.__nl_symbol_ptr
>> [ 11] 0x00000001000ed050 0x0000000000000238 0x000ed050 0x00000003 0x00000000 0x00000000 0x00000007 0x00000051 0x00000000 0x00000000 __DATA.__la_symbol_ptr
>> [ 12] 0x00000001000ed288 0x0000000000000030 0x000ed288 0x00000003 0x00000000 0x00000000 0x00000009 0x00000000 0x00000000 0x00000000 __DATA.__mod_init_func
>> [ 13] 0x00000001000ed2c0 0x0000000000001908 0x000ed2c0 0x00000004 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __DATA.__const
>> [ 14] 0x00000001000eebd0 0x0000000000002fd4 0x000eebd0 0x00000004 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __DATA.__data
>> [ 15] 0x00000001000f1bb0 0x00000000000000f8 0x00000000 0x00000004 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000 0x00000000 __DATA.__common
>> [ 16] 0x00000001000f1cb0 0x0000000000003860 0x00000000 0x00000004 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000 0x00000000 __DATA.__bss
>>
>> Not sure if this helps you see anything?
>>
>> >
>> > > - Why are we preloading everything with the code: ...
>> >
>> > Yes, you're right, that was for debugging and slipped past my cleanup.
>>
>> Ah, phew!
>>
>> >
>> > > - Your fix to ObjectFileMachO.cpp is not correct...
>> >
>> > The code in if (process) doesn't do anything if we don't have a
>> > linkedit_section_sp. Maybe we need to duplicate that code in an else block
>> > for linkedit_section_sp ...
>>
>> I am thinking we need to fix the MachO file producer in llvm/clang to
>> make a __LINKEDIT segment. The __LINKEDIT segment contains anything that
>> isn't in any other section and isn't needed for running. It is just a
>> bunch of linker bits like the symbol table, string table, compact unwind
>> info and more. All bits in a mach-o file must be spoken for and must be
>> in a segment. The mach-o file starts with a bunch of load commands (as you
>> can see above in the swig dump).
>>
>> The LC_SYMTAB load command contains information about the symbol table
>> and it contains:
>>
>> 0x00000670: <0x0018> LC_SYMTAB               symoff = 0x000f30d8, nsyms = 82, stroff = 0x000f3858, strsize = 944
>>
>> This tells us the symbol table offset in the file (offset from the start
>> of the mach header) and the size, and the string table offset + size. The
>> symbol table and string table should be in a __LINKEDIT segment.
>>
>> Note that there are other load commands that point to data in the
>> __LINKEDIT segment:
>>
>> 0x00000640: <0x0030> LC_DYLD_INFO_ONLY       rebase_off = 0x000f2000, rebase_size = 216, bind_off = 0x000f20d8, bind_size = 400, weak_bind_off = 0x000f2268, weak_bind_size = 48, lazy_bind_off = 0x000f2298, lazy_bind_size = 1120, export_off = 0x000f26f8, export_size = 32,
>> 0x00000688: <0x0050> LC_DYSYMTAB             ilocalsym      = 0         , nlocalsym     = 1
>>                                              iextdefsym     = 1         , nextdefsym    = 1
>>                                              iundefsym      = 2         , nundefsym     = 80
>>                                              tocoff         = 0x00000000, ntoc          = 0
>>                                              modtaboff      = 0x00000000, nmodtab       = 0
>>                                              extrefsymoff   = 0x00000000, nextrefsyms   = 0
>>                                              indirectsymoff = 0x000f35f8, nindirectsyms = 152
>>                                              extreloff      = 0x00000000, nextrel       = 0
>>                                              locreloff      = 0x00000000, nlocrel       = 0
>>
>> 0x000007b0: <0x0010> LC_FUNCTION_STARTS      dataoff = 0x000f2718, datasize = 2464
>> 0x000007e0: <0x0010> LC_CODE_SIGNATURE       dataoff = 0x000f3c10, datasize = 10096
>>
>> > Thank you for your comments. I'm learning as I'm going here.
>>
>> No worries, I can definitely help out with getting this ready. A few more
>> iterations and we should be good.
>> >
>> > Keno
>> >
>> >
>> >
>> > On Thu, Jun 5, 2014 at 12:37 PM, Greg Clayton <gclayton at apple.com>
>> wrote:
>> > Can you explain a few things?
>> >
>> > - What is updateSectionLoadAddress(...) doing when it checks "if
>> > (section_sp->GetFileAddress() > 0x100000)"?
>> > - Why are we preloading everything with the code:
>> >
>> >                 // load the symbol table right away
>> >                 module_sp->GetObjectFile()->GetSymtab();
>> >
>> >                 module_sp->GetSymbolVendor()->GetNumCompileUnits();
>> >                 module_sp->GetSymbolVendor()->GetCompileUnitAtIndex(0);
>> >                 module_sp->ParseAllDebugSymbols();
>> >
>> > It seems like we should just let it load things lazily. Parsing all
>> > debug symbols is not advised; it should be allowed to lazily parse the
>> > DWARF as it needs to.
>> >
>> > - Your fix to ObjectFileMachO.cpp is not correct. If we have a process,
>> > then we load the symbol table from memory (the code in the "if (process)"),
>> > else we load it from the load commands (in the "else") and from the file
>> > itself. We don't want to always load the symbol table from the load
>> > commands as the symtab_load_command.symoff and symtab_load_command.stroff
>> > are not correct when a mach-o file is being read from memory.
>> >
>> >
>> >
>> > > On Jun 3, 2014, at 9:46 AM, Keno Fischer <
>> kfischer at college.harvard.edu> wrote:
>> > >
>> > > This is the LLDB side of http://reviews.llvm.org/D4005
>> > >
>> > > http://reviews.llvm.org/D4006
>> > >
>> > > Files:
>> > >  lib/Makefile
>> > >  source/Core/Section.cpp
>> > >  source/Plugins/JITLoader/GDB/JITLoaderGDB.cpp
>> > >  source/Plugins/Makefile
>> > >  source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
>> > > <D4006.10055.patch>_______________________________________________
>> > > lldb-commits mailing list
>> > > lldb-commits at cs.uiuc.edu
>> > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>> >
>> >
>>
>>
>
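
To make the __LINKEDIT point from the quoted thread concrete, here is a small Python sketch that checks whether the blobs named by the load commands in the swig dump fall inside the __LINKEDIT file range. The helper name is hypothetical, the numbers are copied from the dump, and the 16-byte nlist_64 and 4-byte indirect-symbol entry sizes are the usual ones for 64-bit Mach-O; I'm assuming them here rather than reading them from a file.

```python
NLIST_64_SIZE = 16       # assumed sizeof(struct nlist_64)
INDIRECT_ENTRY_SIZE = 4  # assumed size of one indirect symbol table entry

# __LINKEDIT file range from the swig dump: FILEOFF and FILESIZE.
LINKEDIT_OFF = 0x000F2000
LINKEDIT_SIZE = 0x00004380


def within_linkedit(offset, size):
    """Return True if [offset, offset + size) lies inside __LINKEDIT.

    Hypothetical helper for illustration only.
    """
    end = LINKEDIT_OFF + LINKEDIT_SIZE
    return LINKEDIT_OFF <= offset and offset + size <= end


# (description, file offset, size in bytes), all from the load commands.
blobs = [
    ("LC_DYLD_INFO_ONLY rebase", 0x000F2000, 216),
    ("LC_DYLD_INFO_ONLY bind", 0x000F20D8, 400),
    ("LC_DYLD_INFO_ONLY weak_bind", 0x000F2268, 48),
    ("LC_DYLD_INFO_ONLY lazy_bind", 0x000F2298, 1120),
    ("LC_DYLD_INFO_ONLY export", 0x000F26F8, 32),
    ("LC_FUNCTION_STARTS", 0x000F2718, 2464),
    ("LC_SYMTAB symbols", 0x000F30D8, 82 * NLIST_64_SIZE),
    ("LC_DYSYMTAB indirect syms", 0x000F35F8, 152 * INDIRECT_ENTRY_SIZE),
    ("LC_SYMTAB strings", 0x000F3858, 944),
    ("LC_CODE_SIGNATURE", 0x000F3C10, 10096),
]

for name, offset, size in blobs:
    print(name, "inside __LINKEDIT:", within_linkedit(offset, size))
```

An object file without a __LINKEDIT segment has no such range to check the LC_SYMTAB offsets against, which is the situation discussed above.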