[lldb-dev] Redefining functions

Fri Aug 12 00:26:19 PDT 2011

Hi,

On Thu, Aug 11, 2011 at 18:57, Greg Clayton <gclayton at apple.com> wrote:

>
> On Aug 11, 2011, at 6:11 PM, Filipe Cabecinhas wrote:
>
> > Hi,
> >
> > I've been toying around with loading libraries and what I can do with
> lldb, but it seems some of the support isn't there:
> >
> >   - I can load a library from a command, but the only thing I get is a
> "token" (the return of dlopen());
> >   - I can't (as far as I can tell) know what is the address for the GOT
> entry for a function (the one that will be changed by the dynamic linker on
> first invocation, they seem to be in the __DATA,__la_symbol_ptr section),
> but…
>
> On Mach-o you can see at least the stubs (locations that contain the lazy
> pointer indirections) as they are marked as "Trampoline" symbols:
>
>
> (lldb) target modules dump symtab a.out
> Symtab, file = /Volumes/work/gclayton/Documents/src/attach/a.out, num_symbols = 18:
>                Debug symbol
>               |Synthetic symbol
>               ||Externally Visible
>               |||
> Index   UserID DSX Type         File Address/Value Load Address       Size               Flags      Name
> ------- ------ --- ------------ ------------------ ------------------ ------------------ ---------- ----------------------------------
> [    0]      0 D   SourceFile   0x0000000000000000                    Sibling -> [    4] 0x00640000 /Volumes/work/gclayton/Documents/src/attach/test.c
> [    1]      2 D   ObjectFile   0x000000004e440e1e                    0x0000000000000000 0x00660001 /Volumes/work/gclayton/Documents/src/attach/test.o
> [    2]      4 D   Code         0x0000000100000d80                    0x0000000000000070 0x000f0000 sleep_loop
> [    3]      8 D   Code         0x0000000100000df0                    0x0000000000000066 0x000f0000 main
> [    4]     12     Data         0x0000000100001000                    0x0000000000000000 0x000e0000 pvars
> [    5]     13   X Data         0x0000000100001068                    0x0000000000000000 0x000f0000 NXArgc
> [    6]     14   X Data         0x0000000100001070                    0x0000000000000000 0x000f0000 NXArgv
> [    7]     15   X Data         0x0000000100001080                    0x0000000000000000 0x000f0000 __progname
> [    8]     16   X Absolute     0x0000000100000000                    0x0000000000000000 0x00030010 _mh_execute_header
> [    9]     17   X Data         0x0000000100001078                    0x0000000000000000 0x000f0000 environ
> [   10]     20   X Code         0x0000000100000d40                    0x0000000000000000 0x000f0000 start
> [   11]     21     Trampoline   0x0000000100000e56                    0x0000000000000006 0x00010100 exit
> [   12]     22     Trampoline   0x0000000100000e5c                    0x0000000000000006 0x00010100 getchar
> [   13]     23     Trampoline   0x0000000100000e62                    0x0000000000000006 0x00010100 getpid
> [   14]     24     Trampoline   0x0000000100000e68                    0x0000000000000006 0x00010100 printf
> [   15]     25     Trampoline   0x0000000100000e6e                    0x0000000000000006 0x00010100 puts
> [   16]     26     Trampoline   0x0000000100000e74                    0x0000000000000006 0x00010100 sleep
> [   17]     27   X Extern       0x0000000000000000                    0x0000000000000000 0x00010100 dyld_stub_binder
>
>
> The symbols 11 - 16 above are the stub entries through which all calls to
> "exit", "getchar", etc. go.
>

I saw those, but the only address they give me is the destination of the
trampoline, not the trampoline itself. I'm going to double-check tomorrow
(it's a huge code-base :-) ), but I don't think I can know the offset into
the GOT from there.
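
One way to get from a stub to its pointer slot, sketched under assumptions: on x86_64 Mach-O each 6-byte stub is normally a single `jmp *disp32(%rip)` instruction (bytes FF 25 followed by a 32-bit displacement), so the slot address can be computed from the stub's own bytes. The name `lazy_pointer_slot` is made up for illustration, this is not an existing lldb API, and the stub bytes would have to be read from the process first.

```cpp
#include <cassert>
#include <cstdint>

// Decode a 6-byte x86_64 stub of the form `jmp *disp32(%rip)` (FF 25 disp32)
// and store the address of the pointer slot it jumps through in *slot_out.
// Returns false if the bytes are not that instruction.
// Hypothetical helper for illustration; not part of lldb.
bool lazy_pointer_slot(uint64_t stub_addr, const uint8_t *stub, uint64_t *slot_out) {
    assert(stub != nullptr && slot_out != nullptr);
    if (stub[0] != 0xFF || stub[1] != 0x25)
        return false;

    // Assemble the little-endian 32-bit displacement byte by byte,
    // so the result does not depend on the host's endianness.
    uint32_t raw = 0;
    for (int i = 0; i < 4; ++i)
        raw |= static_cast<uint32_t>(stub[2 + i]) << (8 * i);
    int32_t disp = static_cast<int32_t>(raw);

    // RIP-relative addressing is measured from the end of the instruction.
    *slot_out = stub_addr + 6 + static_cast<int64_t>(disp);
    return true;
}
```

For the "exit" stub at 0x100000e56, a (hypothetical) displacement of 0x1b4 would put the slot at 0x100000e56 + 6 + 0x1b4 = 0x100001010.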

> >   - Substituting the address in the GOT wouldn't work. I'll have to turn
> the original function into a jump to the new one. Nothing is in place for
> that;
>
> You will need to manually write memory for now, but it should be do-able.
> You could add some new functions to the ABI plug-ins:
>
> You could add an ABI function to the main ABI.h:
>
> #include "lldb/Target/ABI.h"
>
>        // Declared inside class ABI; the default reports "not supported".
>        virtual bool
>        UpdateGOT (const char *func_name, ModuleList *modules, addr_t new_func_addr)
>        {
>                return false;
>        }
>
> Then modify the x86_64 stuff to do the right thing
>
> lldb/source/Plugins/ABI/SysV-x86_64/ABISysV_x86_64.h
> lldb/source/Plugins/ABI/SysV-x86_64/ABISysV_x86_64.cpp
>
> If you don't end up overwriting the original function, the "modules"
> parameter could be nice as you might be able to take over, say, "printf" but
> only for "a.out" and not other shared libraries. So if "modules" is NULL,
> then apply the new function to all modules; otherwise, only try to apply it
> to the modules in the list. Just an idea...
>

That's a nice idea. I suppose we would also need to "communicate" with the
linker, so we could do the same when new modules get loaded.
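
To make the "NULL means all modules" rule concrete, here is a minimal sketch over plain standard-library containers. `PointerTable` and `update_pointers` are hypothetical stand-ins, not lldb types; a real version would write to the target's memory instead of a map.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one module's lazy-pointer table:
// imported symbol name -> address the stub currently jumps to.
typedef std::map<std::string, uint64_t> PointerTable;

// Repoint `func_name` to `new_addr` in every module when `only` is null,
// otherwise just in the modules named in `*only`.
// Returns the number of slots that were changed.
std::size_t update_pointers(std::map<std::string, PointerTable> &modules,
                            const std::string &func_name,
                            const std::vector<std::string> *only,
                            uint64_t new_addr) {
    assert(new_addr != 0);
    std::size_t patched = 0;
    for (auto &entry : modules) {
        if (only && std::find(only->begin(), only->end(), entry.first) == only->end())
            continue;  // caller restricted the update and this module is not listed
        PointerTable::iterator slot = entry.second.find(func_name);
        if (slot == entry.second.end())
            continue;  // this module never imports func_name
        slot->second = new_addr;
        ++patched;
    }
    return patched;
}
```

Running the same call again for each module that gets loaded later would cover the new-module case, assuming the debugger is notified of the load.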

> >   - I found one email from Jason Molenda where he explained how they
> > implemented F&C (Fix and Continue) in gdb
> > (http://www.cygwin.com/ml/gdb/2003-06/msg00531.html), and am trying to do
> > something similar. But it seems that the current dyld implementation
> > doesn't have a flag to not run global constructors (or re-register ObjC
> > classes), and NSLinkModule was deprecated, so these cases would not work.
> >
> > I wanted to continue this work, but I have some doubts…
>
> There are plenty of issues with all ways of doing things, yes...
>
> > How could I get a handle (on my CommandObject) to the library loaded with
> dlopen? (It can have the same file name as an already loaded library, how
> can I tell which is which?)
> > If it is impossible, any ideas on how to add that feature?
>
> Why do you need the handle?
>

The handle is the only thing I can get from the process->LoadModule()
method. My main concern is: If I reload a dylib (from a file with the same
name), how can I know which module it is, from the ModuleList? Is it the
one with the highest index? Will the "old" Module simply be replaced, and I
can just search for filename?
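
One approach that does not depend on list order, sketched with hypothetical types: snapshot the module list before and after the load and keep the entries that are new. `ModuleInfo` and `newly_loaded` are invented for illustration, and the sketch assumes a reloaded dylib shows up as a separate entry with a different load address, which is exactly the open question above.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical snapshot of one loaded image: where it came from and where it landed.
struct ModuleInfo {
    std::string path;
    uint64_t load_addr;
};

// Return the entries of `after` whose (path, load address) pair is absent from `before`.
// Two images with the same file name stay distinguishable by their load address.
std::vector<ModuleInfo> newly_loaded(const std::vector<ModuleInfo> &before,
                                     const std::vector<ModuleInfo> &after) {
    std::set<std::pair<std::string, uint64_t> > seen;
    for (const ModuleInfo &m : before)
        seen.insert(std::make_pair(m.path, m.load_addr));

    std::vector<ModuleInfo> added;
    for (const ModuleInfo &m : after) {
        assert(!m.path.empty());  // every snapshot entry is expected to carry a path
        if (seen.count(std::make_pair(m.path, m.load_addr)) == 0)
            added.push_back(m);
    }
    return added;
}
```

If the old module were simply replaced at the same address, this would return nothing, so a UUID or timestamp would be needed as an extra key.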

> > After that, the easy way to replace the functions would be to get the
> symbols (at least for functions) that are defined in the recently loaded
> image and turn the current functions into jumps to the new functions.
>
> That is a good way if you don't want to call the original function. I have
> always wanted to "listen" to the malloc/free calls by making my own versions
> of malloc/free and do a little data gathering and yet still call through to
> the original functions.
>
> Hope some of the above hints help.
>
> Greg
>

That was one of the use-cases (instrumenting functions). :-)
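
For the overwrite-with-a-jump variant discussed above, the bytes to write at the start of the old function could be built roughly like this on x86_64. This is a sketch with a hypothetical helper name (`make_jump_patch`); it only produces the bytes and does not write them anywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build the bytes that, written at address `from`, make execution continue at `to`.
// Hypothetical helper for illustration; not part of lldb.
std::vector<uint8_t> make_jump_patch(uint64_t from, uint64_t to) {
    assert(from != to);
    std::vector<uint8_t> patch;

    // `jmp rel32` (E9 xx xx xx xx): rel32 is measured from the end of the
    // 5-byte instruction, so it only reaches targets within about 2 GiB.
    int64_t rel = static_cast<int64_t>(to - (from + 5));
    if (rel >= INT32_MIN && rel <= INT32_MAX) {
        uint32_t r = static_cast<uint32_t>(static_cast<int32_t>(rel));
        patch.push_back(0xE9);
        for (int i = 0; i < 4; ++i)
            patch.push_back(static_cast<uint8_t>((r >> (8 * i)) & 0xFF));
        return patch;
    }

    // Out of range: movabs r11, imm64 (49 BB imm64) ; jmp r11 (41 FF E3).
    // r11 is a scratch register in the SysV x86_64 ABI and carries no arguments.
    patch.push_back(0x49);
    patch.push_back(0xBB);
    for (int i = 0; i < 8; ++i)
        patch.push_back(static_cast<uint8_t>((to >> (8 * i)) & 0xFF));
    patch.push_back(0x41);
    patch.push_back(0xFF);
    patch.push_back(0xE3);
    return patch;
}
```

Since the patch destroys the first 5 or 13 bytes of the old function, the instrumenting use-case (calling through to the original) would also need those bytes saved and relocated somewhere, which this sketch does not attempt.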

Thanks for the reply,

  Filipe

>
> >
> > Regards,
> >
> >   Filipe
> >
> > On Mon, Aug 8, 2011 at 17:08, Filipe Cabecinhas <
> filcab+lldb-dev at gmail.com> wrote:
> > Hi!
> >
> > On Mon, Jul 18, 2011 at 18:13, Greg Clayton <gclayton at apple.com> wrote:
> >
> > On Jul 18, 2011, at 1:32 PM, Filipe Cabecinhas wrote:
> >
> > > Hi,
> > >
> > > I'm trying to create an LLDB command that sets an internal breakpoint
> for a function, and then executes some code, but I'm having some
> difficulties...
> > >
> > > I've seen the expression command, which does something close to what I
> want to do after the breakpoint, but I have some doubts. I want the code to
> be able to return from the function where it's called, but the
> "target->EvaluateExpression" doesn't let the code return from it (while I
> would like to execute code with something like "if (condition) return NULL;
> more code…"). Is there a way to compile arbitrary code (with return
> statements) and execute it?
> >
> > Not currently.
> >
> > >
> > > Is there a way to create something like an anonymous function (with
> certain parameters), and have it compiled and linked, while looking up
> global variables?
> >
> > Current expressions can do the lookups, but as you already know they
> don't live beyond the first invocation.
> >
> > > ClangUtilityFunction doesn't look up any variables, and I can't seem to
> find a way to look up global variables without a Frame object.
> >
> > For globals you shouldn't need the frame. If the globals are in your
> symbol table and are external you might be able to use dlsym().
> >
> > > Is there a way to know a function (or method)'s address from its
> prototype?
> >
> > A normal function that was compiled into your code or an expression
> function?
> >
> > For my first try (a command like "expr" but that would re-define
> functions) I wanted to find out the location of some function/method, given
> the prototype (e.g: "ProcessGDBRemote::StartDebugserverProcess(char
> const*)"). I would suppose we could mangle the name and try to find the
> symbol. I haven't seen any way to do that in lldb, but I suppose it's
> possible to do. Maybe I'm looking at it wrong.
> >
> >
> > > My final purpose is to be able to redefine functions on-the-fly (with
> caveats for inlined functions, etc). The only way I saw that could work was
> creating a (similar) function and making the other function a trampoline
> (either using breakpoints, or writing a jmp instruction at its address)… Did
> I miss another easier way?
> >
> > We do want the ability to just compile up something in an LLDB command
> but we don't have that yet. You currently can do this via python if you
> really want to by making a source file, invoking the compiler on it, and
> then making a dylib. You can then use the "process load" command to load the
> shared library:
> >
> > (lldb) process load foo.so
> >
> > So if you have your python code do the global variable lookups and create
> the source code, you could hack something together.
> >
> > When/if you are ready to try and take over the function, you can look for
> any "Trampoline" symbols. For a simple a.out program on darwin we see:
> >
> > (lldb) file ~/Documents/src/args/a.out
> > Current executable set to '~/Documents/src/args/a.out' (i386).
> > (lldb) image dump symtab a.out
> > Symtab, file = /Volumes/work/gclayton/Documents/src/args/a.out, num_symbols = 18:
> >               Debug symbol
> >               |Synthetic symbol
> >               ||Externally Visible
> >               |||
> > Index   UserID DSX Type         File Address/Value Load Address       Size               Flags      Name
> > ------- ------ --- ------------ ------------------ ------------------ ------------------ ---------- ----------------------------------
> > ....
> > [   10]     16     Trampoline   0x0000000000001e76                    0x0000000000000006 0x00010100 __stack_chk_fail
> > ...
> > [   12]     18     Trampoline   0x0000000000001e7c                    0x0000000000000006 0x00010100 exit
> > [   13]     19     Trampoline   0x0000000000001e82                    0x0000000000000006 0x00010100 getcwd
> > [   14]     20     Trampoline   0x0000000000001e88                    0x0000000000000006 0x00010100 perror
> > [   15]     21     Trampoline   0x0000000000001e8e                    0x0000000000000006 0x00010100 printf
> > [   16]     22     Trampoline   0x0000000000001e94                    0x0000000000000006 0x00010100 puts
> >
> > On MacOSX, you could then easily patch the trampoline code to call your
> own function for say "printf" by modifying the function address in the PLT
> entry.
> >
> >
> > That would be a good solution, at least to substitute functions that are
> accessed with the PLT. But are the trampolines reified (I don't think so)?
> Or should I just write to the process' PLT directly, after loading the
> function?
> >
> > What about replacing other functions? Let's say that I want to replace a
> random function (that I can't replace by changing the PLT). If I have
> information about which functions call it, I can replace the definition of
> the function by a jump and, if necessary, get the new versions of the
> functions that call the replaced function (doing the same to them, for a
> maximum of X iterations, for example). Though I would suppose clang won't
> give us that information (at least for now).
> >
> > Thanks for the help,
> >
> >   Filipe
> >
>
>