[lldb-dev] Redefining functions
Filipe Cabecinhas
filcab+lldb-dev at gmail.com
Fri Aug 12 00:26:19 PDT 2011
Hi,
On Thu, Aug 11, 2011 at 18:57, Greg Clayton <gclayton at apple.com> wrote:
>
> On Aug 11, 2011, at 6:11 PM, Filipe Cabecinhas wrote:
>
> > Hi,
> >
> > I've been toying around with loading libraries and what I can do with
> lldb, but it seems some of the support isn't there:
> >
> > - I can load a library from a command, but the only thing I get is a
> "token" (the return of dlopen());
> > - I can't (as far as I can tell) find the address of the GOT
> entry for a function (the one that will be changed by the dynamic linker on
> first invocation, they seem to be in the __DATA,__la_symbol_ptr section),
> but…
>
> On Mach-o you can see at least the stubs (locations that contain the lazy
> pointer indirections) as they are marked as "Trampoline" symbols:
>
>
> (lldb) target modules dump symtab a.out
> Symtab, file = /Volumes/work/gclayton/Documents/src/attach/a.out, num_symbols = 18:
>                Debug symbol
>                |Synthetic symbol
>                ||Externally Visible
>                |||
> Index   UserID DSX Type         File Address/Value Load Address       Size               Flags      Name
> ------- ------ --- ------------ ------------------ ------------------ ------------------ ---------- ----------------------------------
> [    0]      0 D   SourceFile   0x0000000000000000                    Sibling -> [    4] 0x00640000 /Volumes/work/gclayton/Documents/src/attach/test.c
> [    1]      2 D   ObjectFile   0x000000004e440e1e                    0x0000000000000000 0x00660001 /Volumes/work/gclayton/Documents/src/attach/test.o
> [    2]      4 D   Code         0x0000000100000d80                    0x0000000000000070 0x000f0000 sleep_loop
> [    3]      8 D   Code         0x0000000100000df0                    0x0000000000000066 0x000f0000 main
> [    4]     12     Data         0x0000000100001000                    0x0000000000000000 0x000e0000 pvars
> [    5]     13   X Data         0x0000000100001068                    0x0000000000000000 0x000f0000 NXArgc
> [    6]     14   X Data         0x0000000100001070                    0x0000000000000000 0x000f0000 NXArgv
> [    7]     15   X Data         0x0000000100001080                    0x0000000000000000 0x000f0000 __progname
> [    8]     16   X Absolute     0x0000000100000000                    0x0000000000000000 0x00030010 _mh_execute_header
> [    9]     17   X Data         0x0000000100001078                    0x0000000000000000 0x000f0000 environ
> [   10]     20   X Code         0x0000000100000d40                    0x0000000000000000 0x000f0000 start
> [   11]     21     Trampoline   0x0000000100000e56                    0x0000000000000006 0x00010100 exit
> [   12]     22     Trampoline   0x0000000100000e5c                    0x0000000000000006 0x00010100 getchar
> [   13]     23     Trampoline   0x0000000100000e62                    0x0000000000000006 0x00010100 getpid
> [   14]     24     Trampoline   0x0000000100000e68                    0x0000000000000006 0x00010100 printf
> [   15]     25     Trampoline   0x0000000100000e6e                    0x0000000000000006 0x00010100 puts
> [   16]     26     Trampoline   0x0000000100000e74                    0x0000000000000006 0x00010100 sleep
> [   17]     27   X Extern       0x0000000000000000                    0x0000000000000000 0x00010100 dyld_stub_binder
>
>
> Symbols 11 - 16 above are the stub entries through which all calls to
> "exit", "getchar", etc. go.
>
I saw those, but the only address they give me is the destination of the
trampoline, not the trampoline itself. I'm going to double-check tomorrow
(it's a huge code-base :-) ), but I don't think I can know the offset into
the GOT from there.
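
For reference, the stub addresses Greg points to can be pulled out of the dump text with a few lines of Python. This is only a sketch: the sample rows are a trimmed, unwrapped copy of the listing above, `parse_trampolines` is a hypothetical helper name (not part of lldb), and it assumes the process is not running, so the Load Address column is empty. It reads the File Address/Value and Size columns of each "Trampoline" row; it does not locate the lazy pointer that the stub jumps through.

```python
import re

# Trimmed, unwrapped rows from the symtab dump above (Load Address column is empty).
SAMPLE_DUMP = """\
[   10]     20   X Code         0x0000000100000d40                    0x0000000000000000 0x000f0000 start
[   11]     21     Trampoline   0x0000000100000e56                    0x0000000000000006 0x00010100 exit
[   12]     22     Trampoline   0x0000000100000e5c                    0x0000000000000006 0x00010100 getchar
[   16]     26     Trampoline   0x0000000100000e74                    0x0000000000000006 0x00010100 sleep
"""

# Matches "Trampoline <file addr> <size> <flags> <name>" at the end of a row.
ROW = re.compile(
    r"\bTrampoline\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+(\S+)\s*$"
)

def parse_trampolines(dump_text):
    """Return {symbol name: (stub file address, stub size in bytes)}."""
    stubs = {}
    for line in dump_text.splitlines():
        match = ROW.search(line)
        if match:
            addr, size, _flags, name = match.groups()
            stubs[name] = (int(addr, 16), int(size, 16))
    return stubs

stubs = parse_trampolines(SAMPLE_DUMP)
```

The stubs are 6 bytes apart and 6 bytes long, which is consistent with each one being a single indirect jump through its lazy pointer.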
> > - Substituting the address in the GOT wouldn't work. I'll have to turn
> the original function into a jump to the new one. Nothing is in place for
> that;
>
> You will need to manually write memory for now, but it should be do-able.
> You could add some new functions to the ABI plug-ins, starting with a
> virtual function in the main ABI.h:
>
>
> #include "lldb/Target/ABI.h"
>
> // Declared inside class ABI; the default reports that nothing was patched.
> virtual bool
> UpdateGOT (const char *func_name, ModuleList *modules, lldb::addr_t new_func_addr)
> {
>     return false;
> }
>
> Then modify the x86_64 stuff to do the right thing
>
> lldb/source/Plugins/ABI/SysV-x86_64/ABISysV_x86_64.h
> lldb/source/Plugins/ABI/SysV-x86_64/ABISysV_x86_64.cpp
>
> If you don't end up overwriting the original function, the "modules"
> parameter could be nice, as you might be able to take over, say, "print" but
> only for "a.out" and not other shared libraries. So if "modules" is NULL,
> then apply the new function to all modules, else, only try and apply it to
> the modules in the list. Just an idea...
>
That's a nice idea. I suppose we would also need to "communicate" with the
linker, so we could do the same when new modules get loaded.
> - I found one email from Jason Molenda where he explained how they
> implemented F&C on gdb (http://www.cygwin.com/ml/gdb/2003-06/msg00531.html), and am trying to do something similar. But it seems that the current dyld
> implementation doesn't have a flag to not run global constructors (or
> re-register ObjC classes), and NSLinkModule was deprecated, so these cases
> would not work.
> >
> > I wanted to continue this work, but I have some doubts…
>
> There are plenty of issues with all ways of doing things, yes...
>
> > How could I get a handle (on my CommandObject) to the library loaded with
> dlopen? (It can have the same file name as an already loaded library, how
> can I tell which is which?)
> > If it is impossible, any ideas on how to add that feature?
>
> Why do you need the handle?
>
The handle is the only thing I can get from the process->LoadModule()
method. My main concern is: If I reload a dylib (from a file with the same
name), how can I know which module it is, from the ModuleList? Is it the
one with the highest index? Will the "old" Module simply be replaced, and I
can just search for filename?
> > After that, the easy way to replace the functions would be to get the
> symbols (at least for functions) that are defined in the recently loaded
> image and turn the current functions into jumps to the new functions.
>
> That is a good way if you don't want to call the original function. I have
> always wanted to "listen" to the malloc/free calls by making my own versions
> of malloc/free and do a little data gathering and yet still call through to
> the original functions.
>
> Hope some of the above hints help.
>
> Greg
>
That was one of the use-cases (instrumenting functions). :-)
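
A rough sketch of the "turn the old function into a jump" step on x86_64, in Python since that is what one would script lldb with. It only builds the patch bytes; writing them into the inferior is left out. The helper names and the example addresses are made up for illustration. The encodings are the standard ones: `E9` plus a 32-bit displacement measured from the end of the 5-byte instruction, or, when the target is out of rel32 range, `movabs r11, imm64; jmp r11` (r11 is used because it is a scratch register that does not carry arguments in the SysV calling convention).

```python
import struct

def encode_jmp_rel32(patch_addr, target_addr):
    """Bytes for `jmp rel32` placed at patch_addr, or None if out of range."""
    # The displacement is measured from the end of the 5-byte instruction.
    disp = target_addr - (patch_addr + 5)
    if not -2**31 <= disp < 2**31:
        return None
    return b"\xe9" + struct.pack("<i", disp)

def encode_jmp_abs64(target_addr):
    """Bytes for `movabs r11, target; jmp r11` (13 bytes, clobbers r11)."""
    return b"\x49\xbb" + struct.pack("<Q", target_addr) + b"\x41\xff\xe3"

def make_patch(patch_addr, target_addr):
    """Prefer the short relative jump, fall back to the absolute form."""
    return encode_jmp_rel32(patch_addr, target_addr) or encode_jmp_abs64(target_addr)

# Hypothetical addresses: old function in a.out, new one in a freshly loaded dylib.
old_func = 0x100000d80
new_func_near = 0x100002000
new_func_far = 0x7fff12345678

near_patch = make_patch(old_func, new_func_near)
far_patch = make_patch(old_func, new_func_far)
```

The patch overwrites the first 5 or 13 bytes of the old function, so that function has to be at least that long and no thread may be executing inside those bytes when they are written. Calling through to the original (the malloc/free case) would additionally require keeping a copy of the overwritten instructions somewhere, which this sketch does not attempt.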
Thanks for the reply,
Filipe
>
> >
> > Regards,
> >
> > Filipe
> >
> > On Mon, Aug 8, 2011 at 17:08, Filipe Cabecinhas <
> filcab+lldb-dev at gmail.com> wrote:
> > Hi!
> >
> > On Mon, Jul 18, 2011 at 18:13, Greg Clayton <gclayton at apple.com> wrote:
> >
> > On Jul 18, 2011, at 1:32 PM, Filipe Cabecinhas wrote:
> >
> > > Hi,
> > >
> > > I'm trying to create an LLDB command that sets an internal breakpoint
> for a function, and then executes some code, but I'm having some
> difficulties...
> > >
> > > I've seen the expression command, which does something close to what I
> want to do after the breakpoint, but I have some doubts. I want the code to
> be able to return from the function where it's called, but the
> "target->EvaluateExpression" doesn't let the code return from it (while I
> would like to execute code with something like "if (condition) return NULL;
> more code…"). Is there a way to compile arbitrary code (with return
> statements) and execute it?
> >
> > Not currently.
> >
> > >
> > > Is there a way to create something like an anonymous function (with
> certain parameters), and have it compiled and linked, while looking up
> global variables?
> >
> > Current expressions can do the lookups, but as you already know they
> don't live beyond the first invocation.
> >
> > > ClangUtilityFunction doesn't look up any variables, and I can't seem to
> find a way to look up global variables without a Frame object.
> >
> > For globals you shouldn't need the frame. If the globals are in your
> symbol table and are external you might be able to use dlsym().
> >
> > > Is there a way to know a function (or method)'s address from its
> prototype?
> >
> > A normal function that was compiled into your code or an expression
> function?
> >
> > For my first try (a command like "expr" but that would re-define
> functions) I wanted to find out the location of some function/method, given
> the prototype (e.g: "ProcessGDBRemote::StartDebugserverProcess(char
> const*)"). I would suppose we could mangle the name and try to find the
> symbol. I haven't seen any way to do that in lldb, but I suppose it's
> possible to do. Maybe I'm looking at it wrong.
> >
> >
> > > My final purpose is to be able to redefine functions on-the-fly (with
> caveats for inlined functions, etc). The only way I saw that could work was
> creating a (similar) function and making the other function a trampoline
> (either using breakpoints, or writing a jmp instruction at its address)… Did
> I miss another easier way?
> >
> > We do want the ability to just compile up something in an LLDB command
> but we don't have that yet. You currently can do this via python if you
> really want to by making a source file, invoking the compiler on it, and
> then making a dylib. You can then use the "process load" command to load the
> shared library:
> >
> > (lldb) process load foo.so
> >
> > So if you have your python code do the global variable lookups and create
> the source code, you could hack something together.
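
A sketch of that workflow in Python. It only writes the source file and assembles the compiler and lldb command lines; nothing is compiled or loaded here. The function name `my_puts`, the file names, and the choice of `clang -dynamiclib` (the usual way to build a dylib on Darwin) are assumptions for illustration.

```python
import os
import tempfile

def write_source(directory, body, name="replacement.c"):
    """Write generated C source to disk and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write(body)
    return path

def compiler_command(source_path, dylib_path, compiler="clang"):
    """Command line that builds a Darwin dylib from one C file."""
    return [compiler, "-dynamiclib", "-g", "-o", dylib_path, source_path]

def lldb_load_command(dylib_path):
    """The lldb command that loads the library into the process."""
    return "process load " + dylib_path

# Hypothetical replacement function generated by the script.
SOURCE = '#include <stdio.h>\nint my_puts(const char *s) { return printf("[hooked] %s\\n", s); }\n'

workdir = tempfile.mkdtemp()
src = write_source(workdir, SOURCE)
dylib = os.path.join(workdir, "libreplacement.dylib")
build_cmd = compiler_command(src, dylib)   # pass to subprocess.check_call to build
load_cmd = lldb_load_command(dylib)        # type at the (lldb) prompt after building
```

Keeping the command construction separate from running it makes it easy to print the commands first and check them by hand before touching a live process.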
> >
> > When/if you are ready to try and take over the function, you can look for
> any "Trampoline" symbols. For a simple a.out program on darwin we see:
> >
> > (lldb) file ~/Documents/src/args/a.out
> > Current executable set to '~/Documents/src/args/a.out' (i386).
> > (lldb) image dump symtab a.out
> > Symtab, file = /Volumes/work/gclayton/Documents/src/args/a.out, num_symbols = 18:
> >                Debug symbol
> >                |Synthetic symbol
> >                ||Externally Visible
> >                |||
> > Index   UserID DSX Type         File Address/Value Load Address       Size               Flags      Name
> > ------- ------ --- ------------ ------------------ ------------------ ------------------ ---------- ----------------------------------
> > ....
> > [   10]     16     Trampoline   0x0000000000001e76                    0x0000000000000006 0x00010100 __stack_chk_fail
> > ...
> > [   12]     18     Trampoline   0x0000000000001e7c                    0x0000000000000006 0x00010100 exit
> > [   13]     19     Trampoline   0x0000000000001e82                    0x0000000000000006 0x00010100 getcwd
> > [   14]     20     Trampoline   0x0000000000001e88                    0x0000000000000006 0x00010100 perror
> > [   15]     21     Trampoline   0x0000000000001e8e                    0x0000000000000006 0x00010100 printf
> > [   16]     22     Trampoline   0x0000000000001e94                    0x0000000000000006 0x00010100 puts
> >
> > On MacOSX, you could then easily patch the trampoline code to call your
> own function for say "printf" by modifying the function address in the PLT
> entry.
> >
> >
> > That would be a good solution, at least to substitute functions that are
> accessed with the PLT. But are the trampolines reified (I don't think so)?
> Or should I just write to the process' PLT directly, after loading the
> function?
> >
> > What about replacing other functions? Let's say that I want to replace a
> random function (that I can't replace by changing the PLT). If I have
> information about which functions call it, I can replace the definition of
> the function by a jump and, if necessary, get the new versions of the
> functions that call the replaced function (doing the same to them, for a
> maximum of X iterations, for example). Though I would suppose clang won't
> give us that information (at least for now).
> >
> > Thanks for the help,
> >
> > Filipe
> >
>
>