[lldb-dev] Redefining functions

Thu Aug 11 18:57:33 PDT 2011

On Aug 11, 2011, at 6:11 PM, Filipe Cabecinhas wrote:

> Hi,
> 
> I've been toying around with loading libraries and what I can do with lldb, but it seems some of the support isn't there:
> 
>   - I can load a library from a command, but the only thing I get is a "token" (the return of dlopen());
>   - I can't (as far as I can tell) find the address of the GOT entry for a function (the one that the dynamic linker changes on first invocation; they seem to be in the __DATA,__la_symbol_ptr section), but…

On Mach-O you can at least see the stubs (locations that contain the lazy pointer indirections), since they are marked as "Trampoline" symbols:

(lldb) target modules dump symtab a.out 
Symtab, file = /Volumes/work/gclayton/Documents/src/attach/a.out, num_symbols = 18:
               Debug symbol
               |Synthetic symbol
               ||Externally Visible
               |||
Index   UserID DSX Type         File Address/Value Load Address       Size               Flags      Name
------- ------ --- ------------ ------------------ ------------------ ------------------ ---------- ----------------------------------
[    0]      0 D   SourceFile   0x0000000000000000                    Sibling -> [    4] 0x00640000 /Volumes/work/gclayton/Documents/src/attach/test.c
[    1]      2 D   ObjectFile   0x000000004e440e1e                    0x0000000000000000 0x00660001 /Volumes/work/gclayton/Documents/src/attach/test.o
[    2]      4 D   Code         0x0000000100000d80                    0x0000000000000070 0x000f0000 sleep_loop
[    3]      8 D   Code         0x0000000100000df0                    0x0000000000000066 0x000f0000 main
[    4]     12     Data         0x0000000100001000                    0x0000000000000000 0x000e0000 pvars
[    5]     13   X Data         0x0000000100001068                    0x0000000000000000 0x000f0000 NXArgc
[    6]     14   X Data         0x0000000100001070                    0x0000000000000000 0x000f0000 NXArgv
[    7]     15   X Data         0x0000000100001080                    0x0000000000000000 0x000f0000 __progname
[    8]     16   X Absolute     0x0000000100000000                    0x0000000000000000 0x00030010 _mh_execute_header
[    9]     17   X Data         0x0000000100001078                    0x0000000000000000 0x000f0000 environ
[   10]     20   X Code         0x0000000100000d40                    0x0000000000000000 0x000f0000 start
[   11]     21     Trampoline   0x0000000100000e56                    0x0000000000000006 0x00010100 exit
[   12]     22     Trampoline   0x0000000100000e5c                    0x0000000000000006 0x00010100 getchar
[   13]     23     Trampoline   0x0000000100000e62                    0x0000000000000006 0x00010100 getpid
[   14]     24     Trampoline   0x0000000100000e68                    0x0000000000000006 0x00010100 printf
[   15]     25     Trampoline   0x0000000100000e6e                    0x0000000000000006 0x00010100 puts
[   16]     26     Trampoline   0x0000000100000e74                    0x0000000000000006 0x00010100 sleep
[   17]     27   X Extern       0x0000000000000000                    0x0000000000000000 0x00010100 dyld_stub_binder

Symbols 11 - 16 above are the stub entries through which all calls to "exit", "getchar", etc. are routed.
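
To make the indirection concrete, here is a toy C++ model of it. This is not LLDB code and not what dyld does internally: LazyPointerTable, call and rebind are names made up for illustration, and each "lazy pointer" is simply a function pointer stored in a map. It only shows why redirecting a slot affects every call that goes through the stub while leaving the original function untouched.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only: a real stub is machine code that jumps through a pointer
// slot; here each slot is just a function pointer keyed by symbol name.
using Fn = int (*)(int);

struct LazyPointerTable {
    std::map<std::string, Fn> slots;  // symbol name -> current target

    // Plays the role of the stub: every call goes through the slot.
    int call(const std::string &name, int arg) const {
        auto it = slots.find(name);
        assert(it != slots.end() && "no stub with that name");
        return it->second(arg);
    }

    // Plays the role of patching the lazy pointer. Returns the previous
    // target so the caller can still reach the original function.
    Fn rebind(const std::string &name, Fn replacement) {
        Fn previous = slots.at(name);  // throws if the name is unknown
        slots[name] = replacement;
        return previous;
    }
};

// Hypothetical original and replacement implementations.
inline int original_impl(int x) { return x + 1; }
inline int replacement_impl(int x) { return x * 10; }
```

After registering original_impl under some name, calls through call() reach it; after rebind() the same calls reach replacement_impl, and the returned pointer still refers to the original.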

>   - Substituting the address in the GOT wouldn't work. I'll have to turn the original function into a jump to the new one. Nothing is in place for that;

You will need to write memory manually for now, but it should be doable. You could add a new function to the ABI plug-ins, starting with a default implementation in the main ABI.h:

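// In lldb/Target/ABI.h, inside the ABI class declaration:

	virtual bool
	UpdateGOT (const char *func_name, ModuleList *modules, lldb::addr_t new_func_addr)
	{
		// Default: unsupported. ABI plug-ins that can patch stubs override this.
		return false;
	}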

Then modify the x86_64 plug-in to do the right thing:

lldb/source/Plugins/ABI/SysV-x86_64/ABISysV_x86_64.h
lldb/source/Plugins/ABI/SysV-x86_64/ABISysV_x86_64.cpp
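
For the "turn the original function into a jump" variant, the bytes that would have to be written over the function entry can be sketched as below. This is a minimal sketch, not part of LLDB: EncodeJmpRel32 and EncodeJmpAbs64 are hypothetical helper names. The encodings themselves are the standard x86 ones: "jmp rel32" is 0xE9 plus a 32-bit displacement measured from the end of the 5-byte instruction, and on x86_64 a target out of rel32 range can be reached with "jmp qword ptr [rip+0]" (FF 25 00 00 00 00) followed by the 8-byte absolute address.

```cpp
#include <cassert>
#include <cstdint>

// Encode "jmp rel32" at address `from` targeting `to`.
// Returns false if the target is out of signed 32-bit range.
bool EncodeJmpRel32(uint64_t from, uint64_t to, uint8_t out[5]) {
    assert(out != nullptr);
    // The displacement is relative to the end of the 5-byte instruction.
    const int64_t disp = static_cast<int64_t>(to - (from + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return false;
    const uint32_t d = static_cast<uint32_t>(static_cast<int32_t>(disp));
    out[0] = 0xE9;
    for (int i = 0; i < 4; ++i)  // little-endian, independent of host order
        out[1 + i] = static_cast<uint8_t>(d >> (8 * i));
    return true;
}

// x86_64 only: "jmp qword ptr [rip+0]" followed by the absolute target.
// Takes 14 bytes but reaches any 64-bit address.
void EncodeJmpAbs64(uint64_t to, uint8_t out[14]) {
    assert(out != nullptr);
    const uint8_t opcode[6] = {0xFF, 0x25, 0x00, 0x00, 0x00, 0x00};
    for (int i = 0; i < 6; ++i)
        out[i] = opcode[i];
    for (int i = 0; i < 8; ++i)
        out[6 + i] = static_cast<uint8_t>(to >> (8 * i));
}
```

Using the file addresses from the symbol table above purely as sample numbers, a jump placed at sleep_loop (0x100000d80) that targets main (0x100000df0) encodes as E9 6B 00 00 00. Either form clobbers the first 5 or 14 bytes of the original function, so the function has to be at least that long and can no longer be called as it was.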

If you don't end up overwriting the original function, the "modules" parameter could be nice, as you might be able to take over, say, "printf" but only for "a.out" and not for other shared libraries. So if "modules" is NULL, apply the new function to all modules; otherwise, only try to apply it to the modules in the list. Just an idea...
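
The NULL-versus-list convention could look roughly like this. It is a sketch under assumptions: ShouldPatchModule is a hypothetical helper, and a plain vector of module names stands in for a real ModuleList, whose interface is not used here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Decide whether stubs in `module_name` should be redirected.
// `modules == nullptr` means "apply to every module"; otherwise only the
// modules named in the list are patched.
bool ShouldPatchModule(const std::string &module_name,
                       const std::vector<std::string> *modules) {
    assert(!module_name.empty());
    if (modules == nullptr)
        return true;
    for (const std::string &name : *modules)
        if (name == module_name)
            return true;
    return false;
}
```

With a list holding only "a.out", the stubs in a.out would be patched and those in every other image left alone; passing a null pointer patches them all.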

>   - I found one email from Jason Molenda where he explained how they implemented F&C on gdb (http://www.cygwin.com/ml/gdb/2003-06/msg00531.html ), and am trying to do something similar. But it seems that the current dyld implementation doesn't have a flag to not run global constructors (or re-register ObjC classes), and NSLinkModule was deprecated, so these cases would not.
> 
> I wanted to continue this work, but I have some doubts…

There are plenty of issues with all ways of doing things, yes...

> How could I get a handle (on my CommandObject) to the library loaded with dlopen? (It can have the same file name as an already loaded library, how can I tell which is which?)
> If it is impossible, any ideas on how to add that feature?

Why do you need the handle?

> After that, the easy way to replace the functions would be to get the symbols (at least for functions) that are defined in the recently loaded image and turn the current functions into jumps to the new functions.

That is a good approach if you don't want to call the original function. I have always wanted to "listen" to malloc/free calls by writing my own versions of malloc/free that gather a little data and still call through to the original functions.
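
The "listen and still call through" idea can be sketched in plain C++ as follows. This is an illustration only: g_malloc_slot and g_free_slot are hypothetical stand-ins for the pointer slots that would be patched in a real process, and InstallCountingAllocator, CountingMalloc and CountingFree are made-up names. The point is that the wrapper saves the old target before replacing it, which works only when the original function body is left intact.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

using MallocFn = void *(*)(std::size_t);
using FreeFn = void (*)(void *);

// Stand-ins for the patched pointer slots (hypothetical).
static MallocFn g_malloc_slot = std::malloc;
static FreeFn g_free_slot = std::free;

// Originals saved at install time so the wrappers can call through.
static MallocFn g_real_malloc = nullptr;
static FreeFn g_real_free = nullptr;

struct AllocStats {
    std::size_t malloc_calls = 0;
    std::size_t free_calls = 0;
    std::size_t bytes_requested = 0;
};
static AllocStats g_stats;

// Gather a little data, then forward to the original malloc.
static void *CountingMalloc(std::size_t size) {
    ++g_stats.malloc_calls;
    g_stats.bytes_requested += size;
    return g_real_malloc(size);
}

// Same idea for free.
static void CountingFree(void *ptr) {
    ++g_stats.free_calls;
    g_real_free(ptr);
}

// Save the current targets, then redirect the slots to the wrappers.
void InstallCountingAllocator() {
    assert(g_real_malloc == nullptr && "already installed");
    g_real_malloc = g_malloc_slot;
    g_real_free = g_free_slot;
    g_malloc_slot = CountingMalloc;
    g_free_slot = CountingFree;
}
```

Once installed, every allocation made through the slots is counted and still served by the real allocator. If the original function had been overwritten with a jump instead, the wrapper would have nothing left to call through to.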

Hope some of the above hints help.

Greg

> 
> Regards,
> 
>   Filipe
> 
> On Mon, Aug 8, 2011 at 17:08, Filipe Cabecinhas <filcab+lldb-dev at gmail.com> wrote:
> Hi!
> 
> On Mon, Jul 18, 2011 at 18:13, Greg Clayton <gclayton at apple.com> wrote:
> 
> On Jul 18, 2011, at 1:32 PM, Filipe Cabecinhas wrote:
> 
> > Hi,
> >
> > I'm trying to create an LLDB command that sets an internal breakpoint for a function, and then executes some code, but I'm having some difficulties...
> >
> > I've seen the expression command, which does something close to what I want to do after the breakpoint, but I have some doubts. I want the code to be able to return from the function where it's called, but the "target->EvaluateExpression" doesn't let the code return from it (while I would like to execute code with something like "if (condition) return NULL; more code…"). Is there a way to compile arbitrary code (with return statements) and execute it?
> 
> Not currently.
> 
> >
> > Is there a way to create something like an anonymous function (with certain parameters), and have it compiled and linked, while looking up global variables?
> 
> Current expressions can do the lookups, but as you already know they don't live beyond the first invocation.
> 
> > ClangUtilityFunction doesn't look up any variables, and I can't seem to find a way to look up global variables without a Frame object.
> 
> For globals you shouldn't need the frame. If the globals are in your symbol table and are external you might be able to use dlsym().
> 
> > Is there a way to know a function (or method)'s address from its prototype?
> 
> A normal function that was compiled into your code or an expression function?
> 
> For my first try (a command like "expr" but that would redefine functions) I wanted to find the location of some function/method, given the prototype (e.g. "ProcessGDBRemote::StartDebugserverProcess(char const*)"). I suppose we could mangle the name and try to find the symbol. I haven't seen any way to do that in lldb, but I suppose it's possible. Maybe I'm looking at it wrong.
> 
>  
> > My final purpose is to be able to redefine functions on-the-fly (with caveats for inlined functions, etc). The only way I saw that could work was creating a (similar) function and making the other function a trampoline (either using breakpoints, or writing a jmp expression at its address)… Did I miss another easier way?
> 
> We do want the ability to just compile up something in an LLDB command but we don't have that yet. You currently can do this via python if you really want to by making a source file, invoking the compiler on it, and then making a dylib. You can then use the "process load" command to load the shared library:
> 
> (lldb) process load foo.so
> 
> So if you have your python code do the global variable lookups and create the source code, you could hack something together.
> 
> When/if you are ready to try and take over the function, you can look for any "Trampoline" symbols. For a simple a.out program on darwin we see:
> 
> (lldb) file ~/Documents/src/args/a.out
> Current executable set to '~/Documents/src/args/a.out' (i386).
> (lldb) image dump symtab a.out
> Symtab, file = /Volumes/work/gclayton/Documents/src/args/a.out, num_symbols = 18:
>               Debug symbol
>               |Synthetic symbol
>               ||Externally Visible
>               |||
> Index   UserID DSX Type         File Address/Value Load Address       Size               Flags      Name
> ------- ------ --- ------------ ------------------ ------------------ ------------------ ---------- ----------------------------------
> ....
> [   10]     16     Trampoline   0x0000000000001e76                    0x0000000000000006 0x00010100 __stack_chk_fail
> ...
> [   12]     18     Trampoline   0x0000000000001e7c                    0x0000000000000006 0x00010100 exit
> [   13]     19     Trampoline   0x0000000000001e82                    0x0000000000000006 0x00010100 getcwd
> [   14]     20     Trampoline   0x0000000000001e88                    0x0000000000000006 0x00010100 perror
> [   15]     21     Trampoline   0x0000000000001e8e                    0x0000000000000006 0x00010100 printf
> [   16]     22     Trampoline   0x0000000000001e94                    0x0000000000000006 0x00010100 puts
> 
> On MacOSX, you could then easily patch the trampoline code to call your own function for say "printf" by modifying the function address in the PLT entry.
> 
> 
> That would be a good solution, at least to substitute functions that are accessed with the PLT. But are the trampolines reified (I don't think so)? Or should I just write to the process' PLT directly, after loading the function?
> 
> What about replacing other functions? Let's say that I want to replace a random function (that I can't replace by changing the PLT). If I have information about which functions call it, I can replace the definition of the function by a jump and, if necessary, get the new versions of the functions that call the replaced function (doing the same to them, for a maximum of X iterations, for example). Though I would suppose clang won't give us that information (at least for now).
> 
> Thanks for the help,
> 
>   Filipe
>