[llvm-dev] Issues when a native function calls a DFSan-instrumented function
Brian Johannesmeyer via llvm-dev
llvm-dev at lists.llvm.org
Fri Apr 26 07:47:56 PDT 2019
Hello,
I am trying to run a program with DFSan where I do not control the
compilation of all of its source files. Meaning, I can only instrument a
subset of the source files to have DFSan support. Thus, during linking,
some of the object files have DFSan support (i.e., use the “instrumented”
ABI) and some object files don’t (i.e., use the “native” ABI). I am hoping
that although my entire program isn't instrumented, I can still at least
use DFSan to analyze the instrumented part.
However, my program makes function calls across the native/instrumented
boundary --- and although this is fine for calls from an instrumented
function to a native function (since I can simply add the native function
to DFSan's ABI list) --- it is creating issues for calls from a native
function to an instrumented function. Specifically, when linking, I get
undefined reference errors because the native function is attempting to
call the instrumented function’s original name --- i.e., without the `dfs$`
prefix. E.g., the native function is trying to call `foo`, but `foo` has
been renamed `dfs$foo`. I can work around these linking errors by adding
`foo` to the ABI list, but then calls to `foo` from an instrumented
function no longer propagate taint into/out of `foo` automatically, as they
should; they follow the ABI list's rules instead.
I can demonstrate what I mean with an example. Suppose my program has two
object files: `instrumented.o` (which is instrumented by DFSan) and
`native.o` (which is not instrumented by DFSan). `main` (from
`instrumented.o`) calls `add3` (from `native.o`) to compute the sum of
three numbers, (x+y+z). To compute this sum, `add3` makes two calls to
`add2` (from `instrumented.o`), to compute ((x+y)+z). `main` then performs
two tests: (i) It checks whether a call to `add2` maintains accurate label
information, and (ii) It checks whether a call to `add3` maintains accurate
label information (according to the ABI list's rule). The source files are
below.
Additionally, I have the following ABI list:
```
fun:add3=uninstrumented
fun:add3=functional
```
Together, this gives the following linker error:
```
native.o: In function `add3':
native.c:(.text+0x18): undefined reference to `add2'
```
This is because `add3` is attempting to call `add2`, but `add2` has been
replaced by `dfs$add2`.
I can work around this linker error by adding `add2` to the ABI list:
```
fun:add3=uninstrumented
fun:add3=functional
fun:add2=uninstrumented
fun:add2=discard
```
With this change the program links successfully, but DFSan no longer tracks
taint into/out of `add2`. Running the program gives the following output:
```
INST-->INST label test...
x label (1) == x_test label (0)? FALSE
INST-->NATIVE label test...
sum label has x label? TRUE
```
In the output above, `x_test` is the result of `add2(x,0)`, so it should
have the same label as `x`; however, because `add2` is in the ABI list as
`discard`, its return value is unlabelled, so `x`'s label does not match
`x_test`'s label. `add3` preserves taint correctly because it is listed as
`functional` in the ABI list.
Is there a way to maintain accurate label information for
instrumented-->instrumented function calls but also permit
native-->instrumented function calls to the same callee? Maybe I'm missing
something obvious, but I only see the following workarounds here:
1) Add each instrumented function to the ABI list correctly. In my example,
this would mean setting `add2` as a `functional` or `custom` function.
However this does not scale well for large applications, and defeats the
purpose of DFSan's automatic taint propagation.
2) Go through the instrumented object files and replace, e.g., `dfs$foo`
with `foo`. However, this would probably produce some sort of undefined
behavior, as mentioned in the DFSan design document.
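To make option 1 concrete, here is a rough sketch of the shape a `custom` wrapper for `add2` might take. As I understand the documented convention, a function listed as `custom` is routed to a wrapper named `__dfsw_add2` that receives the original arguments, then one label per argument, then a pointer for the return label. So that the snippet compiles without DFSan, `label_t`, `label_union`, and `add2_custom_sketch` are hypothetical stand-ins for `dfsan_label`, `dfsan_union`, and `__dfsw_add2`; the bitmask union is only a model, not how DFSan represents labels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for dfsan_label, for illustration only. */
typedef uint16_t label_t;

/* Hypothetical stand-in for dfsan_union(): models labels as a bitmask. */
static label_t label_union(label_t a, label_t b) {
  return (label_t)(a | b);
}

/*
 * Shaped like a `custom` wrapper: the original arguments, then one label
 * per argument, then an out-pointer for the label of the return value.
 * In a real DFSan build this would be named __dfsw_add2.
 */
int add2_custom_sketch(int a, int b, label_t a_label, label_t b_label,
                       label_t *ret_label) {
  assert(ret_label != NULL);
  /* The sum depends on both inputs, so its label is their union. */
  *ret_label = label_union(a_label, b_label);
  return a + b;
}
```

Writing one such wrapper by hand for every instrumented function that native code may call is exactly the part that does not scale.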
This seems like it would be a common use case for DFSan --- where there are
circular dependencies between native and instrumented compilation units. I
would appreciate any feedback.
Thanks,
Brian
--------
Here are the source files:
1. `instrumented.o`, which was instrumented by DFSan, comes from
`instrumented.c`:
```
#include <stdio.h>
#include <sanitizer/dfsan_interface.h>

#define LBL dfsan_get_label
#define S_BOOL(x) ((x) ? "TRUE" : "FALSE")

/* Instrumented callee; renamed to dfs$add2 unless it is in the ABI list. */
int add2(int a, int b) {
  return a + b;
}

/* Defined in native.c (not instrumented). */
int add3(int a, int b, int c);

int main(void) {
  int x = 2, y = 3, z = 4;
  dfsan_set_label(dfsan_create_label("x", 0), &x, sizeof(x));

  int x_test = add2(x, 0);
  int sum = add3(x, y, z);

  printf("INST-->INST label test...\n");
  printf("x label (%d) == x_test label (%d)? %s\n\n", LBL(x),
         LBL(x_test), S_BOOL(LBL(x) == LBL(x_test)));
  printf("INST-->NATIVE label test...\n");
  printf("sum label has x label? %s\n",
         S_BOOL(dfsan_has_label(LBL(sum), LBL(x))));
  return 0;
}
```
2. `native.o`, which was not instrumented by DFSan, comes from `native.c`:
```
/* Defined in instrumented.c. */
int add2(int a, int b);

int add3(int a, int b, int c) {
  return add2(add2(a, b), c);
}
```