[llvm-dev] "compiler-rt" - DataFlowSanitizer
Sam Kerner via llvm-dev
llvm-dev at lists.llvm.org
Thu Apr 18 15:32:24 PDT 2019
On Thu, Apr 18, 2019 at 10:01 AM dareen khalid <dareen.k.f at hotmail.com> wrote:
>
> I'm new to llvm passes. I wonder if I can use the pass to dynamically analyze a program.
As Bekket points out, the way to use DataFlowSanitizer is to modify
your program to call functions in sanitizer/dfsan_interface.h.
> Let me explain what I want to do in the following example.
>
>
>
> //"Result" is an object that doesn't have a fixed length like int and float.
DataFlowSanitizer tracks the bytes of memory that store a variable.
"Result" may not have a fixed length at compile time, but at runtime
it presumably has a fixed set of bytes that hold its contents. If you
can find those bytes, you can read the label.
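For example, if the contents live in a buffer owned by a std::string,
the bytes to label and read are the character buffer, not just the
std::string object itself. Below is a minimal sketch, assuming Result
keeps its data in a std::string (an assumption; the question does not
say how Result is laid out). The helper names LabelStringContents() and
StringHasLabel() are hypothetical, and the #else branch only provides
no-op stand-ins so the code still compiles without -fsanitize=dataflow.

```cpp
#include <cstddef>
#include <string>

#ifndef __has_feature
#define __has_feature(x) 0  // Compilers that lack __has_feature.
#endif

#if __has_feature(dataflow_sanitizer)
#include <sanitizer/dfsan_interface.h>
#else
// Not built with -fsanitize=dataflow: no-op stand-ins (assumption, for
// illustration only). Nothing is tracked in this mode.
typedef unsigned short dfsan_label;
static void dfsan_set_label(dfsan_label, void*, std::size_t) {}
static dfsan_label dfsan_read_label(const void*, std::size_t) { return 0; }
static int dfsan_has_label(dfsan_label, dfsan_label) { return 0; }
#endif

// Hypothetical helper: label the bytes that hold the string's characters.
void LabelStringContents(std::string& s, dfsan_label label) {
  if (!s.empty()) {
    dfsan_set_label(label, &s[0], s.size());
  }
}

// Hypothetical helper: true if any byte of the string's contents
// carries 'label'.
bool StringHasLabel(const std::string& s, dfsan_label label) {
  if (s.empty()) {
    return false;
  }
  // dfsan_read_label() returns the union of the labels on the range.
  dfsan_label combined = dfsan_read_label(s.data(), s.size());
  return dfsan_has_label(combined, label) != 0;
}
```

When built with -fsanitize=dataflow, calling LabelStringContents(s, label)
and then StringHasLabel(s, label) should report true. With the stand-ins
it always reports false, because nothing is tracked.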
>
> //The program is as follows.
> Result retrieve_data_fun (){
> //the function retrieves some sensitive data from files/DB
>
> return result_info;
> }
>
> void main(){
> ...
> Result var1, var2, var3;
> var1 = retrieve_data_fun();
> ...
> var2 = retrieve_data_fun();
>
> printf (var1);
> var3 = var2+'xxx';
> print (var3);
>
> }
>
> My goal is to track all the returned data from "retrieve_data_fun" and monitor actions on them.
You would need to test for the label on each action.
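One way to avoid repeating that test by hand is to route each action
through a small guard. The sketch below is only an illustration:
RunUnlessLabeled() is a hypothetical helper, not part of the
DataFlowSanitizer interface, and the #else branch supplies no-op
stand-ins so the code still compiles without -fsanitize=dataflow.

```cpp
#include <cstddef>
#include <iostream>

#ifndef __has_feature
#define __has_feature(x) 0  // Compilers that lack __has_feature.
#endif

#if __has_feature(dataflow_sanitizer)
#include <sanitizer/dfsan_interface.h>
#else
// Not built with -fsanitize=dataflow: no-op stand-ins (assumption, for
// illustration only). Nothing is tracked in this mode.
typedef unsigned short dfsan_label;
static dfsan_label dfsan_read_label(const void*, std::size_t) { return 0; }
static int dfsan_has_label(dfsan_label, dfsan_label) { return 0; }
#endif

// Hypothetical helper: runs 'action' only if none of the 'size' bytes at
// 'data' carry 'label'. Returns true if the action ran.
template <typename Action>
bool RunUnlessLabeled(const void* data, std::size_t size, dfsan_label label,
                      Action action) {
  dfsan_label found = dfsan_read_label(data, size);
  if (dfsan_has_label(found, label)) {
    std::cerr << "WARNING: action attempted on labeled data" << std::endl;
    return false;
  }
  action();
  return true;
}
```

A caller would pass the address and size of the bytes being used, the
label of interest, and a lambda that performs the action (printing,
writing to a file, and so on).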
> So, whenever the data is used (e.g., printed), I want to detect that, maybe by printing a statement or anything.
> Could you please help me to do that?
Here is a sketch of how this might be done:
#include <iostream>
#include <sanitizer/dfsan_interface.h>

static dfsan_label sensitive_data_label = dfsan_create_label(
    "Data returned by retrieve_data_fun()", NULL);

// The function retrieves some sensitive data from files or a DB.
// Users should call retrieve_data_fun() instead of this
// implementation, to ensure results are labeled.
Result retrieve_data_fun_impl() {
  ...
}

Result retrieve_data_fun() {
  // All the real work is done in retrieve_data_fun_impl(). This
  // function calls it, labels the result, and returns it.
  // This style ensures that all return paths in
  // retrieve_data_fun_impl() have the label applied to the result.
  Result result = retrieve_data_fun_impl();
  dfsan_set_label(sensitive_data_label, &result, sizeof(result));
  return result;
}

bool IsResultSensitive(const Result& result) {
  // There are types for which sizeof(result) does not give the
  // bytes needed. For example, if Result is a vector,
  // you may need to iterate over the elements in the vector, and
  // read the labels of the elements.
  dfsan_label label = dfsan_read_label(&result, sizeof(result));
  return dfsan_has_label(label, sensitive_data_label);
}

bool IsIntSensitive(int i) {
  dfsan_label label = dfsan_read_label(&i, sizeof(i));
  return dfsan_has_label(label, sensitive_data_label);
}

// Declared here because main() uses them; defined below main().
bool TakeSomeActionOnResult(const Result& result);
std::ostream& operator<<(std::ostream& os, const Result& result);

int main() {
  // 'sensitive_result' is sensitive. It was labeled as sensitive
  // by 'retrieve_data_fun()'.
  Result sensitive_result = retrieve_data_fun();

  // 'not_a_sensitive_result' is built without reading any sensitive
  // data. It is not labeled.
  Result not_a_sensitive_result = Result();

  // Should print "1".
  std::cout << "Is 'sensitive_result' sensitive? "
            << IsResultSensitive(sensitive_result) << std::endl;

  // Should print "0".
  std::cout << "Is 'not_a_sensitive_result' sensitive? "
            << IsResultSensitive(not_a_sensitive_result) << std::endl;

  // Assume getIntFromResult() reads bytes from the result.
  int derived_from_result = sensitive_result.getIntFromResult();

  // Should print "1".
  std::cout << "Is 'derived_from_result' sensitive? "
            << IsIntSensitive(derived_from_result) << std::endl;

  // This call returns false and prints a warning. See the definition
  // of TakeSomeActionOnResult().
  TakeSomeActionOnResult(sensitive_result);

  // This will print "REDACTED" because of the operator<< method below.
  std::cout << "Sensitive result is " << sensitive_result << std::endl;
  return 0;
}

bool TakeSomeActionOnResult(const Result& result) {
  // Guard against taking this action on sensitive data.
  if (IsResultSensitive(result)) {
    std::cerr << "WARNING: Tried to call TakeSomeActionOnResult() on "
              << "a sensitive result. This is not allowed!" << std::endl;
    return false;
  }
  ...
  return true;
}

// Overload the << operator to not print sensitive data.
std::ostream& operator<<(std::ostream& os, const Result& result) {
  if (IsResultSensitive(result)) {
    os << "REDACTED";
  } else {
    // This result is not sensitive, construct a string form.
    ...
  }
  return os;
}
>
> Thanks,
> Daren
>
>
> ________________________________
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Sam Kerner via llvm-dev <llvm-dev at lists.llvm.org>
> Sent: Wednesday, April 17, 2019 5:56 PM
> To: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] "compiler-rt" - DataFlowSanitizer
>
> On Tue, Apr 16, 2019 at 3:44 PM dareen khalid via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > Hi all,
> >
> > I have some questions about "DataFlowSanitizer" from "compiler-rt".
> > I want to know how I can test the "DataFlowSanitizer"?
>
> This document is a good reference for DataFlowSanitizer:
> https://clang.llvm.org/docs/DataFlowSanitizer.html
>
> > Can I configure it to label only some values,
>
> The section named "Example" in the document above shows a simple
> program that sets and tests for labels.
>
> dfsan_create_label() creates a label.
>
> dfsan_set_label() applies a label to the memory holding a variable.
>
> > i.e, the return values from specific functions?
>
> To label the return value of a function, add a call to
> dfsan_set_label() on the return value of the function:
>
> // Outside the function:
> dfsan_label return_label = dfsan_create_label("return_label", 0);
>
> // An example function:
> int MyFunction(int a, int b) {
> ...
> int result = ...;
>
> // Set a label on the returned value:
> dfsan_set_label(return_label, &result, sizeof(result));
>
> return result;
> }
>
> > Also, how can I print these labels?
>
> To discover the label on a variable, you can test for it and print the result:
>
> int var = ...;
>
> // Does 'var' have label 'return_label'?
> dfsan_label var_label = dfsan_get_label(var);
> if (dfsan_has_label(var_label, return_label)) {
> printf("'var' has the label 'return_label'\n");
> }
>
> To see the state of all labels at the time the program exits, set
> the shell variable DFSAN_OPTIONS to "dump_labels_at_exit=<file path>".
> For example, suppose the example program in the document is in a file
> named "dfsan.c". Here are the commands I ran to see the state of all
> labels when it exits:
>
> # Compile dfsan.c into a binary named "dfsan":
> $ clang -g -fsanitize=dataflow dfsan.c -o dfsan
>
> # Run it. There is no output because all assertions pass:
> $ ./dfsan
>
> # Run it again with shell variable DFSAN_OPTIONS set to export label
> # state to standard out on exit:
> $ env DFSAN_OPTIONS=dump_labels_at_exit=/dev/stdout ./dfsan
> ==21994==INFO: DataFlowSanitizer: dumping labels to /dev/stdout
> 1 0 0 i
> 2 0 0 j
> 3 0 0 k
> 4 1 2
> 5 3 4
>
> If you tell us more about what you are trying to accomplish with
> DataFlowSanitizer, we may be able to give more specific advice.
>
> >
> > Thanks,
> > Dareen
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev