[llvm-dev] "compiler-rt" - DataFlowSanitizer

Sam Kerner via llvm-dev llvm-dev at lists.llvm.org
Thu Apr 18 15:32:24 PDT 2019


On Thu, Apr 18, 2019 at 10:01 AM dareen khalid <dareen.k.f at hotmail.com> wrote:
>
> I'm new to llvm passes. I wonder if I can use the pass to dynamically analyze a program.

As Bekket points out, the way to use DataFlowSanitizer is to modify
your program to call the functions declared in
sanitizer/dfsan_interface.h.

> Let me explain what I want to do in the following example .
>
>
>
> //"Result" is an object that doesn't have a fixed length like int and float.

DataFlowSanitizer tracks labels on the bytes of memory that store a
variable.  "Result" may not have a fixed length at compile time, but at
runtime it presumably has a fixed set of bytes that hold its contents.
If you can find those bytes, you can read the label.
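
To make "find those bytes" concrete, here is a minimal sketch of
locating the contents of two variable-length standard types.  The names
ByteRange and ContentBytes are hypothetical helpers I made up for
illustration; they are not part of DFSan.  Only the call shown in the
trailing comment is a DFSan function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: a contiguous run of bytes that holds an
// object's contents at runtime.
struct ByteRange {
  const void* data;
  std::size_t size;
};

// For std::string, the contents are the character buffer returned by
// data(), which is size() bytes long.  sizeof(std::string) does not
// describe that buffer.
inline ByteRange ContentBytes(const std::string& s) {
  return {s.data(), s.size()};
}

// For a vector of ints, the contents are the element buffer.
inline ByteRange ContentBytes(const std::vector<int>& v) {
  return {v.data(), v.size() * sizeof(int)};
}

// In a build with -fsanitize=dataflow, the range would be handed to
// the real DFSan call:
//   ByteRange r = ContentBytes(s);
//   dfsan_label label = dfsan_read_label(r.data, r.size);
```

If "Result" wraps some other container, the same idea applies: find the
pointer and length of whatever buffer actually holds the data.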

>
> //The program is as follows.
> Result retrieve_data_fun (){
>  //the function retrieves some sensitive data from  files/DB
>
>  return result_info;
> }
>
> void main(){
> ...
> Result var1, var2, var3l
> var1 = retrieve_data_fun();
> ...
> Result var2 = retrieve_data_fun();
>
> printf (var1);
> var3 = var2+'xxx';
> print (var3);
>
> }
>
> My goal is to track all the returned data from "retrieve_data_fun" and monitor actions on them.

You would need to test for the label on each action.

> So, whenever the data is used (e.g., printed) , I want to detect that; maybe by printing a statement or anything.
> Could you please help me to do that ?

Here is a sketch of how this might be done:

  #include <iostream>
  #include <sanitizer/dfsan_interface.h>

  static dfsan_label sensitive_data_label =
      dfsan_create_label("Data returned by retrieve_data_fun()", NULL);

  // The function retrieves some sensitive data from files or a DB.
  // Users should call retrieve_data_fun() instead of this
  // implementation, to ensure results are labeled.
  Result retrieve_data_fun_impl () {
     ...
  }

  Result retrieve_data_fun() {
    // All the real work is done in retrieve_data_fun_impl().  This
    // function calls it, labels the result, and returns it.
    // This style ensures that all return paths in
    // retrieve_data_fun_impl() have the label applied to the result.
    Result result = retrieve_data_fun_impl();
    dfsan_set_label(sensitive_data_label, &result, sizeof(result));
    return result;
  }

  bool IsResultSensitive(const Result& result) {
    // There are types for which sizeof(result) does not give the
    // bytes needed.  For example, if Result is a vector,
    // you may need to iterate over the elements in the vector, and
    // read the labels of the elements.
    dfsan_label label = dfsan_read_label(&result, sizeof(result));
    return dfsan_has_label(label, sensitive_data_label);
  }

  bool IsIntSensitive(int i) {
    dfsan_label label = dfsan_read_label(&i, sizeof(i));
    return dfsan_has_label(label, sensitive_data_label);
  }

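  // Defined below main().
  bool TakeSomeActionOnResult(const Result& result);
  std::ostream& operator<<(std::ostream& os, const Result& result);

  int main() {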
    // 'sensitive_result' is sensitive.  It was labeled as sensitive
    // by 'retrieve_data_fun()'.
    Result sensitive_result = retrieve_data_fun();

    // 'not_a_sensitive_result' is built without reading any sensitive
    // data.  It is not labeled.
    Result not_a_sensitive_result = Result();

    // Should print "1".
    std::cout << "Is 'sensitive_result' sensitive?  "
                   << IsResultSensitive(sensitive_result) << std::endl;

    // Should print "0".
    std::cout << "Is 'not_a_sensitive_result' sensitive?  "
                   << IsResultSensitive(not_a_sensitive_result) << std::endl;

    // Assume this reads bytes from the result.
    int derived_from_result = sensitive_result.getIntFromResult();

    // Should print "1".
    std::cout << "Is 'derived_from_result' sensitive?  "
                   << IsIntSensitive(derived_from_result) << std::endl;

    // This call will be refused: it prints a warning and returns false.
    // See the definition of TakeSomeActionOnResult().
    TakeSomeActionOnResult(sensitive_result);

    // This will print "REDACTED" because of the operator<< method below.
    std::cout << "Sensitive result is "<< sensitive_result << std::endl;
  }

  bool TakeSomeActionOnResult(const Result& result) {
    // Guard against taking this action on sensitive data.
    if (IsResultSensitive(result)) {
      std::cerr << "WARNING: Tried to call TakeSomeActionOnResult() on "
                    << "a sensitive result.  This is not allowed!";
      return false;
    }
    ...
    return true;
  }

  // Overload the << operator to not print sensitive data.
  std::ostream& operator<<(std::ostream& os, const Result& result)
  {
    if (IsResultSensitive(result)) {
      os << "REDACTED";
    } else {
      // This result is not sensitive, construct a string form.
      ...
    }
    return os;
  }
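
The comment in IsResultSensitive() mentions iterating over elements
when sizeof() does not cover the data.  Here is a rough sketch of that
loop for a vector of strings.  AnyItemSensitive and RangeIsLabeled are
hypothetical names used only for illustration.  The label check is
passed in as a callable so the loop itself does not depend on DFSan; in
an instrumented build the callable could wrap dfsan_read_label() and
dfsan_has_label().

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical predicate type: returns true if any byte in
// [data, data + size) carries the label of interest.
using RangeIsLabeled =
    std::function<bool(const void* data, std::size_t size)>;

// Returns true if the character bytes of any string in 'items' are
// reported as labeled by 'range_is_labeled'.
inline bool AnyItemSensitive(const std::vector<std::string>& items,
                             const RangeIsLabeled& range_is_labeled) {
  for (const std::string& item : items) {
    if (range_is_labeled(item.data(), item.size())) {
      return true;
    }
  }
  return false;
}

// In a build with -fsanitize=dataflow, the predicate might look like:
//   [](const void* data, std::size_t size) {
//     return dfsan_has_label(dfsan_read_label(data, size),
//                            sensitive_data_label) != 0;
//   }
```

Each string owns its own buffer, so one read over the vector's storage
would not see the characters.  That is why the loop visits every item.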


>
> Thanks,
> Daren
>
>
> ________________________________
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Sam Kerner via llvm-dev <llvm-dev at lists.llvm.org>
> Sent: Wednesday, April 17, 2019 5:56 PM
> To: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] "compiler-rt" - DataFlowSanitizer
>
> On Tue, Apr 16, 2019 at 3:44 PM dareen khalid via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > Hi all,
> >
> > I have some questions about "DataFlowSanitizer" from "compiler-rt".
> > I want to know how I can test the "DataFlowSanitizer"?
>
> This document is a good reference for DataFlowSanitizer:
>   https://clang.llvm.org/docs/DataFlowSanitizer.html
>
> > Can I configure it to label only some values,
>
> The section named "Example" in the document above shows a simple
> program that sets and tests for labels.
>
> dfsan_create_label() creates a label.
>
> dfsan_set_label() applies a label to the memory holding a variable.
>
> > i.e, the return values from specific functions?
>
> To label the return value of a function, add a call to
> dfsan_set_label() on the return value of the function:
>
>   // Outside the function:
>   dfsan_label return_label = dfsan_create_label("return_label", 0);
>
>   // An example function:
>   int MyFunction(int a, int b) {
>     ...
>     int result = ...;
>
>     // Set a label on the returned value:
>     dfsan_set_label(return_label, &result, sizeof(result));
>
>     return result;
>   }
>
> > Also, how can I print these labels?
>
> To discover the label on a variable, you can test for it and print the result:
>
>   int var = ...;
>
>   // Does 'var' have label 'return_label'?
>   dfsan_label var_label = dfsan_get_label(var);
>   if (dfsan_has_label(var_label, return_label)) {
>     printf("'var' has the label 'return_label'\n");
>   }
>
> To see the state of all labels at the time the program exits, set
> the shell variable DFSAN_OPTIONS to "dump_labels_at_exit=<file path>".
> For example, suppose the example program in the document is in a file
> named "dfsan.c".  Here are the commands I ran to see the state of all
> labels when it exits:
>
>   # Compile dfsan.c into a binary named "dfsan":
>   $ clang -g -fsanitize=dataflow dfsan.c -o dfsan
>
>   # Run it.  There is no output because all assertions pass:
>   $ ./dfsan
>
>   # Run it again with shell variable DFSAN_OPTIONS set to export label
>   # state to standard out on exit:
>   $ env DFSAN_OPTIONS=dump_labels_at_exit=/dev/stdout ./dfsan
>   ==21994==INFO: DataFlowSanitizer: dumping labels to /dev/stdout
>   1 0 0 i
>   2 0 0 j
>   3 0 0 k
>   4 1 2
>   5 3 4
>
> If you tell us more about what you are trying to accomplish with
> DataFlowSanitizer, we may be able to give more specific advice.
>
> >
> > Thanks,
> > Dareen
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

