[libcxx-commits] [PATCH] D132769: [2b/3][ASan][libcxx] std::basic_string annotations

Wed Mar 22 04:09:51 PDT 2023

AdvenamTacet added a comment.

@ldionne When a program is compiled with ASan, these annotations allow it to detect all accesses to memory that has been allocated but is not currently in use by an object of std::basic_string type. The only exception are functions where instrumentation has been explicitly disabled.

String annotations already exist in //Microsoft C++ Standard Library//.
Link: https://devblogs.microsoft.com/cppblog/stdstring-now-supports-address-sanitizer/

It’s not the same as libc++ hardening. While all accesses with `operator[]` or iterators are also detected, particularly important are cases with raw pointers, such as when passing data to C functions (`strcpy()` etc.).

The simplest example may be:

  std::string s = “abcdefg”;
  char *ptr = &s[4];
  s = “”;
  char c = *ptr; // ASan error

A write to `*ptr` would be detected as well.

Examples of what is detected are:

- incorrect use of C functions,
- random access to that memory (`*(char *)randint() = 'x';` will be detected, if by chance unused strings memory is accessed),
- use after resize (as in the example above),
- other BO etc.

A real world bug, which motivated that research was as follow (simplified):

  bool custom_cmp(const char l, const char r) {
          std::cout << "DEBUG print: " << l << " " << r << std::endl;

          return l < 'a' || l == r;
  }

  void foo() {
    std::string str1 = "A VERY LONG STRING IS HERE";
    std::string str2 = "short string";
    const char* iter1_begin = str1.c_str();
    const char* iter1_end   = str1.c_str() + str1.size();
    const char* iter2_begin = str2.c_str();

    std::cout << (std::equal(iter1_begin, iter1_end, buf, custom_cmp)) << std::endl;
  }

Out-of-bounds read could happen as two strings were compared via a std::equals function that took `iter1_begin`, `iter1_end`, `iter2_begin` iterators (with a custom comparison function). When object `str1` was longer than `str2`, read out-of-bounds on `iter2` could happen. Container sanitization would detect it.
I hope it shows when those annotations are applicable. Let me know if you want me to add a bunch of samples to the patch. I can add a bunch of failing unit tests with examples. Is it a good idea?

This implementation uses the same API as `std::vector` (function: `__sanitizer_annotate_contiguous_container`) and from high level is the same here and there.
Long `std::basic_string` is almost the same as `std::vecotr`, so logic is absolutely the same. The difference is support for short strings, but that logic is very similar as well.	
Long string (with standard allocators) annotations work with old ASan API (does not require rG1c5ad6d2c01294a0decde43a88e9c27d7437d157 <https://reviews.llvm.org/rG1c5ad6d2c01294a0decde43a88e9c27d7437d157>, so long-only patch wouldn't create a problem for ChromeOS). Only short string annotations require new API rG1c5ad6d2c01294a0decde43a88e9c27d7437d157 <https://reviews.llvm.org/rG1c5ad6d2c01294a0decde43a88e9c27d7437d157>.
If you think it's a good idea, I can split this patch into two. One for long only, second enabling SSO annotations.

It is worth mentioning that the change, proposed by me here, has negligible impact on performance, because when binary is already compiled with ASan, every access to strings memory is instrumented either way. Code from that change is executed only when strings size is modified and it's proportional to the change. All functions are no-ops when compiled without ASan.

-------------------

Examples are above, here I let myself to digress a little. One may argue that some out of bounds errors may be detected by ASan without those changes, and that's true. If out of bound reaches memory outside of allocated area, ASan detects the bug without those changes. In situations like that, those changes increase speed of detecting errors. Potentially significantly, but I will come back to it. It's also worth to remember that there are no poisoned areas between elements in arrays.

  std::string s[1024];
  // ^ There are NO redzones between elements.

And memory is not poisoned inside classes between variables (and at the beginning and the end):

  class X {
  std::string x;
  // NO redzone
  int a[3];
  };

Therefore, not every access outside of objects memory is detected. (Eg. above, `s[0]` is short, without detecting an error, BO can write over rest of 1023 strings memory. Error will be detected when memory after s[1023] is reached.) With those changes OOB will be detected, unless `size() == capacity()` before OOB happenes (case when there is no allocated and not-in-use memory).

-------------------

Because short strings are on stack, bugs affect not only heap. Bellow I show modified example from above, with standard comparator, which leaks stack memory (including canary). Assumptions:

- user fully controls `s[0]`,
- user has access to result of `std::equals`,
- it's super simplified.

  bool result = false;
  char x = 0;
  size_t n = 0;

  void user_input(std::string &s) {
    // Simulation of user providing input
    n += 1;

    if(result) {
      x = 0;
      std::cout << "Leaked " << s.size() << " bytes, with " << n << " requests." << std::endl;
      std::cout << "Leak: " << s << std::endl << std::endl;
      s.push_back('\0');
    } else {
      if(!s.empty()) s.pop_back();
      x += 1;
      s.push_back(x);
    }
  }

  void leak_cannary() {
    std::string s[3];
    s[2] = "secret data";
    for(int i = 0; i < 'z' * 100; ++i) {
      result = std::equal(s[0].begin(), s[0].end(), s[1].data());
      // User has access to the result.
      user_input(s[0]);
    }
  }

Example result is: `Leaked 169 bytes, with 12062 requests.`

Chance of finding that bug with simple (random) harness and //today ASan//  is close to zero, but it can be easily exploited.

Here I won't fully offtop how off-by-one on a buffer just before string may leak stack as well (same off-by-two on string with Alternate String Layout). If someone is interested, I'm happy to talk about ways to exploit SSO (connected with some bugs) outside of that revision, I was looking at it to create a CTF challenge.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132769/new/

https://reviews.llvm.org/D132769