[PATCH] D60593: [GwpAsan] Introduce GWP-ASan.

Wed Apr 17 17:11:07 PDT 2019

hctim added a comment.

In order to get a good sense of the overall overhead of GWP-ASan, I created a synthetic test. Please note that this synthetic test is **not** representative of overall GwpAsan overhead, as it constant performs allocations and deallocations. I will try and get some real-world tests soon.

  int main() {
    for (long unsigned i = 0; i < 50000000; ++i) {
      void *p = malloc(50 + i % 2);
      memset(p, 0x0, 52);

      p = realloc(p, 52 + i % 2);
      memset(p, 0x0, 10);

      free(p);
    }
  }

I changed `shouldSample()` around to get some data points, as to whether we want to have a branch for initialisation, or have it be an logical AND in the return statement. Assumedly, the former would be better when GwpAsan is runtime-disabled, and the latter would be better when enabled. My results (and their relative performance to the compile-time disabled version of Scudo) are shown below. All examples are compiled with `-O2 -fsanitize=scudo`:

| Option                                  | Median Time (with 20 runs) | Runtime increase relative to CT-Disabled |
| Compile-time disabled                   | 3.5715s                    | 0.0000%                                  |
| Runtime disabled (init check in ret)    | 3.6490s                    | 2.1239%                                  |
| Runtime disabled (init check in branch) | 3.6350s                    | 1.7469%                                  |
| Runtime enabled (init check in ret)     | 3.6435s                    | 1.9761%                                  |
| Runtime enabled (init check in branch)  | 3.6500s                    | 2.1507%                                  |
|

Note that with runtime enabled, we are not testing the performance of sampled allocations. There was a `1 / 1Q` chance of a single allocation being sampled, meaning that each run (`1B` allocations per run) had a `1 / 1M` chance of having a single sampled allocation.

I've made a quick optimisation to set the sample rate to be 2**64 - 1 when GwpAsan is disabled, so that we regenerate the PRNG as infrequently as possible, without changing the fast path. Without this optimisation, runtime disabled w/ init check in ret was 13.0360% slower.

I think our primary use case here should be always-on GwpAsan, and as such we should do the init check in the return. I've updated the patch to reflect this with the optimisation.

================
Comment at: compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp:84
+  // details.
+  SampleRate = Opts.SampleRate - 1;
+  IsInitialised = true;
----------------
hctim wrote:
> vlad.tsyrklevich wrote:
> > Wait, why do we subtract 1 here and then always add 1 later?
> `NextSampleCounter` in `shouldSample()` calculates `PRNG % (SampleRate + 1)` as `SampleRate == 0` on thread-init, and would cause an arithmetic error if `malloc()` is called befire `init()` completes. If we can enfore that the allocator doesn't call `malloc()` before `init()` finishes, then we can remove this.
Removed as `init()` no longer calls `malloc()`.

================
Comment at: compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp:296
+    // allocated. If this is the case, there is no valid metadata.
+    if (Meta == nullptr)
+      exit(EXIT_FAILURE);
----------------
vlad.tsyrklevich wrote:
> Meta won't be nullptr in this case though right? You need to look into the metadata to check that it's never been allocated.
Done some refactoring and resolved this.

================
Comment at: compiler-rt/lib/gwp_asan/options.inc:21
+GWP_ASAN_OPTION(
+    bool, PerfectlyRightAlign, false,
+    "When allocations are right-aligned, should we perfectly align them up to "
----------------
vlad.tsyrklevich wrote:
> morehouse wrote:
> > malloc is required to return memory that is "suitably aligned".  Won't perfect right alignment violate this?
> > 
> > E.g.,
> > ```
> > uint8_t *buf = malloc(5);
> > // ... Copy some data into buf ...
> > uint32_t *word = (uint32_t *) buf;
> > *word += buf[4];
> > ```
> > 
> > I think if malloc right-aligns, some processors would fail to load the 32-bit word on the last line.
> I think it's a useful option on platforms that do support unaligned access (Intel, ARM >= v7). We've heard that this was useful with PageHeap for finding small OOB accesses, though obviously perf characteristics from using this option will vary significantly depending on the application.
Yep, for safety reasons this is why it's disabled-by-default. I would personally prefer that users who understand the architectural configuration can enable it for themselves at the risk of performance degradation.

================
Comment at: compiler-rt/lib/scudo/scudo_allocator.cpp:243
+    : Quarantine(LINKER_INITIALIZED) {
+      gwp_asan::options::initOptions();
+      GuardedAlloc.init(*gwp_asan::options::getOptions());
----------------
eugenis wrote:
> hctim wrote:
> > eugenis wrote:
> > > hctim wrote:
> > > > cryptoad wrote:
> > > > > We are avoiding any non-trivial initialization on static objects. Ideally those 2 would go in `init`.
> > > > So this kind of conflicts with my previous statement. If we move this initialisation into `init()`, we will recurse into `init() -> Allocator::allocate() -> initThreadMaybe() -> init()` and we deadlock on `pthread_once`.
> > > > 
> > > > We have two options here:
> > > > 1. Leave the GwpAsan initialisation in the c'tor, which forces the initialisation of Scudo at startup, rather than lazily initialising.
> > > > 2. We avoid malloc() and instead do our own memory management in GwpAsan's `init()`. 
> > > > 
> > > > @eugenis - What do you think is the right long-term support option for GwpAsan here?
> > > how about
> > > 3. Do "fast pass" initialization in init(), then do full initialization on first sampled malloc
> > > 
> > > "fast pass" initialization is just enough to do sampling, i.e. the inlined part of shouldSample().
> > > 
> > > Since init() can run in a secondary thread, the access to SampleRate can race and needs to be changed to an atomic.
> > So this would be a problem with lazy-init of GWP-ASan. For lazy-initialisation, we would require  `NextSampleCounter`, `SampleRate`, `GuardedPagePool`, and `GuardedPagePoolEnd` to all be atomic (and at least one with . At the moment, `init()` is run before any allocations take place, so it shouldn't race. Changing these to atomics would incur a significant performace overhead to the allocator's fast path.
> > 
> > I'm leaning towards #2 here, and requiring the c'tor to initialise GwpAsan before any allocations take place.
> Relaxed atomic. We don't need any fancy memory ordering here, only atomicity of individual stores and loads. They will compile to the same code as non-atomic, or very similar.
> 
> #2 is also fine, but it limits us a little. Not a big deal, I guess, if we already are not allowed to use libc in common gwp-asan code because of fuchsia - is that correct?
> 
I think we've decided to do memory-mapped pages instead.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60593/new/

https://reviews.llvm.org/D60593