[PATCH] D20084: [sanitizer] Initial implementation of a Hardened Allocator

Dmitry Vyukov via llvm-commits llvm-commits at lists.llvm.org
Thu May 12 10:25:20 PDT 2016


dvyukov added inline comments.

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:64
@@ +63,3 @@
+// as of now), which fits nicely with the alignment requirements. It's storing:
+// - a 16-bit checksum
+// - the user requested size for that chunk (needed for reallocation purposes)
----------------
It's better to comment right on the fields rather than duplicate them here. Such a comment has a good chance of getting outdated, and it's also harder to find the relevant part of it for a particular field.
E.g.:

  u8  state          : 2;  // available, allocated, or quarantined

Comments like 'salt' on the 'salt' field are redundant; drop them.

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:87
@@ +86,3 @@
+  // 2nd 8 bytes
+  u64 offset         : 24;
+  u64 unused_1_      : 24;
----------------
Don't we need only 20 bits here?

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:104
@@ +103,3 @@
+  // prevent this, we work with a stack based copy of the header, hence the
+  // following static function.
+  static void *AllocBeg(ScudoChunk *chunk, UnpackedHeader *header) {
----------------
I am missing the relation between the requirement not to load the header a second time and making the function static.
Why not:

    void *AllocBeg(UnpackedHeader *header)

?

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:126
@@ +125,3 @@
+    // would require holding an additional copy of it.
+    crc = _mm_crc32_u64(crc, header_holder[0] & 0xffffffffffff0000ULL);
+    crc = _mm_crc32_u64(crc, header_holder[1]);
----------------
Add a bold comment to the checksum field saying that the checksum computation expects it to be in the low 16 bits. And maybe add some debug check here.
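For the debug check, something along these lines could work (just a sketch; it assumes the header packs into two u64 words and that the checksum bit-field is named as in the patch, and it could run once at init or under DCHECK):

    UnpackedHeader Probe = {};
    Probe.checksum = 0xffff;
    u64 PackedProbe[2];
    internal_memcpy(PackedProbe, &Probe, sizeof(PackedProbe));
    CHECK_EQ(PackedProbe[0], 0xffffULL);  // checksum must map to the low 16 bits
    CHECK_EQ(PackedProbe[1], 0ULL);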

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:128
@@ +127,3 @@
+    crc = _mm_crc32_u64(crc, header_holder[1]);
+    return static_cast<u16>(crc ^ cookie);
+  }
----------------
Wouldn't it be enough to initialize crc from the cookie, as in:

    u64 crc = _mm_crc32_u64(cookie, reinterpret_cast<uptr>(this));

?


================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:135
@@ +134,3 @@
+    __atomic_load(reinterpret_cast<const PackedHeader *>(this), &packed_header,
+                  __ATOMIC_ACQUIRE);
+    *unpacked_header = bit_cast<UnpackedHeader>(packed_header);
----------------
Why does this need to be acquire? Please comment.

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:153
@@ +152,3 @@
+      __atomic_store(reinterpret_cast<PackedHeader *>(this), &new_packed_header,
+                     __ATOMIC_RELEASE);
+    } else {
----------------
Why release?

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:161
@@ +160,3 @@
+                                     false,
+                                     __ATOMIC_ACQUIRE,
+                                     __ATOMIC_ACQUIRE)) {
----------------
glider wrote:
> Shouldn't the success memory order be __ATOMIC_RELEASE?
Why acquire?

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:162
@@ +161,3 @@
+                                     __ATOMIC_ACQUIRE,
+                                     __ATOMIC_ACQUIRE)) {
+        Printf("ERROR: race on chunk header at address %p\n", this);
----------------
Looks pointless.

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:187
@@ +186,3 @@
+  DrainQuarantine();
+  get_allocator().DestroyCache(&cache);
+}
----------------
This will leak memory.
Destructors run in FIFO order, so user dtors for keys created later can run after yours. Plus, pthread frees the thread stack and the pthread_specific regions after running the pthread_specific dtors.
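To illustrate the ordering problem (hypothetical user code, names made up):

    static pthread_key_t UserKey;
    static void UserDtor(void *Ptr) { free(Ptr); }  // may run after the allocator's dtor
    static void UserThreadInit() {
      pthread_key_create(&UserKey, UserDtor);       // key created after the allocator's
      pthread_setspecific(UserKey, malloc(16));
    }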

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:195
@@ +194,3 @@
+static void NOINLINE initThread() {
+  pthread_once(&GlobalInited, initGlobal);
+  pthread_setspecific(pkey, reinterpret_cast<void *>(1));
----------------
Why is initGlobal not called from ScudoInitInternal?
If you expect that a malloc can come before ScudoInitInternal, then you also need to call ScudoInitInternal from initThread; otherwise it won't work anyway.
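Something like (a sketch; I am assuming ScudoInitInternal takes no arguments):

    void ScudoInitInternal() {
      pthread_once(&GlobalInited, initGlobal);
      // ...
    }

    static void NOINLINE initThread() {
      ScudoInitInternal();  // covers a malloc arriving before the explicit init
      pthread_setspecific(pkey, reinterpret_cast<void *>(1));
      // ...
    }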

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:241
@@ +240,3 @@
+typedef ScudoQuarantine::Cache QuarantineCache;
+// static thread_local QuarantineCache quarantine_cache;
+static THREADLOCAL uptr quarantine_cache[4] = {};
----------------
Why is this commented out? Looks cleaner.

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:283
@@ +282,3 @@
+    allocator.Init(options.MayReturnNull);
+    quarantine.Init(static_cast<uptr>(options.QuarantineSizeMb) << 20,
+                    MaxThreadLocalQuarantine);
----------------
Do we want to sanity-check options.QuarantineSizeMb << 20? What if it overflows?
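For example (sketch):

    uptr QuarantineSize = static_cast<uptr>(options.QuarantineSizeMb) << 20;
    CHECK_EQ(QuarantineSize >> 20, static_cast<uptr>(options.QuarantineSizeMb));  // no overflow
    quarantine.Init(QuarantineSize, MaxThreadLocalQuarantine);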

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:284
@@ +283,3 @@
+    quarantine.Init(static_cast<uptr>(options.QuarantineSizeMb) << 20,
+                    MaxThreadLocalQuarantine);
+    cookie = Prng.Next();
----------------
Make this tunable as well. If my program has 10000 threads, 1MB per thread is a lot.
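E.g. a second flag; the name is made up for illustration:

    SCUDO_FLAG(int, ThreadLocalQuarantineSizeKb, 1024,
               "Size (in Kb) of the per-thread quarantine cache.")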

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:293
@@ +292,3 @@
+    if (!IsPowerOfTwo(alignment)) {
+      Printf("ERROR: alignment is not a power of 2\n");
+      Die();
----------------
s/alignment/malloc alignment/
So that the user can get at least some clue when she sees this on the console.

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:325
@@ +324,3 @@
+    if (chunk_beg != alloc_beg + ChunkHeaderSize) {
+      header.with_offset = 1;
+      header.offset = (chunk_beg - alloc_beg) >> MinAlignmentLog;
----------------
It seems to me that we don't actually need with_offset and all the associated if's.
You can just always store (chunk_beg - alloc_beg) >> MinAlignmentLog into header.offset and always subtract it from user_beg.
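I.e. something like (sketch, using the names from the surrounding code):

    // Allocation path, unconditionally:
    header.offset = (chunk_beg - alloc_beg) >> MinAlignmentLog;
    // Deallocation path, unconditionally, no with_offset branch:
    uptr alloc_beg = chunk_beg - (header.offset << MinAlignmentLog);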

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:326
@@ +325,3 @@
+      header.with_offset = 1;
+      header.offset = (chunk_beg - alloc_beg) >> MinAlignmentLog;
+    }
----------------
There is a very tricky, implicit relation between MinAlignment, MaxAlignment and the number of bits in offset. If any of these change in the future, we can get a nice attack vector due to offset overflow.

Check during init that MaxAlignment/MinAlignment fits into offset.
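E.g. (sketch; kOffsetBits is a hypothetical constant that must mirror the width of the offset bit-field, and the bound may need to account for the header size):

    CHECK_LT(MaxAlignment >> MinAlignmentLog, 1ULL << kOffsetBits);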

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc:381
@@ +380,3 @@
+    if (DeleteSizeMismatch) {
+      if (delete_size && delete_size != size) {
+        Printf("ERROR: invalid sized delete on chunk at address %p\n", chunk);
----------------
I wonder if delete_size can be 0 without meaning that delete_size was not passed, i.e. it is just legally zero. What is passed in for delete of an array with 0 elements?
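I.e. what does the sized operator delete[] receive for:

    int *p = new int[0];
    delete[] p;  // what delete_size does the runtime see here?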

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_flags.cc:56
@@ +55,3 @@
+  if (f->QuarantineSizeMb < 0) {
+    const int kDefaultQuarantineSizeMb = 1UL << 6; // 64 MB
+    f->QuarantineSizeMb = kDefaultQuarantineSizeMb;
----------------
You use a convoluted way to express 64 that requires remembering powers of two, and then spell out 64 in the comment. Why not just say "64"?

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_flags.inc:18
@@ +17,3 @@
+
+SCUDO_FLAG(int, QuarantineSizeMb, -1,
+           "Size (in Mb) of quarantine used to delay the actual deallocation "
----------------
s/-1/64/, then the default will be visible in the help as well.
The user can't tune this value if she does not have a single reference point. If I know that the default is 64, then I can set it to 32 or 128. If I don't know the default, what am I supposed to do?

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc:109
@@ +108,3 @@
+    std::hash<std::thread::id> hasher;
+    return RdTSC() ^ hasher(std::this_thread::get_id()) ^
+        std::chrono::high_resolution_clock::now().time_since_epoch().count();
----------------
glider wrote:
> glider wrote:
> > My gut feeling is that XORing rdtsc and the time since epoch is actually reducing the entropy, not increasing it. Any idea if that's true?
> > Also, do we really need a dependency on std::chrono?
> As a data point, I've run RdTSC() and std::chrono::high_resolution_clock::now().time_since_epoch().count() 200278017 times.
> The number of unique values of both variables was exactly 200278017, while the number of unique XOR values was only 200205416, i.e. there were 0.036% collisions.
This is used to initialize the global cookie. I would use /dev/random. Or there must be 16 bytes of good randomness in auxv (AT_RANDOM).
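For the auxv route, a Linux-only sketch (error handling and fallback omitted); AT_RANDOM points to 16 bytes of kernel-provided randomness:

    #include <sys/auxv.h>
    u64 Seed[2];
    internal_memcpy(Seed, reinterpret_cast<void *>(getauxval(AT_RANDOM)),
                    sizeof(Seed));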

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_utils.h:44
@@ +43,3 @@
+  Xorshift128Plus();
+  Xorshift128Plus(u64 state_0, u64 state_1)
+    : state_0_(state_0), state_1_(state_1) {}
----------------
This is not used. Remove.

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_utils.h:46
@@ +45,3 @@
+    : state_0_(state_0), state_1_(state_1) {}
+  void SetSeed(u64 state_0, u64 state_1) {
+    state_0_ = state_0;
----------------
This is not used. Remove.

================
Comment at: projects/compiler-rt/lib/hardened_allocator/scudo_utils.h:50
@@ +49,3 @@
+  }
+  void GetSeed(u64 *state_0, u64 *state_1) {
+    *state_0 = state_0_;
----------------
glider wrote:
> GetSeed is unused.
This is not used. Remove.


http://reviews.llvm.org/D20084




