[PATCH] D77283: scudo: Add support for diagnosing memory errors when memory tagging is enabled.

Thu Apr 2 14:06:40 PDT 2020

hctim added inline comments.

================
Comment at: compiler-rt/lib/scudo/standalone/include/scudo/interface.h:30
+// the error_info data structure. Up to three possible causes are returned in
+// the reports array, in decreasing order of probability. The remaining elements
+// of reports are zero-initialized.
----------------
Sorry, 'remaining elements' here meaning any `scudo_error_report` struct that's unused?

================
Comment at: compiler-rt/lib/scudo/standalone/include/scudo/interface.h:37
+//
+// ptr is the fault address. On aarch64 this is available in the system register
+// FAR_ELx, or far_context.far in an upcoming release of the Linux kernel. This
----------------
Why `ptr` then? Why not `fault_address`?

================
Comment at: compiler-rt/lib/scudo/standalone/include/scudo/interface.h:45
+// obtained by calling the function __scudo_get_stack_depot_addr() in the
+// crashing process. The size of the stack depot is available by calling the
+// function __scudo_get_stack_depot_size().
----------------
"in the crashing process, generally ahead-of-time (at libc init, for example)."

================
Comment at: compiler-rt/lib/scudo/standalone/include/scudo/interface.h:55
+// The more memory available via this pointer, the more likely it is that the
+// function will be able to analyze a crash correctly.
+//
----------------
Can you make a recommendation on how much memory we need? Should that be defined by the platform, or by Scudo (via. a callback)? I assume we need at least 3 allocations (one to the left and one to the right) to make a reasonable guess, but how many do you see us having? A constant 16? Is it some function on the sizeof the allocation?

Our top primary size class on Android is 64KiB + 16B. Assuming a static 16 allocations (8 on either side) for the crash handler to do its thing, that's a megabyte per crash. Not too bad.

Once we support secondary tagged allocations though (assuming we intend on doing this), we'd have to rethink this, no? Would it be worth having a callback of `__scudo_get_surrounding_metadata()` that, for some fault address, returns the metadata (header + any deallocation data left over) for some number of surrounding allocations alone?

================
Comment at: compiler-rt/lib/scudo/standalone/include/scudo/interface.h:59
+// via memory. Each byte of this array corresponds to a region of memory of size
+// equal to the architecturally defined memory tag granule size (16 on aarch64).
+//
----------------
Maybe add a comment where this is normally obtained from?

================
Comment at: compiler-rt/lib/scudo/standalone/include/scudo/interface.h:61
+//
+// memory_addr is the address of memory in the crashing process's address space.
+//
----------------
"the starting address"

================
Comment at: compiler-rt/lib/scudo/standalone/include/scudo/interface.h:65
+// pointer.
+void __scudo_get_error_info(struct scudo_error_info *error_info, uintptr_t ptr,
+                            const char *stack_depot, const char *region_info,
----------------
Meta note - I'd also add an explicit disclaimer here that this is not intended to work across API level boundaries (version_of_scudo_allocator === version_of_scudo_crash_handler).

================
Comment at: compiler-rt/lib/scudo/standalone/include/scudo/interface.h:70
+
+enum scudo_error_type {
+  UNKNOWN,
----------------
We now have two ways that scudo reports errors. For UaF/overflow, as per this, you get the report from `__scudo_get_error_info`. For internally detected errors (invalid free, double free), you still get an abort message. We should probably unify the behaviour here, because:

 1. The metadata we collect about tags/threads/traces is useful in debugging double frees, etc.
 2. Having different exit conditions is harder to keep track of. I can easily see a double-free that ends up doing some load/store with incorrect tags before it hits the internal trap, and subsequently we report it incorrectly (i.e. a double free now looks like an UNKNOWN or BUFFER_OVERFLOW, etc.).

================
Comment at: compiler-rt/lib/scudo/standalone/stack_depot.h:42
+  HybridMutex ring_end_mu;
+  u32 ring_end;
+
----------------
I know that this is zero-initialized by the linker in its current inclusion in the allocator class, but it's a bit roundabout to detemine this. Can you explicitly initialize these varaiables?

================
Comment at: compiler-rt/lib/scudo/standalone/stack_depot.h:44
+
+  static const uptr kTabBits = 16;
+  static const uptr kTabSize = 1 << kTabBits;
----------------
I worry that these values aren't configurable. We might want to have a larger stack depot on some devices, if:

  # Scudo ends up using a stack depot for a MTE-similar feature on another architecture.
  # Someone ends up using stack trace collection for non-MTE scenarios (I mean, it's a cheap way to do heap profiling).
  # We have different devices that have more/less sensitivity to memory overhead.

Should we make this templated?

================
Comment at: compiler-rt/lib/scudo/standalone/stack_depot.h:50
+  static const uptr kRingBits = 19;
+  static const uptr kRingSize = 1 << kRingBits;
+  static const uptr kRingMask = kRingSize - 1;
----------------
```
#ifdef __LP64__
  static const uptr kRingSize = kRingBits * 8;
#else
  static const uptr kRingSize = kRingBits * 4;
#endif
```

... as we can ask for trace collection on 32-bit?

================
Comment at: compiler-rt/lib/scudo/standalone/stack_depot.h:64
+    u64 entry = ring[ring_pos];
+    u64 id = (u64(end - begin) << 33) | (u64(hash) << 1) | 1;
+    if (entry == id)
----------------
Can we please add some more documentation to this class? I get that this is saving some unique-ID to prevent us from saving the same allocation trace twice, but it would be nice if that was immediately obvious.

================
Comment at: compiler-rt/lib/scudo/standalone/stack_depot.h:85
+    u64 hash_with_tag_bit = (u64(hash) << 1) | 1;
+    if ((entry & 0x1ffffffff) != hash_with_tag_bit)
+      return false;
----------------
Same thing here. What makes up this cookie should be more obvious IMHO. I get the top 31 bits are roughly a UID for the stack trace, the middle 32 bits are hash (and thus these two combined provide some hash-collision resistance), and the bottom bit is to uniquely identifiy cookies as the FP-address shouldn't have the LSB set (except for in thumb...), but putting this somewhere and typedefing this u64 seems like a good idea.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77283/new/

https://reviews.llvm.org/D77283