[PATCH] D77283: scudo: Add support for diagnosing memory errors when memory tagging is enabled.

Evgenii Stepanov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 3 14:39:10 PDT 2020


eugenis added inline comments.


================
Comment at: compiler-rt/lib/scudo/standalone/combined.h:723
 
+  void setTrackAllocationStacks(bool Track) {
+    Options.TrackAllocationStacks = Track;
----------------
There is no synchronization here. Is this meant to be used only in a single-threaded or stop-the-world context?
If we turned Options into a relaxed atomic, this could be made thread-safe, with the understanding that other threads would change their behavior "eventually".

Either way, this should be mentioned in the declaration comment.
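A minimal sketch of the relaxed-atomic approach suggested above (the `AtomicOptions` type and method names are hypothetical stand-ins, not Scudo's actual interfaces):

```cpp
#include <atomic>

// Hypothetical stand-in for Scudo's Options: the flag lives in a
// std::atomic, so concurrent setters and readers are not a data race.
struct AtomicOptions {
  std::atomic<bool> TrackAllocationStacks{false};

  // Safe to call from any thread; other threads observe the new value
  // "eventually" (relaxed ordering gives no guarantee beyond atomicity).
  void setTrackAllocationStacks(bool Track) {
    TrackAllocationStacks.store(Track, std::memory_order_relaxed);
  }

  bool shouldTrackAllocationStacks() const {
    return TrackAllocationStacks.load(std::memory_order_relaxed);
  }
};
```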


================
Comment at: compiler-rt/lib/scudo/standalone/combined.h:755
+      *Data = &Memory[Addr - MemoryAddr];
+      *Tag = uint8_t(
+          MemoryTags[(Addr - MemoryAddr) / archMemoryTagGranuleSize()]);
----------------
Use static_cast<uint8_t> here. Or is a reinterpret_cast actually needed?


================
Comment at: compiler-rt/lib/scudo/standalone/combined.h:785
+        return;
+      for (unsigned I = 0; I != Size && I != 64; ++I)
+        Trace[I] = (*Depot)[RingPos + I];
----------------
Give this constant (64) a name somewhere.


================
Comment at: compiler-rt/lib/scudo/standalone/combined.h:799
+          Header.State != Chunk::State::Allocated &&
+          Data[MemTagPrevTagIndex] == PtrTag) {
+        auto *R = &ErrorInfo->reports[NextErrorReport++];
----------------
TBH, I'm not sure about relying on delayed block reuse for use-after-free reports. Not aggressively reusing the most recently freed blocks will hurt both RAM and CPU (because of worse heap locality). It feels like, for any given target probability (however defined) of deallocation info being available in the event of a use-after-free, persisting smaller records (as in the hwasan history buffer) would be a better trade-off.

Of course, the exception to this would be use-immediately-after-free.
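For illustration, a hwasan-style history buffer along these lines could persist a small fixed-size record per deallocation instead of delaying block reuse. All names here (`DeallocRecord`, `DeallocRing`) are hypothetical, not part of Scudo:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One small record per free(); much cheaper to keep around than an
// entire quarantined block.
struct DeallocRecord {
  uintptr_t Ptr;        // Untagged address of the freed block.
  uint32_t TraceHandle; // Stack depot handle of the free() stack.
  uint32_t Tid;         // Thread that performed the free.
};

template <std::size_t N> class DeallocRing {
  static_assert((N & (N - 1)) == 0, "N must be a power of two");
  std::array<DeallocRecord, N> Ring{};
  std::size_t Pos = 0;

public:
  void record(uintptr_t Ptr, uint32_t Trace, uint32_t Tid) {
    Ring[Pos++ & (N - 1)] = {Ptr, Trace, Tid};
  }

  // Scan newest-to-oldest for the most recent free of this address.
  const DeallocRecord *find(uintptr_t Ptr) const {
    for (std::size_t I = 0; I != N; ++I) {
      const DeallocRecord &R = Ring[(Pos - 1 - I) & (N - 1)];
      if (R.Ptr == Ptr)
        return &R;
    }
    return nullptr;
  }
};
```

The trade-off: the probability of having deallocation info for a use-after-free now depends on the ring size, not on how long block reuse is delayed, so the allocator can reuse hot blocks immediately.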


================
Comment at: compiler-rt/lib/scudo/standalone/combined.h:1031
+    auto *Ptr32 = reinterpret_cast<u32 *>(Ptr);
+    Ptr32[MemTagDeallocationTraceIndex] = collectStackTrace();
+    Ptr32[MemTagDeallocationTidIndex] = getThreadID();
----------------
Why not store the entire deallocation stack trace here if it fits, and reduce the stack depot load by 2x?


================
Comment at: compiler-rt/lib/scudo/standalone/stack_depot.h:44
+
+  static const uptr kTabBits = 16;
+  static const uptr kTabSize = 1 << kTabBits;
----------------
hctim wrote:
> I worry that these values aren't configurable. We might want to have a larger stack depot on some devices, if:
> 
>   # Scudo ends up using a stack depot for a MTE-similar feature on another architecture.
>   # Someone ends up using stack trace collection for non-MTE scenarios (I mean, it's a cheap way to do heap profiling).
>   # We have different devices that have more/less sensitivity to memory overhead.
> 
> Should we make this templated?
> 
> 
Not before there are actual users of this.


================
Comment at: compiler-rt/lib/scudo/standalone/stack_depot.h:58
+    for (uptr *i = begin; i != end; ++i)
+      b.add(u32(*i));
+    u32 hash = b.get();
----------------
Could shift the frame addresses right by 2 (>> 2) to get more meaningful bits into the hash, since return addresses are typically at least 4-byte aligned.
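The idea, as a sketch (illustrative mixing function, not Scudo's actual hash):

```cpp
#include <cstdint>

// Return addresses are at least 4-byte aligned on most targets, so the
// low two bits of every frame are always zero; shifting them out feeds
// the hash bits that actually vary between frames.
inline uint32_t mixFrame(uint32_t H, uintptr_t PC) {
  H ^= static_cast<uint32_t>(PC >> 2); // discard the always-zero bits
  H *= 0x9E3779B9u;                    // golden-ratio multiplicative mix
  return H;
}
```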


================
Comment at: compiler-rt/lib/scudo/standalone/stack_depot.h:93
+      ring_pos = (ring_pos + 1) & kRingMask;
+      b.add(u32(ring[ring_pos]));
+    }
----------------
All accesses to ring and tab are, technically, data races, and must be done with atomic operations.
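A sketch of what the well-defined version could look like, wrapping each slot in std::atomic with relaxed ordering (the ring size and helper names here are illustrative, not from the patch):

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Slots shared between writers (insert) and readers (trace
// reconstruction). Plain loads/stores would be data races under the
// C++ memory model; per-slot atomics with relaxed ordering make every
// access well-defined without imposing any inter-slot ordering.
constexpr std::size_t kRingSize = 8; // power of two, illustrative only
constexpr std::size_t kRingMask = kRingSize - 1;
std::atomic<uint64_t> Ring[kRingSize];

void ringStore(std::size_t Pos, uint64_t V) {
  Ring[Pos & kRingMask].store(V, std::memory_order_relaxed);
}

uint64_t ringLoad(std::size_t Pos) {
  return Ring[Pos & kRingMask].load(std::memory_order_relaxed);
}
```

Readers may still observe a torn *logical* trace (a writer can lap them), but each individual slot access is race-free, which is all that is required here.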



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77283/new/

https://reviews.llvm.org/D77283





More information about the llvm-commits mailing list