[PATCH] D87120: [MemProf] Memory profiling runtime support

Mon Oct 12 00:04:56 PDT 2020

vitalybuka added a comment.

LGTM
I'd like to accept next revision (if any).
Other reviewers, please comment if you have something to add.

================
Comment at: compiler-rt/lib/memprof/memprof_allocator.cpp:701
+      return nullptr;
+    MemprofChunk *p = reinterpret_cast<LargeChunkHeader *>(alloc_beg)->Get();
+    if (p)
----------------
if p == nullptr then 
p is either:
1. tiny chunk from primary allocator 
2. chunk under construction from the secondary allocator
for the [1] "return reinterpret_cast<MemprofChunk *>(alloc_beg);" makes sense, but it sill possible to have uninitialized check
for the [2] it's better to return null

asan has the following here:

```
AsanChunk *p = reinterpret_cast<LargeChunkHeader *>(alloc_beg)->Get();
    if (!p) {
      if (!allocator.FromPrimary(alloc_beg))
        return nullptr;
      p = reinterpret_cast<AsanChunk *>(alloc_beg);
    }
```

then asan does the following which is relevant for [1] ([2] must always has correct state here):

```
    u8 state = atomic_load(&p->chunk_state, memory_order_relaxed);
    // It does not guaranty that Chunk is initialized, but it's
    // definitely not for any other value.
    if (state == CHUNK_ALLOCATED || state == CHUNK_QUARANTINE)
      return p;
    return nullptr;
```

memprof has no state, so maybe allocation size == 0 can be used as such sentinel, but it should be atomic.

================
Comment at: compiler-rt/lib/memprof/memprof_allocator.cpp:554-555
+      CHECK_LE(alloc_beg + 2 * sizeof(uptr), chunk_beg);
+      reinterpret_cast<uptr *>(alloc_beg)[0] = kAllocBegMagic;
+      reinterpret_cast<uptr *>(alloc_beg)[1] = chunk_beg;
+    }
----------------
tejohnson wrote:
> vitalybuka wrote:
> > magic should better be atomic, set when entire Chunk is initialized and after chunk_beg is set
> > as FinishAndPrint may run in another thread.
> > I updated asan code, see LargeChunkHeader.
> > 
> > Also ForEachChunk appyes to all, including non allocated and deallocated chunks.
> > So Deallocate needs to reset magic as well as we don't have guaranty that FinishAndPrint can see chunk_beg for deallocated block.
> > Probably user_requested_size needs to be atomic as well?
> Great, I stole your LargeChunkHeader code from asan and used that approach here. =)
> I added the appropriate check to FinishAndPrint that we're getting back a non-null MemprofChunk from getMemprofChunk. With that change I'm not sure that we need to make user_requested_size atomic?
Please check GetMemprofChunk comment below.

================
Comment at: compiler-rt/lib/memprof/memprof_allocator.cpp:679
+
+  uptr AllocationSize(uptr p) {
+    MemprofChunk *m = GetMemprofChunkByAddr(p);
----------------
tejohnson wrote:
> vitalybuka wrote:
> > can you use "uptr chunk_beg = p - kChunkHeaderSize;" as in deallocate ?
> > to remove one kAllocBegMagic user
> This is only used by memprof_malloc_usable_size. Looks like it handles being invoked on a pointer that may not be the start of the block. I couldn't find any documentation about how malloc_usable_size should behave if called on a pointer in the middle of an allocation. I tried using the system version and it doesn't error (like a deallocation does), but returns 0 in that case. My guess is behavior is undefined? In any case, not sure if it is worth optimizing for this case. WDYT?
When I commented I assumed that kAllocBegMagic can be removed, but it's still used by FinishAndPrint.
So either way is fine.

================
Comment at: compiler-rt/lib/memprof/memprof_mapping.h:64
+static inline uptr MemToShadowSize(uptr size) { return size >> SHADOW_SCALE; }
+static inline bool AddrIsInLowMem(uptr a) {
+  return a <= kLowMemEnd;
----------------
clang format warning?

================
Comment at: compiler-rt/lib/memprof/memprof_mapping.h:93
+
+static inline uptr MemToShadow(uptr p) {
+  CHECK(AddrIsInMem(p));
----------------
Probably "static inline -> inline" should suppress "unused" warnings

================
Comment at: compiler-rt/lib/memprof/memprof_rtl.cpp:183-184
+
+  MemprofCheckIncompatibleRT();
+  MemprofCheckDynamicRTPrereqs();
+  AvoidCVE_2016_2143();
----------------
tejohnson wrote:
> vitalybuka wrote:
> > Probably not needed, I don't see similar code in other sanitizers
> Since these seem to be checking for valid inconsistencies, I'd prefer to leave them, unless there's a reason they are not needed or problematic.
I suspect it was needed when asan was developed before into llvm repository. Now compiler and runtime should be always on the same version.

================
Comment at: compiler-rt/test/memprof/TestCases/dump_process_map.cpp:4
+// RUN: %clangxx_memprof -O0 %s -o %t
+// RUN: %env_memprof_opts=dump_process_map=1 %run %t 2>&1 | FileCheck %s
+// RUN: %env_memprof_opts=dump_process_map=0 %run %t 2>&1 | FileCheck %s --check-prefix=NOMAP
----------------
tejohnson wrote:
> vitalybuka wrote:
> > maybe better to add print_module_map=3 and document something
> > 3 - print module map for debugging
> > and assume sanitizer may use it when ever it wants.
> I'm a little unsure of what you are suggesting here. There is an existing DumpProcessMap(), which is only implemented by sanitizer_posix. This is different than the PrintModuleMap() controlled by print_module_map that only has a non-empty implementation for sanitizer_mac. Are you suggesting that instead of adding dump_process_map, I add print_module_map=3 and have it invoke DumpProcessMap? What is the intended distinction between PrintModuleMap and DumpProcessMap?
Looks like they are different.
However HWAsan uses 
```
if (common_flags()->print_module_map)
    DumpProcessMap();
```
So I recommend to avoid new flags and use this one for now. Maybe we some update to the doc string.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87120/new/

https://reviews.llvm.org/D87120