[PATCH] D87120: [MemProf] Memory profiling runtime support

Thu Oct 15 18:00:49 PDT 2020

tejohnson marked 7 inline comments as done.
tejohnson added a comment.

Thanks, I've addressed your comments. I also have implemented a couple additional fixes I found through internal testing. Added implementations of some __sanitizer_* functions that need to be implemented by sanitizer_common users (e.g. __sanitizer_get_heap_size) and added a test to ensure they are all defined. I fixed a bug in the timestamp computation code. I also added a flag to ensure that we don't try to access the dynamically allocated CacheSet on the MemInfoBlockCache before it is constructed (e.g. before the Allocator object is initialized).

================
Comment at: compiler-rt/lib/memprof/memprof_allocator.cpp:701
+      return nullptr;
+    MemprofChunk *p = reinterpret_cast<LargeChunkHeader *>(alloc_beg)->Get();
+    if (p)
----------------
vitalybuka wrote:
> if p == nullptr then 
> p is either:
> 1. tiny chunk from primary allocator 
> 2. chunk under construction from the secondary allocator
> for the [1] "return reinterpret_cast<MemprofChunk *>(alloc_beg);" makes sense, but it sill possible to have uninitialized check
> for the [2] it's better to return null
> 
> asan has the following here:
> 
> ```
> AsanChunk *p = reinterpret_cast<LargeChunkHeader *>(alloc_beg)->Get();
>     if (!p) {
>       if (!allocator.FromPrimary(alloc_beg))
>         return nullptr;
>       p = reinterpret_cast<AsanChunk *>(alloc_beg);
>     }
> ```
> 
> then asan does the following which is relevant for [1] ([2] must always has correct state here):
> 
> ```
>     u8 state = atomic_load(&p->chunk_state, memory_order_relaxed);
>     // It does not guaranty that Chunk is initialized, but it's
>     // definitely not for any other value.
>     if (state == CHUNK_ALLOCATED || state == CHUNK_QUARANTINE)
>       return p;
>     return nullptr;
> ```
> 
> 
> memprof has no state, so maybe allocation size == 0 can be used as such sentinel, but it should be atomic.
Changed user_requested_size to be atomic, and updated this code as suggested. Since we are now doing the check for user_requested_size != 0 here, I have removed that check from the callsite in FinishAndPrint. I also updated GetMemprofChunk to pass back the user_requested_size in a parameter, so we don't need to do another atomic load in the callsites.

================
Comment at: compiler-rt/lib/memprof/memprof_allocator.cpp:679
+
+  uptr AllocationSize(uptr p) {
+    MemprofChunk *m = GetMemprofChunkByAddr(p);
----------------
vitalybuka wrote:
> tejohnson wrote:
> > vitalybuka wrote:
> > > can you use "uptr chunk_beg = p - kChunkHeaderSize;" as in deallocate ?
> > > to remove one kAllocBegMagic user
> > This is only used by memprof_malloc_usable_size. Looks like it handles being invoked on a pointer that may not be the start of the block. I couldn't find any documentation about how malloc_usable_size should behave if called on a pointer in the middle of an allocation. I tried using the system version and it doesn't error (like a deallocation does), but returns 0 in that case. My guess is behavior is undefined? In any case, not sure if it is worth optimizing for this case. WDYT?
> When I commented I assumed that kAllocBegMagic can be removed, but it's still used by FinishAndPrint.
> So either way is fine.
Ok I'll leave it in for now

================
Comment at: compiler-rt/lib/memprof/memprof_mapping.h:93
+
+static inline uptr MemToShadow(uptr p) {
+  CHECK(AddrIsInMem(p));
----------------
vitalybuka wrote:
> Probably "static inline -> inline" should suppress "unused" warnings
Yep, fixed all of the static inline in headers to be just inline.

================
Comment at: compiler-rt/lib/memprof/memprof_rtl.cpp:183-184
+
+  MemprofCheckIncompatibleRT();
+  MemprofCheckDynamicRTPrereqs();
+  AvoidCVE_2016_2143();
----------------
vitalybuka wrote:
> tejohnson wrote:
> > vitalybuka wrote:
> > > Probably not needed, I don't see similar code in other sanitizers
> > Since these seem to be checking for valid inconsistencies, I'd prefer to leave them, unless there's a reason they are not needed or problematic.
> I suspect it was needed when asan was developed before into llvm repository. Now compiler and runtime should be always on the same version.
Ok removed these.

================
Comment at: compiler-rt/test/memprof/TestCases/dump_process_map.cpp:4
+// RUN: %clangxx_memprof -O0 %s -o %t
+// RUN: %env_memprof_opts=dump_process_map=1 %run %t 2>&1 | FileCheck %s
+// RUN: %env_memprof_opts=dump_process_map=0 %run %t 2>&1 | FileCheck %s --check-prefix=NOMAP
----------------
vitalybuka wrote:
> tejohnson wrote:
> > vitalybuka wrote:
> > > maybe better to add print_module_map=3 and document something
> > > 3 - print module map for debugging
> > > and assume sanitizer may use it when ever it wants.
> > I'm a little unsure of what you are suggesting here. There is an existing DumpProcessMap(), which is only implemented by sanitizer_posix. This is different than the PrintModuleMap() controlled by print_module_map that only has a non-empty implementation for sanitizer_mac. Are you suggesting that instead of adding dump_process_map, I add print_module_map=3 and have it invoke DumpProcessMap? What is the intended distinction between PrintModuleMap and DumpProcessMap?
> Looks like they are different.
> However HWAsan uses 
> ```
> if (common_flags()->print_module_map)
>     DumpProcessMap();
> ```
> So I recommend to avoid new flags and use this one for now. Maybe we some update to the doc string.
Ah, didn't see that hwasan was using DumpProcessMap with this option. I went ahead and changed memprof to do the same thing.

I tried to update the description of that flag, but it is a little hard because there are inconsistencies. E.g. HWAsan always calls DumpProcessMap, which is implemented for all OSes AFAICT. Asan, on the other hand, always calls PrintModuleMap, which is only implemented for Mac. It isn't clear to me how different PrintModuleMap will behave from DumpProcessMap on a Mac (which will pick up the posix version of DumpProcessMap). The implementations are different, but do they provide fundamentally different info? Should Asan be changed to be like HWAsan and call DumpProcessMap? In that case PrintModuleMap will have no callers. Another possibility is to change DumpProcessMap so that for SANITIZER_MAC it calls PrintModuleMap instead, then change the Asan callsites to DumpProcessMap. WDYT?

There were also 2 other places in memprof that were calling PrintModuleMap because the code was cloned from Asan (on error reports when print_module_map >= 2 and on death when print_module_map=1). HWAsan calls DumpProcessMap in both of those cases. I've gone ahead and changed memprof to also call DumpProcessMap in those cases since it is more widely implemented.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87120/new/

https://reviews.llvm.org/D87120