[PATCH] D87120: [HeapProf] Heap profiling runtime support

Teresa Johnson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 10 22:29:10 PDT 2020


tejohnson added inline comments.


================
Comment at: compiler-rt/lib/heapprof/heapprof_allocator.cpp:689
+      return reinterpret_cast<HeapprofChunk *>(alloc_magic[1]);
+    return reinterpret_cast<HeapprofChunk *>(alloc_beg);
+  }
----------------
vitalybuka wrote:
> tejohnson wrote:
> > vitalybuka wrote:
> > > tejohnson wrote:
> > > > vitalybuka wrote:
> > > > > AsanChunk is used in such a way that any garbage works there.
> > > > > Here we return alloc_beg, but we have no proof that it's initialized.
> > > > > There is a data race by design, I guess, where lsan can start using that data before it's fully initialized.
> > > > > lsan has sanity checks for all fields, so it should not crash/hang, and in the worst case it just misses memory leaks.
> > > > > However, if the block is alive for the user or is in quarantine, then the chunk is valid and lsan will do correct leak checking.
> > > > > I can imagine the author tried to avoid locking between lsan and the fast thread-local allocator cache.
> > > > > 
> > > > > Is the same true for heapprof?
> > > > > It looks like heapprof_mz_size can get here. Can a valid, data-race-free program get into GetHeapprofChunk in one thread for a memory block which is being allocated/deallocated by another thread?
> > > > > 
> > > > > BTW, here is another of my Asan patches, which avoids metadata and, more importantly, makes sure the header is in a consistent state: D87359
> > > > > 
> > > > The user_requested_size is set to 0 on deallocation (the minimum size is 1 per Allocate). All of the callers of this function specifically check for a 0 user_requested_size and handle it specially.
> > > If the block is allocated from the primary allocator, there is no guarantee that it's zeroed. E.g. the allocator may use that memory to maintain its free list.
> > > So user_requested_size may not be 0 just after returning from allocator.Allocate.
> > > If the HeapprofChunk is at alloc_beg, then we have no kAllocBegMagic and GetHeapprofChunk will return this not-yet-initialized HeapprofChunk.
> > > 
> > > For the secondary allocator it's not a problem, as the memory comes from mmap and is therefore zeroed.
> > > For the primary allocator with the magic it also should not be a problem (if we fix Allocate to set the magic after initialization).
> > How does asan ensure that chunk_state is CHUNK_INVALID (0) on allocation from the primary allocator?
> Unfortunately, it does not.
> If we see kAllocBegMagic, then we are quite sure that it's an already-updated AsanChunk.
> As I wrote, without kAllocBegMagic (for the smallest allocations) we have no idea whether this is a real chunk state at the beginning of the allocated block or random garbage from the primary allocator. D87359 was an attempt to always have kAllocBegMagic to avoid that.
> But it's not important for Asan. Asan theoretically should work with any random bytes in AsanChunk.
> 
> If we see kAllocBegMagic, then we are quite sure that it's an already-updated AsanChunk.

Is that true, though? In asan_allocator.cpp's Allocate(), the first uptr is set to kAllocBegMagic before the chunk header is completely updated; e.g. user_requested_size and some other fields are not yet initialized at that point.

In any case, presumably the same issue could occur for asan_mz_size if there are random bytes in AsanChunk. But back to your original question: can a valid, data-race-free program get into Get{Heapprof,Asan}Chunk in one thread for a memory block that is being allocated/deallocated by another thread? For an interface like asan_mz_size or heapprof_mz_size, I don't think it can.

The other place where I'm calling GetHeapprofChunk here is FinishAndPrint, where we iterate over all chunks. But that is called on Allocator destruction. I do plan to add an interface to dump the profile in a similar manner during the run, so presumably there we could invoke GetHeapprofChunk on a chunk that is mid-allocation or mid-deallocation in another thread. But in the rare case that this hits such a race, at worst we would end up with some corruption in the profile data. We also lock the Allocator for the full chunk iteration during that profile dumping, which should minimize the risk.
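
For reference, here is a minimal, self-contained sketch of the alloc_beg -> chunk lookup pattern being discussed; the names and the magic value are illustrative, not the actual heapprof code:

  #include <cstdint>

  using uptr = uintptr_t;
  static const uptr kAllocBegMagic = 0xCC6E96B9ULL;  // illustrative value

  struct HeapprofChunk {
    uint32_t from_memalign : 1;
    uint32_t user_requested_size : 29;
    // ... remaining header fields ...
  };

  // For over-aligned blocks the header does not sit at alloc_beg; instead
  // alloc_beg holds [kAllocBegMagic, pointer-to-chunk]. Otherwise the header
  // starts at alloc_beg itself and, as noted above, may still contain
  // garbage from the primary allocator if the allocation is in flight on
  // another thread.
  HeapprofChunk *GetHeapprofChunk(void *alloc_beg) {
    if (!alloc_beg)
      return nullptr;
    uptr *alloc_magic = reinterpret_cast<uptr *>(alloc_beg);
    if (alloc_magic[0] == kAllocBegMagic)
      return reinterpret_cast<HeapprofChunk *>(alloc_magic[1]);
    return reinterpret_cast<HeapprofChunk *>(alloc_beg);
  }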


================
Comment at: compiler-rt/lib/heapprof/heapprof_allocator.cpp:78-95
+struct ChunkHeader {
+  // 1-st 4 bytes.
+  u32 from_memalign : 1;
+  // This field is used for small sizes. For large sizes it is equal to
+  // SizeClassMap::kMaxSize and the actual size is stored in the
+  // SecondaryAllocator's metadata.
+  u32 user_requested_size : 29;
----------------
tejohnson wrote:
> vitalybuka wrote:
> > If you have dummy fields, you can use those bits for user_requested_size and avoid the special case for SizeClassMap::kMaxSize.
> > 
> > You can probably also steal 32 bits from alloc_context_id and use them as an offset; that way you don't need metadata: if the offset is 0 then the chunk starts with the header, otherwise add the offset in bytes.
> > 
> > If I understand correctly, the offset is a function of (alloc_beg, alignment), and if the alignment is 2^x then just one byte for alignmentLog2 is enough.
> > 
> Thanks for the suggestions. You are right, I can increase the allocation size encoded here given the extra space I currently have. Question on that further below.
> 
> And good suggestion about alloc_context_id; I see StackDepotPut returns a u32 anyway. I'm not sure why I changed this to u64 here - it looks like it has been u32 in asan.
Actually, I don't believe I need to encode the offset. Just using a bit, like asan does, to flag from_memalign should be sufficient. The offset I added was only needed in AllocBeg(), but I can simply use the same technique as asan here: if from_memalign is set, then use the base allocator's GetBlockBegin mechanism to get from the header to the allocation begin. (To go from the allocation begin to the header, we simply check the first uptr to see if it is kAllocBegMagic.)
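
As a rough sketch of that direction (the Allocator stub and get_allocator() below stand in for the sanitizer_common combined allocator interface; names are assumptions, not the actual heapprof code):

  #include <cstdint>

  // Stand-in for the combined allocator's interface (assumed names).
  struct Allocator {
    void *GetBlockBegin(const void *p);  // start of the block containing p
  };
  Allocator &get_allocator();

  struct HeapprofChunk {
    uint32_t from_memalign : 1;
    uint32_t user_requested_size : 29;
    // ... remaining header fields ...
  };

  // Header -> allocation begin without a stored offset: for memalign'ed
  // blocks, ask the base allocator where the underlying block starts;
  // otherwise the header sits at the allocation begin itself.
  void *AllocBeg(HeapprofChunk *m) {
    if (m->from_memalign)
      return get_allocator().GetBlockBegin(reinterpret_cast<void *>(m));
    return reinterpret_cast<void *>(m);
  }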


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87120/new/

https://reviews.llvm.org/D87120


