[PATCH] D39244: [sanitizer] Random shuffling of chunks for the 32-bit Primary Allocator

Tue Oct 24 16:02:39 PDT 2017

cryptoad added inline comments.

================
Comment at: lib/sanitizer_common/sanitizer_allocator_primary32.h:341
+    const uptr kShuffleArraySize = 48;
+    uptr shuffle_array[kShuffleArraySize];
+    uptr j = 0;
----------------
alekseyshl wrote:
> cryptoad wrote:
> > alekseyshl wrote:
> > > cryptoad wrote:
> > > > cryptoad wrote:
> > > > > alekseyshl wrote:
> > > > > > How about using the newly allocated region for the array? You can fill and shuffle all the pointers at once, create the batches and then zero reg out. Still two passes, but less stack and chunks will be farther apart. The code will be simpler too and I have a feeling that we can even avoid double copying for !kShuffleArraySize.
> > > > > This is tricky because if kRandomShuffleChunks is not used with kUseSeparateSizeClassForBatch , then the batches can also use the region itself.
> > > > > Meaning while we iterate through the chunks, we could create a batch that would overlap and corrupt the array we are iterating on.
> > > > > We can't really assume any part of the region is "safe" to store the array either due to the randomness, so using the end of the region or the beginning doesn't change that.
> > > > > If kUseSeparateSizeClassForBatch  is used, then we are golden and that can work.
> > > > > Currently Scudo uses both, but I feel they should be treated distinctly nonetheless.
> > > > Rectification: even with kUseSeparateSizeClassforBatch, there is the special case of kBatchClassID for which the batches are stored in the same region.
> > > Doesn't this latter case create a loop? We're in PopulateFreeList because AllocateBatch figured that sci->free_list.empty(), and when PopulateFreeList calls CreateBatch, don't we get back to AllocateBatch with free list still empty?
> > > 
> > > Sorry for too many questions, just want to understand what's going on here.
> > No worries, all questions are welcome!
> > 
> > So the first PopulateFreeList will be with the class_id that we requested with malloc, then it will go down the again in PopulateFreeList except that this time the class_id will be kBatchClassId, which is serviced from its own region (eg: it doesn't go down into PopulateFreeList again).
> > So for the first invocation (where it needs to create a region for the size requested and then for the batch), the stack looks like:
> > ```
> > #0  __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__scudo::AP32> >::CreateBatch
> > #1  0x00000000005095f1 in __sanitizer::SizeClassAllocator32<__scudo::AP32>::PopulateBatches
> > #2  0x0000000000509123 in __sanitizer::SizeClassAllocator32<__scudo::AP32>::PopulateFreeList
> > #3  0x0000000000508c4e in __sanitizer::SizeClassAllocator32<__scudo::AP32>::AllocateBatch
> > #4  0x0000000000508a26 in __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__scudo::AP32> >::Refill
> > #5  0x0000000000508942 in __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__scudo::AP32> >::Allocate
> > #6  0x00000000005085f2 in __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__scudo::AP32> >::CreateBatch
> > #7  0x00000000005095f1 in __sanitizer::SizeClassAllocator32<__scudo::AP32>::PopulateBatches
> > #8  0x0000000000509123 in __sanitizer::SizeClassAllocator32<__scudo::AP32>::PopulateFreeList
> > #9  0x0000000000508c4e in __sanitizer::SizeClassAllocator32<__scudo::AP32>::AllocateBatch
> > #10 0x0000000000508a26 in __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__scudo::AP32> >::Refill
> > #11 0x0000000000508942 in __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__scudo::AP32> >::Allocate
> > ```
> > Frames 11 to 6 are wrt to the requested class_id, 5 to 0 are for kBatchClassId.
> > When reaching 0, we just return b and everything unrolls back up.
> > This is only for the first allocation, afterwards the Region for the TransferBatch is created and the associated free list is populated so we won't go there again unless we need to re-populate them.
> > 
> Thanks for the stack!
> 
> So, you're saying it never happens, we never allocate batches from the same bucket (and thus, the same region)? Then why my proposal of using the new region for the temp buffer is not viable? Otherwise, if I got your explanation wrong and there are cases when batches are allocated from the same region (the region we're populating free list for), how come it does not get into a loop I mentioned earlier? I am a bit confused...
> 
> 
I am going to try and take a step back to make sure I am on the same wave length as you.

The proposal as I understand it is to put all the possible pointers in the newly allocated region, and shuffle them, there, thus allowing for more than a 48 shuffle at a time. Then create and fill batches by iterating through the shuffled array, potentially zero out the region.
If this is not the actual proposal, please let me know.

With the proposal, there are 2 cases to take into account: kUseSeparateSizeClassforBatch is true or false.

If kUseSeparateSizeClassforBatch is false, then if a chunk in a class is large enough to hold a transfer batch, it will be used as a TranferBatch.
https://github.com/llvm-mirror/compiler-rt/blob/master/lib/sanitizer_common/sanitizer_allocator_local_cache.h#L146
Meaning that when we do a b->Add(xxx) we actually overwrite some of region memory itself.
In this case, it sounds like it would be tough to not overlap an in-region transfer batch with the pointers we are iterating through.
As such, I am not sure the proposal is viable (or at least without significant code complexity).

If kUseSeparateSizeClassforBatch is true, then we allocate transfer batches from a separate class size, so we could reuse the region for all other classes except that one. Because for that particular one, we fall back into an equivalent of the preceding case: batches for the kBatchClassId class are allocated within the class region itself. Now we could make this work by ensuring there is no randomization in that specific kBatchClassId class: indeed we do not need batches to be randomized.

A couple of other points to consider:
- For class 1 (16 bytes) with a 1MB region, we get 65536 chunks in the region. If we are using the region to hold the array, with 64-bit pointers, it's 524288 bytes that we are dirtying in the region (as well as the transfer batch region really);
- Zeroing that memory (to avoid potential disclosure of valid pointers) will be somewhat costly as well.

By using a local array, we reduce the randomness of our chunk shuffle (at most 48 consecutive chunks get shuffled at a time), but we do not touch the newly allocated region (yet) and save on some complications.

I hope this clarifies my point of view (if my understanding of what you are suggesting is correct).

https://reviews.llvm.org/D39244