[compiler-rt] [scudo] Move the chunk update into functions (PR #83493)

Mon Apr 15 13:55:33 PDT 2024

================
@@ -1148,10 +1035,176 @@ class Allocator {
            reinterpret_cast<uptr>(Ptr) - SizeOrUnusedBytes;
   }
 
+  ALWAYS_INLINE void *initChunk(const uptr ClassId, const Chunk::Origin Origin,
+                                void *Block, const uptr UserPtr,
+                                const uptr SizeOrUnusedBytes,
+                                const FillContentsMode FillContents) {
+    Block = addHeaderTag(Block);
+    // Only do content fill when it's from primary allocator because secondary
+    // allocator has filled the content.
+    if (ClassId != 0 && UNLIKELY(FillContents != NoFill)) {
+      // This condition is not necessarily unlikely, but since memset is
+      // costly, we might as well mark it as such.
+      memset(Block, FillContents == ZeroFill ? 0 : PatternFillByte,
+             PrimaryT::getSizeByClassId(ClassId));
+    }
+
+    Chunk::UnpackedHeader Header = {};
+
+    const uptr DefaultAlignedPtr =
+        reinterpret_cast<uptr>(Block) + Chunk::getHeaderSize();
+    if (UNLIKELY(DefaultAlignedPtr != UserPtr)) {
+      const uptr Offset = UserPtr - DefaultAlignedPtr;
+      DCHECK_GE(Offset, 2 * sizeof(u32));
+      // The BlockMarker has no security purpose, but is specifically meant for
+      // the chunk iteration function that can be used in debugging situations.
+      // It is the only situation where we have to locate the start of a chunk
+      // based on its block address.
+      reinterpret_cast<u32 *>(Block)[0] = BlockMarker;
+      reinterpret_cast<u32 *>(Block)[1] = static_cast<u32>(Offset);
+      Header.Offset = (Offset >> MinAlignmentLog) & Chunk::OffsetMask;
+    }
+
+    Header.ClassId = ClassId & Chunk::ClassIdMask;
+    Header.State = Chunk::State::Allocated;
+    Header.OriginOrWasZeroed = Origin & Chunk::OriginMask;
+    Header.SizeOrUnusedBytes = SizeOrUnusedBytes & Chunk::SizeOrUnusedBytesMask;
+    Chunk::storeHeader(Cookie, reinterpret_cast<void *>(addHeaderTag(UserPtr)),
+                       &Header);
+
+    return reinterpret_cast<void *>(UserPtr);
+  }
+
+  NOINLINE void *
+  initChunkWithMemoryTagging(const uptr ClassId, const Chunk::Origin Origin,
+                             void *Block, const uptr UserPtr, const uptr Size,
+                             const uptr SizeOrUnusedBytes,
+                             const FillContentsMode FillContents) {
+    const Options Options = Primary.Options.load();
+    DCHECK(useMemoryTagging<Config>(Options));
+
+    void *Ptr = reinterpret_cast<void *>(UserPtr);
+    void *TaggedPtr = Ptr;
+
+    if (LIKELY(ClassId)) {
+      // We only need to zero or tag the contents for Primary backed
----------------
cferris1000 wrote:

While reviewing this cl, trying to work through a three page long function has been painful. Breaking it up into smaller, easier to understand pieces really helps understand what's going on, because I only need to understand what each function does. For example, with a large function, variable names tend to be reused, so it can get confusing trying to figure out where the variable was set and what it means at page one versus page two or page three. You can fix this by never re-using the variable names, but at that point you are emulating breaking the function into smaller functions and might as well go all the way and refactor.

The advice to break functions is almost universal, but I have seen it go too far the other way. Adding a function that has two lines in it and is only used once, that doesn't make sense and should be folded into the caller. I don't think this change is doing that.

Now, if you don't think that this refactor is breaking up the function into easy to understand pieces, that would be a reason to keep modifying this. For example, if you don't think the comments make sense, or that the function partition is right, please mention it.

Otherwise, having had to read this very long function a number of times, I can tell you this new version makes it much easier to understand. Even trying to figure out the MTE enabled code path made it extra difficult to figure out what was happening. Having to go back and forth looking at the modifications done in the MTE enabled path can be confusing. So, this should really help understand the MTE and non-MTE paths alone, which makes this worth doing, in my estimation.

https://github.com/llvm/llvm-project/pull/83493