[compiler-rt] [scudo] Move the chunk update into functions (PR #83493)

Mon Apr 15 15:16:05 PDT 2024

================
@@ -1148,10 +1035,176 @@ class Allocator {
            reinterpret_cast<uptr>(Ptr) - SizeOrUnusedBytes;
   }
 
+  ALWAYS_INLINE void *initChunk(const uptr ClassId, const Chunk::Origin Origin,
+                                void *Block, const uptr UserPtr,
+                                const uptr SizeOrUnusedBytes,
+                                const FillContentsMode FillContents) {
+    Block = addHeaderTag(Block);
+    // Only do content fill when it's from primary allocator because secondary
+    // allocator has filled the content.
+    if (ClassId != 0 && UNLIKELY(FillContents != NoFill)) {
+      // This condition is not necessarily unlikely, but since memset is
+      // costly, we might as well mark it as such.
+      memset(Block, FillContents == ZeroFill ? 0 : PatternFillByte,
+             PrimaryT::getSizeByClassId(ClassId));
+    }
+
+    Chunk::UnpackedHeader Header = {};
+
+    const uptr DefaultAlignedPtr =
+        reinterpret_cast<uptr>(Block) + Chunk::getHeaderSize();
+    if (UNLIKELY(DefaultAlignedPtr != UserPtr)) {
+      const uptr Offset = UserPtr - DefaultAlignedPtr;
+      DCHECK_GE(Offset, 2 * sizeof(u32));
+      // The BlockMarker has no security purpose, but is specifically meant for
+      // the chunk iteration function that can be used in debugging situations.
+      // It is the only situation where we have to locate the start of a chunk
+      // based on its block address.
+      reinterpret_cast<u32 *>(Block)[0] = BlockMarker;
+      reinterpret_cast<u32 *>(Block)[1] = static_cast<u32>(Offset);
+      Header.Offset = (Offset >> MinAlignmentLog) & Chunk::OffsetMask;
+    }
+
+    Header.ClassId = ClassId & Chunk::ClassIdMask;
+    Header.State = Chunk::State::Allocated;
+    Header.OriginOrWasZeroed = Origin & Chunk::OriginMask;
+    Header.SizeOrUnusedBytes = SizeOrUnusedBytes & Chunk::SizeOrUnusedBytesMask;
+    Chunk::storeHeader(Cookie, reinterpret_cast<void *>(addHeaderTag(UserPtr)),
+                       &Header);
+
+    return reinterpret_cast<void *>(UserPtr);
+  }
+
+  NOINLINE void *
+  initChunkWithMemoryTagging(const uptr ClassId, const Chunk::Origin Origin,
+                             void *Block, const uptr UserPtr, const uptr Size,
+                             const uptr SizeOrUnusedBytes,
+                             const FillContentsMode FillContents) {
+    const Options Options = Primary.Options.load();
+    DCHECK(useMemoryTagging<Config>(Options));
+
+    void *Ptr = reinterpret_cast<void *>(UserPtr);
+    void *TaggedPtr = Ptr;
+
+    if (LIKELY(ClassId)) {
+      // We only need to zero or tag the contents for Primary backed
----------------
pcc wrote:

> it can get confusing trying to figure out where the variable was set and what it means at page one versus page two or page three

Yeah, this is one of the things that I would need an IDE/clangd for (find variable definition/references), but I would need an IDE for this regardless of whether the function was broken up. And for me, the split into multiple functions would make the situation worse, because I would also need to manually track the variable across function calls (find the matching argument in the caller/callee), since an IDE will typically only find definitions/references within the local function. I think the fact that there's a lot of code to handle allocations is just an intrinsic property of the code needing to do a lot of stuff, and it's not clear to me that moving it around like this really helps.

> You can fix this by never re-using the variable names

I would personally be happier with a change like this. The code would still be friendly to IDEs and to people who want to read the whole function, and it would make it more obvious where the value of each variable comes from.
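
As a rough sketch of what "never re-using the variable names" could look like in practice (the functions and names below are hypothetical and are not taken from the Scudo patch), compare a version that reassigns one variable with a version that gives each intermediate value its own `const` name:

```cpp
#include <cassert>
#include <cstdint>

using uptr = std::uintptr_t;

// Hypothetical helper: round X up to a power-of-two Alignment.
constexpr uptr roundUp(uptr X, uptr Alignment) {
  return (X + Alignment - 1) & ~(Alignment - 1);
}

// Re-used name: `Ptr` first means "start of the block" and later means
// "aligned user pointer", so its meaning depends on where you are reading.
uptr userPtrReused(uptr Block, uptr HeaderSize, uptr Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
  uptr Ptr = Block;
  Ptr = roundUp(Ptr + HeaderSize, Alignment);
  return Ptr;
}

// Distinct names: each value is assigned exactly once, so "find definition"
// on any of them lands on the single line that produced it.
uptr userPtrDistinct(uptr Block, uptr HeaderSize, uptr Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
  const uptr BlockStart = Block;
  const uptr FirstUsableAddr = BlockStart + HeaderSize;
  const uptr UserPtr = roundUp(FirstUsableAddr, Alignment);
  return UserPtr;
}
```

Both functions compute the same result; the second one only changes how easy it is to see where each value comes from while the logic stays in a single function.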

> The advice to break functions is almost universal

Fair enough. I'm not necessarily claiming that my experience is universal; it could just be the way my brain works. If breaking up functions really does make the code easiest to read for everyone else, I won't object to it, and I'll just put up with it being harder to read for me.

> Now, if you don't think that this refactor is breaking up the function into easy to understand pieces, that would be a reason to keep modifying this. For example, if you don't think the comments make sense, or that the function partition is right, please mention it.

Yeah, my other comments are what I have to offer on that front. But since the question of how to structure code to make it easy to read is subjective, and my personal preference would be either to not make this change at all or to make a different change (such as not re-using variables), it's difficult to offer more feedback than that.

https://github.com/llvm/llvm-project/pull/83493