[PATCH] D45756: [XRay][profiler] Part 1: XRay Allocator and Array Implementations

Dean Michael Berris via Phabricator via llvm-commits <llvm-commits at lists.llvm.org>
Thu Apr 19 23:38:20 PDT 2018


dberris added a comment.

Thank you for the thorough review, @kpw!



================
Comment at: compiler-rt/lib/xray/xray_segmented_array.h:34-35
+/// number from packing as many T's as possible in a single cache line.
+template <class T, size_t N = (kCacheLineSize / sizeof(T)) +
+                              (kCacheLineSize % sizeof(T) ? 1 : 0)>
+struct Array {
----------------
kpw wrote:
> For the N default value, you can't pack an entire T if the remaining bytes are non-zero. I think it can be simplified to kCacheLineSize / sizeof(T)
Here's the problem with that -- if sizeof(T) > kCacheLineSize, then kCacheLineSize / sizeof(T) rounds down to 0, and N must not be 0.

Essentially we want to figure out how many T's will fit in blocks whose sizes are cache-line *multiples*. The computation I wanted to express is something like this:

Given sizeof(T), how many T's can I fit in blocks sized at cache-line _multiples_, if each cache line is kCacheLineSize bytes?

For a cache line of 64 bytes and a 16-byte T, for example, we can fit 4 T's just fine. For a 24-byte T we can only fit 2 in a single cache line, but if we ask for 8 of them we can pack them exactly into three cache lines. It might not be great without alignment information (i.e. we really should be treating the 24-byte object as a 32-byte object), but that's something that can be fixed in the definition of T.

Thinking this through, we really should take the least common multiple of sizeof(T) and kCacheLineSize and divide that by sizeof(T) to figure out how many elements to store in cache-line-multiple-sized blocks for the default N.

I've updated the implementation to do just that. :)
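
For the record, here's a compile-time sketch of that computation. The gcd/lcm helper names are mine for illustration, and kCacheLineSize is a stand-in constant; the actual patch may spell this differently:

```
#include <stddef.h>

// Illustrative stand-in; XRay defines its own cache line size constant.
constexpr size_t kCacheLineSize = 64;

// Compile-time GCD/LCM helpers.
constexpr size_t gcd(size_t A, size_t B) { return B == 0 ? A : gcd(B, A % B); }
constexpr size_t lcm(size_t A, size_t B) { return A / gcd(A, B) * B; }

// Default N: how many T's fill the smallest block whose size is a multiple
// of both sizeof(T) and kCacheLineSize. For sizeof(T) == 24 this yields
// lcm(24, 64) / 24 == 8 (three cache lines), and for sizeof(T) larger than
// a cache line it is still at least 1, never 0.
template <class T, size_t N = lcm(sizeof(T), kCacheLineSize) / sizeof(T)>
struct Array { /* ... */ };

static_assert(lcm(24, 64) / 24 == 8, "the 24-byte example from above");
```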


================
Comment at: compiler-rt/lib/xray/xray_segmented_array.h:46-47
+    static constexpr size_t Size = N;
+    Chunk *Prev = nullptr;
+    Chunk *Next = nullptr;
+  };
----------------
kpw wrote:
> Did you consider the tradeoffs of these being external to Block versus invasive to the Block?
I did think about using the Block storage as an opaque set of bytes and intrusively putting the pointers in there too (ala an intrusive list). I *think* that makes the implementation much more bug prone and it made my head hurt just trying to keep the aliasing and alignment rules in my head of where the pointers should be, whether the addresses we're getting are properly aligned, etc. -- so I punted and left it this way instead. :)

Putting a TODO to explore that possibility in the future.
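
For context, here's roughly the shape of the two alternatives (hypothetical types for illustration, not the patch's exact code):

```
#include <stddef.h>

// External metadata (what the patch does): Chunk is a typed struct whose
// members the compiler lays out and aligns for us; the element storage it
// points to stays opaque.
template <class T, size_t N> struct Chunk {
  void *Block = nullptr; // storage for N elements of T
  Chunk *Prev = nullptr;
  Chunk *Next = nullptr;
};

// Intrusive alternative (the one punted on above): the links live inside the
// block's raw bytes, so every access needs casts, manual offset arithmetic,
// and care that both the pointers and the first T are suitably aligned --
// exactly the bookkeeping described as bug prone.
struct IntrusiveHeader {
  void *Prev;
  void *Next;
  // Element storage would begin after this header, at an offset rounded up
  // to alignof(T) by hand.
};
```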


================
Comment at: compiler-rt/lib/xray/xray_segmented_array.h:125-131
+      if (--Offset % N) {
+        return *this;
+      }
+
+      // At this point, we know that Offset % N == 0, so we must retreat the
+      // chunk pointer.
+      DCHECK_EQ(Offset % N, 0);
----------------
kpw wrote:
> This doesn't seem right. There is still a value at offset 0 within the current chunk when going backward. What you want to check is whether Offset ends up being in the N - 1 congruence class after decrement.
Oh my, yes you're right! Good catch, thanks.
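
A minimal sketch of the corrected decrement, assuming (as in the patch) a chunk pointer `C` and a global `Offset`; this is illustrative, not necessarily the exact code that lands:

```
#include <stddef.h>

template <class T, size_t N> struct Chunk {
  T Data[N];
  Chunk *Prev = nullptr;
  Chunk *Next = nullptr;
};

template <class T, size_t N> struct Iterator {
  Chunk<T, N> *C;
  size_t Offset; // global offset from the start of the array

  // Retreat the chunk pointer only when the decrement crosses a chunk
  // boundary, i.e. when the new Offset lands in the N - 1 congruence class.
  // Offset 0 within a chunk still names a valid element, so we must not
  // retreat in that case.
  Iterator &operator--() {
    --Offset;
    if (Offset % N == N - 1)
      C = C->Prev;
    return *this;
  }
};
```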


================
Comment at: compiler-rt/lib/xray/xray_segmented_array.h:153
+    friend bool operator==(const Iterator &L, const Iterator &R) {
+      return L.C == R.C && L.Offset == R.Offset;
+    }
----------------
kpw wrote:
> This is fine, but will a compiler generate a memcmp equivalent?
I don't think `memcmp` is actually what we want, but I'm happy for the compiler to do it for me if it can.
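
To spell out the semantics we do want -- member-wise equality over just the salient fields (member names as in the patch; the surrounding types are sketched for illustration):

```
#include <stddef.h>

struct Chunk; // compared by identity only

struct Iterator {
  Chunk *C;
  size_t Offset;
};

// Member-wise equality compares exactly the fields that matter. A raw
// memcmp over the object representation would also compare any padding
// bytes, whose contents are indeterminate, so it is not a safe substitute
// in general -- though the compiler is free to lower this to a memcmp-like
// sequence when it can prove the layouts make that equivalent.
bool operator==(const Iterator &L, const Iterator &R) {
  return L.C == R.C && L.Offset == R.Offset;
}
```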


https://reviews.llvm.org/D45756