[Openmp-commits] [PATCH] D81054: [OpenMP] Introduce target memory manager

Jon Chesterfield via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Jun 3 08:12:21 PDT 2020


JonChesterfield added a comment.

We definitely want faster memory allocation on the target. This is an interesting piece for that.

This patch implements a memory pool on top of Device->RTL->data_alloc. It's interesting that there's a performance hazard with cuda there. The hsa layer amdgpu builds this on has memory pools around the kernel allocation, so I'm not sure this would be of direct benefit for amdgpu.

Memory allocators are notoriously difficult to implement efficiently in the face of unknown workload. Can you share benchmarks that lead to this design?

Similarly they're really easy to get wrong. There's a lot of subtle arithmetic in this patch. It would be prudent to cover this with tests, e.g. stub out the RTL->data_alloc calls for malloc/free and run targeted and fuzz tests under valgrind.

A second part of this puzzle is device side memory pools, so that malloc/free from a target region doesn't (always) have to call into the host. That may end up being quite platform dependent. That seems orthogonal to this patch.



================
Comment at: openmp/libomptarget/src/memory.cpp:32
+
+// Find the previous number that is power of 2 given a number
+inline size_t flp2(size_t Num) {
----------------
This maps:
0 -> 0
1 -> 1
2 -> 2
3 -> 2
4 -> 4
which is not the previous power of two. Round down to a power of two could be:
`x < 2 ? x : 1 << (31 - __builtin_clz(x - 1))`



================
Comment at: openmp/libomptarget/src/memory.cpp:56
+      L = M;
+  }
+
----------------
Tests for this arithmetic?


================
Comment at: openmp/libomptarget/src/memory.cpp:71
+class MemoryManagerTy {
+  std::list<Node *> FreeList[NumBuckets];
+  std::list<Node> NodeList;
----------------
Why list over smallvector? I can't see a need for iterator stability here


================
Comment at: openmp/libomptarget/src/memory.cpp:94
+    for (int I = NumBuckets; I >= 0; --I) {
+      std::list<Node *> &List = FreeList[I];
+      std::mutex &Lock = FreeListLocks[I];
----------------
This is quadratic - each pass around the loop walks through each node of the list


================
Comment at: openmp/libomptarget/src/memory.cpp:111
+        // Cannot free memory on device
+        // TODO: Maybe should raise an expcetion?
+        if (Ret != OFFLOAD_SUCCESS)
----------------
LLVM is built with exceptions disabled, so probably shouldn't raise here


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81054/new/

https://reviews.llvm.org/D81054





More information about the Openmp-commits mailing list