[Parallel_libs-commits] [PATCH] D24353: [SE] RegisteredHostMemory for async device copies

Jason Henline via Parallel_libs-commits parallel_libs-commits at lists.llvm.org
Fri Sep 9 13:06:45 PDT 2016


jhen added a comment.

I really like the idea of using completely templated Src and Dst arguments. I've implemented that in this patch, and I think everyone can agree that it really cleaned up the code.

I'm not 100% sure that combining the mutable and immutable host slices in one class is the right choice, but I've also made that change so we can take a look and see what folks think.


================
Comment at: streamexecutor/include/streamexecutor/HostMemory.h:33
@@ +32,3 @@
+/// An immutable slice of registered host memory.
+///
+/// Holds a reference to an underlying registered host memory buffer. Must not
----------------
I added comments to each of these classes to say the memory is registered in the sense of `Device::registerHostMemory`.

================
Comment at: streamexecutor/include/streamexecutor/HostMemory.h:161
@@ +160,3 @@
+  ElemT *getPointer() { return static_cast<ElemT *>(Pointer); }
+  size_t getElementCount() const { return ByteCount / sizeof(ElemT); }
+
----------------
jlebar wrote:
> It doesn't matter much, but this base class is kind of annoying, partially because we have the added overhead of this division (shift) every time we get the size, but more because it's another class kicking around.
> 
> It seems like the main reason we need the base class is to make the out-of-line destructor work without including device.h here.  But we can do something similar without all the fuss of a base class.  We could define an out-of-line function in a "detail" namespace, or we could have a base class with just one function, defined out-of-line, that we call from our destructor.
I made a function called `internal::destroyRegisteredHostMemoryInternals` and got rid of the base class.


https://reviews.llvm.org/D24353





More information about the Parallel_libs-commits mailing list