[Parallel_libs-commits] [PATCH] D23211: [StreamExecutor] Add DeviceMemory and kernel arg packing

Jason Henline via Parallel_libs-commits parallel_libs-commits at lists.llvm.org
Fri Aug 5 12:53:40 PDT 2016


jhen added inline comments.

================
Comment at: streamexecutor/include/streamexecutor/DeviceMemory.h:82
@@ +81,3 @@
+  /// device virtual address space, but is platform-dependent.
+  const void *getHandle() const { return Handle; }
+
----------------
jlebar wrote:
> Should we have a non-const version of this too?  (Not sure if that makes sense.)
Since the handle is meant to be a pointer to opaque data, I don't think it makes sense to return a non-const pointer.
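A tiny illustration of the intended use (the platform copy routine below is hypothetical): callers only ever read the handle and pass it along, so a const accessor covers every use.

  // Hypothetical platform hook that consumes the opaque handle read-only.
  void platformCopyDeviceToHost(const void *DeviceHandle, void *HostDst,
                                size_t ByteCount);

  void copyOut(const DeviceMemoryBase &Src, void *HostDst, size_t ByteCount) {
    platformCopyDeviceToHost(Src.getHandle(), HostDst, ByteCount);
  }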

================
Comment at: streamexecutor/include/streamexecutor/DeviceMemory.h:135
@@ +134,3 @@
+  /// The static method makeFromElementCount is provided for users of this class
+  /// because it's name makes the meaning of the size paramter clear.
+  DeviceMemory(void *Handle, size_t ElementCount)
----------------
jprice wrote:
> s/it's/its/
> s/paramter/parameter/
*jhen hides his face in mortification* :) Thanks for catching these. I'm going to claim I only made the mistake once because the other one was copy-pasted. :)

================
Comment at: streamexecutor/include/streamexecutor/DeviceMemory.h:137
@@ +136,3 @@
+  DeviceMemory(void *Handle, size_t ElementCount)
+      : DeviceMemoryBase(Handle, ElementCount * sizeof(ElemT)) {}
+};
----------------
jprice wrote:
> The DeviceMemoryBase constructor has its Handle parameter qualified as `const` - is there a reason why this one isn't? Same for the `makeFromElementCount()` function that calls this.
Thanks for catching this. It was just an oversight. Now I have changed them all to be const.
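For concreteness, a minimal sketch of what the constified pieces look like, based on the snippets quoted in this review (not the final StreamExecutor code):

  #include <cstddef>

  class DeviceMemoryBase {
  public:
    // Handle is taken as const void * at every level, matching the getter
    // quoted earlier.
    DeviceMemoryBase(const void *Handle, size_t ByteCount)
        : Handle(Handle), ByteCount(ByteCount) {}
    const void *getHandle() const { return Handle; }

  private:
    const void *Handle;
    size_t ByteCount;
  };

  template <typename ElemT> class DeviceMemory : public DeviceMemoryBase {
  public:
    // Named factory: makes it obvious the size parameter counts elements.
    static DeviceMemory<ElemT> makeFromElementCount(const void *Handle,
                                                    size_t ElementCount) {
      return DeviceMemory<ElemT>(Handle, ElementCount);
    }

    DeviceMemory(const void *Handle, size_t ElementCount)
        : DeviceMemoryBase(Handle, ElementCount * sizeof(ElemT)) {}
  };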

================
Comment at: streamexecutor/include/streamexecutor/DeviceMemory.h:161
@@ +160,3 @@
+template <typename ElemT>
+class SharedDeviceMemory : public DeviceMemory<ElemT> {
+public:
----------------
jlebar wrote:
> Some high-level questions about this API:
> 
> * Why does SharedDeviceMemory inherit from DeviceMemoryBase, when its handle is always null?  It seems like SharedDeviceMemory is inherently a different thing because it never has a handle.
> 
> * Do we need to allow null handles on DeviceMemoryBase (other than for shared memory)?  In general if we can reduce the number of states in our state machine, that makes life easier for everyone.
> 
> * Same for size-zero device/shared memory objects.
> 
> * It seems inconsistent that we allow untyped device-memory handles, but do not allow untyped shared-memory handles.  Why is that?

> Why does SharedDeviceMemory inherit from DeviceMemoryBase, when its handle is always null? It seems like SharedDeviceMemory is inherently a different thing because it never has a handle.

> It seems inconsistent that we allow untyped device-memory handles, but do not allow untyped shared-memory handles. Why is that?

Good questions. The design was a bit sloppy. Now I have cleaned it up so that there are "global" and "shared" device memory types, where each has a base and a typed version, and the shared version does not inherit from the global version.
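A rough sketch of the shape that split takes (class and member names here are illustrative, not necessarily the exact ones in the updated patch):

  #include <cstddef>

  // The "global" side keeps the handle-plus-size shape already quoted in this
  // review, just under new names:
  class GlobalDeviceMemoryBase {
  public:
    GlobalDeviceMemoryBase(const void *Handle, size_t ByteCount)
        : Handle(Handle), ByteCount(ByteCount) {}
    const void *getHandle() const { return Handle; }

  private:
    const void *Handle;
    size_t ByteCount;
  };

  template <typename ElemT>
  class GlobalDeviceMemory : public GlobalDeviceMemoryBase {
  public:
    GlobalDeviceMemory(const void *Handle, size_t ElementCount)
        : GlobalDeviceMemoryBase(Handle, ElementCount * sizeof(ElemT)) {}
  };

  // The "shared" side never has a handle, only a size, so it is a separate
  // small hierarchy rather than a subclass of the global types:
  class SharedDeviceMemoryBase {
  public:
    explicit SharedDeviceMemoryBase(size_t ByteCount) : ByteCount(ByteCount) {}
    size_t getByteCount() const { return ByteCount; }

  private:
    size_t ByteCount;
  };

  template <typename ElemT>
  class SharedDeviceMemory : public SharedDeviceMemoryBase {
  public:
    explicit SharedDeviceMemory(size_t ElementCount)
        : SharedDeviceMemoryBase(ElementCount * sizeof(ElemT)) {}
  };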

> Do we need to allow null handles on DeviceMemoryBase (other than for shared memory)? In general if we can reduce the number of states in our state machine, that makes life easier for everyone.

I think we should allow them because folks might want to pass a null-like device pointer argument to their kernel (for all the reasons that people allow passing null pointers to functions in host code).

> Same for size-zero device/shared memory objects.

I think we want to allow these too for cases when folks want to write general kernels that can operate with or without a scratch space, etc.
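To illustrate both of these points, a hypothetical call site: a null handle and a zero element count stand in for "no scratch buffer for this launch". The makeFromElementCount call on the renamed class and the Launch call are assumptions for illustration, not the real StreamExecutor API.

  // Hypothetical: the kernel treats a null, zero-sized buffer as "no scratch".
  auto NoScratch = GlobalDeviceMemory<float>::makeFromElementCount(nullptr, 0);
  // Launch(Stream, SaxpyKernel, Input, Output, NoScratch);  // stand-in call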

================
Comment at: streamexecutor/include/streamexecutor/PackedKernelArgumentArray.h:32
@@ +31,3 @@
+/// stored as a struct of arrays rather than an array of structs because CUDA
+/// kernel launches take an array of argument addresses. Having created the
+/// array of argument addresses here, no further work will need to be done in
----------------
jlebar wrote:
> > CUDA kernel launches take an array of argument addresses
> 
> The code in CGNVCUDARuntime::emitDeviceStubBody doesn't look like this, but maybe this is correct from the perspective of the CUDA driver API.  It may be worth clarifying, as presumably people will be more familiar with the runtime API.
Right, my comment refers only to the CUDA driver API. I added that fact to the comment.
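For reference, this is the driver-API shape being described: cuLaunchKernel takes a void** array where each element is the address of one kernel argument, which is why the packed array keeps the argument addresses ready up front. A rough sketch, with error handling omitted:

  #include <cuda.h>

  // Driver-API launch: KernelParams[i] is the *address* of the i-th argument.
  CUresult launchSaxpy(CUfunction F, CUstream Stream, CUdeviceptr X,
                       CUdeviceptr Y, float A, int N) {
    void *KernelParams[] = {&X, &Y, &A, &N};
    return cuLaunchKernel(F, /*gridDim=*/(N + 255) / 256, 1, 1,
                          /*blockDim=*/256, 1, 1,
                          /*sharedMemBytes=*/0, Stream, KernelParams,
                          /*extra=*/nullptr);
  }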

================
Comment at: streamexecutor/lib/unittests/PackedKernelArgumentArrayTest.cpp:67
@@ +66,3 @@
+  EXPECT_EQ(ExpectedSize, Observed.getSize(Index)) << "Index: " << Index;
+  EXPECT_EQ(ExpectedType, Observed.getType(Index)) << "Index: " << Index;
+}
----------------
jlebar wrote:
> You could use SCOPED_TRACE, maybe it's a bit cleaner.
Unfortunately I couldn't find a nice way to pass SCOPED_TRACE a message that streams in the index, like ("Index = " << Index). However, as long as the index information ends up in the failure message, I think it's OK if it's a little harder to read.

================
Comment at: streamexecutor/lib/unittests/PackedKernelArgumentArrayTest.cpp:169
@@ +168,3 @@
+}
+
+} // namespace
----------------
jlebar wrote:
> It looks like we don't exercise all of the functions in the pack, particularly the ones to get the raw arrays (maybe others too, I didn't look hard).
Glad you mentioned it. It was just the raw array getters that were missing tests, but when I added those tests in, I found some const-correctness issues. Now the tests are in and the bugs are fixed.
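For example, a test along these lines covers the raw array getters; the helper and getter names below are illustrative guesses, not necessarily the ones in the patch.

  TEST(PackedKernelArgumentArrayTest, RawArrayGetters) {
    int Arg = 42;
    auto Pack = make_kernel_argument_pack(Arg);          // hypothetical helper
    const void *const *Addresses = Pack.getAddresses();  // hypothetical getter
    const size_t *Sizes = Pack.getSizes();               // hypothetical getter
    EXPECT_EQ(static_cast<const void *>(&Arg), Addresses[0]);
    EXPECT_EQ(sizeof(int), Sizes[0]);
  }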


https://reviews.llvm.org/D23211
