[Parallel_libs-commits] [PATCH] D23577: [StreamExecutor] Executor add synchronous methods

Tue Aug 16 15:11:32 PDT 2016

jlebar added inline comments.

================
Comment at: streamexecutor/include/streamexecutor/Executor.h:48
@@ +47,3 @@
+  /// Frees memory previously allocated with allocateDeviceMemory.
+  template <typename T> Error freeDeviceMemory(GlobalDeviceMemory<T> *Memory) {
+    return PExecutor->freeDeviceMemory(Memory);
----------------
> Do you think it would be better to pass DeviceMemory<T>& for T* and const DeviceMemory<T>& for const T*, or something like that?

Yes, I think so, since a DeviceMemory<T> is essentially a T*.  Or put another way, when you allocate host memory, you get a T*, and when you allocate device memory, you get a DeviceMemory<T>.

================
Comment at: streamexecutor/include/streamexecutor/Executor.h:109
@@ +108,3 @@
+  Error synchronousMemcpyD2H(const GlobalDeviceMemory<T> &DeviceSrc,
+                             llvm::MutableArrayRef<T> HostDst) {
+    return synchronousMemcpyD2H(DeviceSrc, HostDst,
----------------
Hm, maybe we should match the arg order of memcpy, given that we're using that name.

================
Comment at: streamexecutor/include/streamexecutor/Executor.h:110
@@ +109,3 @@
+                             llvm::MutableArrayRef<T> HostDst) {
+    return synchronousMemcpyD2H(DeviceSrc, HostDst,
+                                DeviceSrc.getElementCount());
----------------
> To me that feels too restrictive.

I guess it's a question of what we're optimizing for.

It seems to me that much of the time, the dst will be of the same size as the src.  In such cases, you'd want to know if you accidentally mismatched the sizes, because that's going to leave part of your dst uninitialized.

If you really want to copy into only part of dst, you can make that explicit by creating a new array slice:

  vector<int> dst(100);
  synchronousMemcpyD2H(src, makeMutableArrayRef(dst.data(), DeviceSrc.getElementCount()));

(makeMutableArrayRef doesn't exist at the moment, but that's easily fixed...)

Or alternatively, you could just call the other overload:

  vector<int> dst(100);
  synchronousMemcpyD2H(src, dst, DeviceSrc.getElementCount());

Both cases are explicit that we're doing something other than simply copying everything from dst to src.

https://reviews.llvm.org/D23577