[Parallel_libs-commits] [PATCH] D23577: [StreamExecutor] Executor add synchronous methods
Jason Henline via Parallel_libs-commits
parallel_libs-commits at lists.llvm.org
Fri Aug 19 12:20:53 PDT 2016
jhen added inline comments.
================
Comment at: streamexecutor/include/streamexecutor/DeviceMemory.h:110
@@ +109,3 @@
+ size_t getElementOffset() const { return ElementOffset; }
+ size_t getElementCount() const { return ElementCount; }
+
----------------
jlebar wrote:
> I think getOffset() and getCount() might be better names, but we should definitely be consistent with the names in GlobalDeviceMemory, as you've done here.
I renamed them to `getOffset()` and `getCount()`. I also renamed the `GlobalDeviceMemory<T>` method to `getCount()` for consistency.
================
Comment at: streamexecutor/include/streamexecutor/DeviceMemory.h:116
@@ +115,3 @@
+ BaseArray, ElementOffset + DropCount,
+ (DropCount > ElementCount) ? 0 : ElementCount - DropCount);
+ }
----------------
jlebar wrote:
> We should think about what exactly we want to be an error in these functions. I'm not sure that silently accepting out-of-range values makes sense.
ArrayRef uses assert, so I did that here too.
================
Comment at: streamexecutor/include/streamexecutor/DeviceMemory.h:192
@@ +191,3 @@
+ std::min(getElementCount(), TakeCount));
+ }
+
----------------
jlebar wrote:
> We could avoid having these functions entirely by having an asSlice() function, but I'm not sure that's better than this API. However, it would be nice not to reimplement these tricky functions. Could we just do e.g.
>
> return DeviceArraySlice<ElemT>(*this).drop_front(DropCount);
>
> ?
`asSlice()` sounds good to me. I put that in and removed all of these.
================
Comment at: streamexecutor/include/streamexecutor/Executor.h:156
@@ +155,3 @@
+ return synchronousCopyD2H(Src.drop_front(0), Dst);
+ }
+
----------------
jlebar wrote:
> My thought is that we'd have an implicit one-arg constructor in DeviceArraySlice<T> so we wouldn't need this overload.
>
> DeviceArraySlice<T>::DeviceArraySlice<T>(const GlobalDeviceMemory<T>& Mem);
>
> (In the same way, llvm::ArrayRef has an implicit one-arg constructor that accepts an std::vector<T>.)
I tried this and unfortunately it doesn't seem to work perfectly. I think the template type inference is getting in the way of selecting the implicit constructor. For example, I have to call `Executor.syncCopyD2H<int>(DeviceA, llvm::MutableArray<int>(Host))` in my test code, where I explicitly specify the function template argument for the `syncCopyD2H` function. Otherwise, I get a compile error that says it ignored the candidate template because it could not match `GlobalDeviceMemorySlice` agains `GlobalDeviceMemory`.
I removed the `syncCopyD2H` overloads for `GlobalDeviceMemory` for now, so that users have to explicitly supply the template argument or call `asSlice()`. Should I add back the overloads in order to get around this problem?
https://reviews.llvm.org/D23577
More information about the Parallel_libs-commits
mailing list