[Parallel_libs-commits] [PATCH] D24353: [SE] RegisteredHostMemory for async device copies

Justin Lebar via Parallel_libs-commits parallel_libs-commits at lists.llvm.org
Fri Sep 9 15:50:13 PDT 2016


jlebar added inline comments.

================
Comment at: streamexecutor/include/streamexecutor/HostMemory.h:45
@@ +44,3 @@
+      : MutableArrayRef(const_cast<ElemT *>(Registered.getPointer()),
+                        Registered.getElementCount()) {}
+
----------------
I'm not sure that a const_cast is the right thing to do.  Instead, could we have two constructors, one which takes a const RHM<ElemT> and one which takes a non-const RHM<ElemT>?

Also, how does this work when ElemT = "const int" and we get an RHM<ElemT>?  I would expect you need some const-decay here.

================
Comment at: streamexecutor/include/streamexecutor/HostMemory.h:45
@@ +44,3 @@
+      : MutableArrayRef(const_cast<ElemT *>(Registered.getPointer()),
+                        Registered.getElementCount()) {}
+
----------------
jlebar wrote:
> I'm not sure that a const_cast is the right thing to do.  Instead, could we have two constructors, one which takes a const RHM<ElemT> and one which takes a non-const RHM<ElemT>?
> 
> Also, how does this work when ElemT = "const int" and we get an RHM<ElemT>?  I would expect you need some const-decay here.
If we go with this approach, we should make sure we can convert from RHMSlice<const T> to RHMSlice<T>, which I don't think works atm.

================
Comment at: streamexecutor/include/streamexecutor/HostMemory.h:48
@@ +47,3 @@
+  ElemT *getPointer() { return MutableArrayRef.data(); }
+  const ElemT *getPointer() const { return MutableArrayRef.data(); }
+  size_t getElementCount() const { return MutableArrayRef.size(); }
----------------
Hmmm...

In a vacuum, I think we'd want just one overload,

  ElemT *getPointer() const;

This matches what C++17's std::slice (nee array_view) does.  And it sort of makes sense: We already have constness in the template type to determine whether or not we can modify the underlying ElemTs.

(By the way, if we stick with this approach, we should explain in a comment about putting const T in the template param.)

But I see that we have the two overloads on RHM, and there they make more sense.  So...I am not sure what we should do here.

One option is we could also make RHM unconditionally give out non-const pointers (and mutable slices).  That would make it behave like unique_ptr -- if I have a const unique_ptr<int[]>, the get() function still gives me a non-const pointer back.

I guess one argument against merging the slice classes is that it seems like we're having difficulty keeping them straight within Stream.h.  So I think I'm OK splitting them back out for now.  It shouldn't complicate the Stream interface any.

================
Comment at: streamexecutor/include/streamexecutor/HostMemory.h:99
@@ +98,3 @@
+                                   "RegisteredHostMemoryBase with a null "
+                                   "platform device");
+  }
----------------
Nit, clang-format is being conservative and not re-merging strings, but this should probably be reflowed.

================
Comment at: streamexecutor/include/streamexecutor/Stream.h:172
@@ +171,3 @@
+  template <typename SrcTy, typename DstTy>
+  Stream &thenCopyD2H(SrcTy &&Src, DstTy &&Dst, size_t ElementCount) {
+    using SrcElemTy = typename std::remove_reference<SrcTy>::type::ElementTy;
----------------
Is it worth merging this with the overload that explicitly takes slices, or do you think that would be too confusing?

================
Comment at: streamexecutor/include/streamexecutor/Stream.h:178
@@ +177,3 @@
+    GlobalDeviceMemorySlice<SrcElemTy> SrcSlice(Src);
+    RegisteredHostMemorySlice<DstElemTy> DstSlice(Dst);
+    return thenCopyD2H(SrcSlice, DstSlice, ElementCount);
----------------
There should be const in some of these?


https://reviews.llvm.org/D24353





More information about the Parallel_libs-commits mailing list