[Parallel_libs-commits] [PATCH] D25701: Initial check-in of Acxxel (StreamExecutor renamed)

Justin Lebar via Parallel_libs-commits parallel_libs-commits at lists.llvm.org
Mon Oct 17 17:22:16 PDT 2016


jlebar added a comment.

Got through the header file, now my eyes are starting to glaze over.  Will come back to this tomorrow.  This looks really good, though.

The one thing I really miss from the old SE is not checking the error at every line.  I wonder if we could say that errors carry forward just like they used to?  Or maybe they do actually carry forward and I don't need to have an error check on every line -- I haven't gotten to the implementation yet.  :)



================
Comment at: acxxel/acxxel.h:57
+///
+/// Acxxel functions as a drop-in replacement for the standard CUDA runtime
+/// library and interoperates seamlessly with kernel calls.
----------------
Is this true?  Like Acxxel exposes a different interface from the standard CUDA runtime library, so maybe it's not a "drop-in" replacement.  But maybe it's a "replacement", and maybe we should say it doesn't require libcuda.


================
Comment at: acxxel/acxxel.h:104
+
+/// Function type used to destroy opaque handles.
+using HandleDestructor = void (*)(void *);
----------------
Maybe we can say, "Function type used to destroy opaque handles given out by the Platform." or something.


================
Comment at: acxxel/acxxel.h:114
+  bool DeviceMap = false;
+  bool IOMemory = false;
+};
----------------
It's not obvious to me what these three mean; maybe they should have comments?


================
Comment at: acxxel/acxxel.h:119
+struct StreamConfig {
+  bool NonBlocking = false;
+};
----------------
You know, I thought I knew what this meant, but now I'm not sure.


================
Comment at: acxxel/acxxel.h:126
+  bool DisableTiming = false;
+  bool Interprocess = false;
+};
----------------
Same for these three.


================
Comment at: acxxel/acxxel.h:148
+
+/// Gets a pointer to the platform.
+///
----------------
Maybe, "Gets a pointer to the platform with the given name" or something, so it's clear that there can be multiple platforms.


================
Comment at: acxxel/acxxel.h:154
+
+/// A stream of computation.
+class Stream {
----------------
Can we expand on this a little bit?  Maybe, "All operations on a Stream are serialized."


================
Comment at: acxxel/acxxel.h:177
+  /// This is useful because the event argument may be recorded on a different
+  /// stream, and this method allows for synchronization between streams without
+  /// synchronizing all streams.
----------------
Nit, s/and/so/ or something.


================
Comment at: acxxel/acxxel.h:193
+  /// There are no guarantees about ordering for work that is enqueued onto the
+  /// stream while this function is executing. That work may run at the same
+  /// time as the callback.
----------------
Which function -- the callback, or addCallback()?  (I am legitimately unsure.)


================
Comment at: acxxel/acxxel.h:195
+  /// time as the callback.
+  template <typename F> Status addCallback(F &&Callback);
+
----------------
Why not just take `std::function<void(Stream&, const Status&)>` here?

That would give us better error messages, and it seems like it would simplify a lot of the code in the cc file.  It's also a potential speedup -- for example, if Callback is a function pointer, creating an std::function from it doesn't necessarily allocate, whereas the implementation below always calls malloc.

Maybe I'm missing something.


================
Comment at: acxxel/acxxel.h:203
+  /// directly by calling Platform::NewAsyncHost.
+  /// \{
+
----------------
Can we be more specific about the allowed types here?  It looks like they need to be convertible to something or other.  Same elsewhere where we take universal references to "memory things".


================
Comment at: acxxel/acxxel.h:259
+
+/// A user-created event on a device.
+class Event {
----------------
Here too it might be helpful to have a bit more verbiage here telling me what this is good for.


================
Comment at: acxxel/acxxel.h:269
+  /// Enqueues the event on a stream.
+  Status record(Stream &Stream);
+
----------------
This strikes me kind of as a weird name; the stream "records" the event?  But I don't have a better suggestion offhand, and if it's consistent with prior art, maybe it's fine.


================
Comment at: acxxel/acxxel.h:279
+  /// event's execution.
+  Expected<float> getMillisecondsSince(const Event &Previous);
+
----------------
If we're giving out floating-point numbers anyway, could we give seconds instead?  That's a more natural unit.


================
Comment at: acxxel/acxxel.h:291
+
+class Program {
+public:
----------------
Should this have a comment?  I can't figure out what this is.


================
Comment at: acxxel/acxxel.h:346
+
+  /// Sets the index of the active device for this platform in this thread.
+  virtual Status setActiveDevice(int DeviceIndex) = 0;
----------------
Maybe s/the index of//.  It's pretty clear from the signature how the number corresponds to a device, whereas as written I'm kind of confused.


================
Comment at: acxxel/acxxel.h:365
+      return MaybePointer.getError();
+    }
+    return DeviceMemory<T>(this, MaybePointer.getValue(), ElementCount,
----------------
llvm/clang's style is to leave off braces here.  Are we adopting a different style?  I don't really care myself, but you'll probably get grumbles.


================
Comment at: acxxel/acxxel.h:370
+
+  /// Creates a DeviceMemorySpan for a device symbol.
+  template <typename ElementType>
----------------
I have no idea what this is doing; can we expand upon this some?


================
Comment at: acxxel/acxxel.h:395
+    using DstElementTy =
+        typename std::remove_reference<DeviceDstTy>::type::value_type;
+    static_assert(std::is_same<SrcElementTy, DstElementTy>::value,
----------------
You only care about the type of `*DeviceSrc.data()` and `*DeviceDst.data()`, right?  Maybe we could just use `decltype` instead of ::value_type, so we'll be compatible with more types.

(If you don't want to type out the decltype expression over and over, I suppose we could even have our own traits class.  But that may be overkill.)


================
Comment at: acxxel/acxxel.h:478
+
+  template <typename T, typename DeviceSrcTy>
+  Status copyDtoPtrH(DeviceSrcTy &&DeviceSrc, T *HostDst,
----------------
Probably worth adding a comment indicating that if DeviceSrc's size is less than ElementCount, we copy only the first ElementCount elems -- I wasn't sure if that would be an error or what.


================
Comment at: acxxel/acxxel.h:480
+  Status copyDtoPtrH(DeviceSrcTy &&DeviceSrc, T *HostDst,
+                     ptrdiff_t ElementCount) {
+    using SrcElementTy =
----------------
Is this better than requiring us to go through a span?  That would shrink the API a bit, and (I would hope) make it more obvious what happens when ElementCount doesn't match DeviceSrc's size.


================
Comment at: acxxel/acxxel.h:538
+  Status copyPtrHtoD(const T *HostSrc, DeviceDstTy &DeviceDst,
+                     ptrdiff_t ElementCount) {
+    using DstElementTy =
----------------
Similar comment here.


================
Comment at: acxxel/acxxel.h:598
+          typename std::remove_reference<decltype(Cont[0])>::type>> {
+    using ValueType = typename std::remove_reference<decltype(Cont[0])>::type;
+    Span<ValueType> Span(Cont);
----------------
Do we actually care about operator[] as opposed to the `data` function?  That's surprising if so...  Perhaps we should explicitly write down our "concepts" somewhere.

At least let's add a comment to the header indicating what "concept" Container needs to fulfill.


================
Comment at: acxxel/acxxel.h:610
+
+  /// Allocates an owned, registered array of objects on the host.
+  template <typename T>
----------------
Since the device one doesn't call T::T() but this one does, maybe we should say that.


================
Comment at: acxxel/acxxel.h:613
+  Expected<OwnedAsyncHostMemory<T>>
+  newAsyncHost(ptrdiff_t ElementCount,
+               AsyncHostMemoryConfig Config = AsyncHostMemoryConfig()) {
----------------
Could we call this newAsyncHostMem?  That would match registerHostMem, and also just be easier to understand, I think.


================
Comment at: acxxel/acxxel.h:627
+  virtual Expected<Program>
+  createProgramFromSource(const std::string &Source) = 0;
+
----------------
Hm, requiring the program to be stored in an std::string may be a problem.  Could we accept a Span<char> instead?


================
Comment at: acxxel/acxxel.h:643
+                             Span<void *> Arguments, Span<size_t> ArgumentSizes,
+                             size_t SharedMemoryBytes = 0) {
+    return rawEnqueueKernelLaunch(TheStream.TheHandle.get(),
----------------
I am unclear how this kernel-launching stuff relates to the comment at the top of file saying we don't do this...  Is this for opencl only?


================
Comment at: acxxel/acxxel.h:713
+  virtual Expected<void *> rawMallocH(ptrdiff_t ByteCount,
+                                      AsyncHostMemoryConfig Config) = 0;
+  virtual HandleDestructor getFreeHostMemoryHandleDestructor() = 0;
----------------
Not sure this is a good name, because it makes it sound like it's just calling malloc, when in fact it's also registering the memory.


================
Comment at: acxxel/acxxel.h:757
+
+template <typename DeviceSrcTy, typename DeviceDstTy>
+Status Stream::asyncCopyDToD(DeviceSrcTy &&DeviceSrc, DeviceDstTy &&DeviceDst) {
----------------
Nit, it seems kind of inconsistent that we declare these out-of-line but the sync versions inline.


================
Comment at: acxxel/acxxel.h:933
+
+  /// Converts a const object to a DeviceMemorySpan of const.
+  DeviceMemorySpan<const element_type> asSpan() const {
----------------
Nit, const is an adjective, so "of const elements" or something?


================
Comment at: acxxel/acxxel.h:963
+template <typename T>
+DeviceMemory<T> &DeviceMemory<T>::operator=(DeviceMemory &&) noexcept = default;
+
----------------
Remind me why we can't do these inside the class definition?


================
Comment at: acxxel/acxxel.h:1011
+      auto Destructor = ThePlatform->getDeviceMemorySpanHandleDestructor();
+      Destructor(const_cast<value_type *>(TheSpanHandle));
+    }
----------------
Hm, this is two indirect calls, one to get the function pointer, and the other to call it.

We could make this just one virtual call, but that would be inconsistent with how you do literally everything else here.  So I guess this is fine as-is.  If it becomes a performance issue, we can address it then.


================
Comment at: acxxel/acxxel.h:1029
+
+  void *baseHandle() const noexcept {
+    return static_cast<void *>(const_cast<value_type *>(TheHandle));
----------------
This needs noexcept?  That's...interesting.  Maybe deserves a comment.


================
Comment at: acxxel/acxxel.h:1035
+    if (!TheSpanHandle) {
+      TheSpanHandle = ThePlatform->getDeviceMemorySpanHandle(
+          TheHandle, TheSize * sizeof(element_type),
----------------
Oh, this is the conversion that lets us pass it to a <<<>>> launch?  But this pointer is not valid on the host?  (Never?  Or, it's dereferenceable if we are using unified memory?)

If so, or probably also if not, a comment would be helpful here.


================
Comment at: acxxel/acxxel.h:1045
+    if (!Valid) {
+      std::terminate();
+    }
----------------
Interesting -- I presume you're doing this because that's what std::span does?  Does std::span not have a way to slice that doesn't do a bounds check?

Anyway I think this is very likely fine for our use-case.


================
Comment at: acxxel/acxxel.h:1077
+
+  explicit DeviceMemorySpan(Platform *ThePlatform, pointer AHandle,
+                            index_type Size, index_type Offset)
----------------
Do we need "explicit" here?  It makes a subtle difference, but unless you're going for it, maybe we just leave it off?


================
Comment at: acxxel/acxxel.h:1094
+///
+/// This memory unpins/unregisters itself when it goes out of scope.
+template <typename ElementType> class AsyncHostMemory {
----------------
Probably worth saying, "scope, but does not free itself."


================
Comment at: acxxel/acxxel.h:1160
+        static_cast<ElementType *>(ThePointer.get())[I].~ElementType();
+      }
+    }
----------------
     T *Memory = new (MaybeMemory.getValue()) T[ElementCount];

Oh, hm.  Are you sure this is the right way to undo your allocation above?  Like, we don't have to somehow delete the placement new'ed array itself?  (I don't see a way to do it on cppreference.)


https://reviews.llvm.org/D25701





More information about the Parallel_libs-commits mailing list