[Parallel_libs-commits] [PATCH] D25701: Initial check-in of Acxxel (StreamExecutor renamed)

Mon Oct 24 16:47:18 PDT 2016

jhen added a comment.

In addition to responding to jlebar's posted comments, I also removed the `acxxel::getPlatform` function and replaced it with the two functions `acxxel::getCUDAPlatform` and `acxxel::getOpenCLPlatform`. I also added a comment to explain that the CUDA and OpenCL platforms are available out of the box with Acxxel, but that other platforms can be created as well. The old `acxxel::getPlatform` function made it confusing to think about how to add a new platform because it seemed like the new platform should also be registered somehow to put in on equal footing with CUDA and OpenCL. I hope this new design will be clearer in this aspect.

================
Comment at: acxxel/cuda_acxxel.cpp:29
+/// Index of active device for this thread.
+thread_local int ActiveDeviceIndex;
+
----------------
jlebar wrote:
> jhen wrote:
> > jlebar wrote:
> > > Does this have to be per-thread rather than per stream?  Kind of a bummer that one thread can't operate on two streams for different devices (among other things).
> > > 
> > > At least maybe setActiveDevice should be called setActiveDeviceForThread.
> > TL;DR the CUDA driver library forces us to do it per thread or to suffer a performance penalty.
> > 
> > I chose it to be per thread for efficiency reasons. The old StreamExecutor tried to do it per stream, and this led to extra library calls for every operation to check that the current device was equal to the one expected by the Stream. Ideally we would just store a "context" in the stream, the context would know which device it belonged to, and the context would be used to launch all work, but unfortunately that is not how the CUDA driver library works. Instead, the CUDA driver library works a lot like this where it allows you to set the active context, but then you don't pass the context to each method call, instead, it just uses the active context.
> > 
> > It actually doesn't prevent a thread from using streams from different devices. It just forces that thread to do the "active device" bookkeeping on its own. Want to use stream0? First set the active device to device0. Want to go back to stream1? Remember to set the active device back to device1.
> > 
> > I did change the name as you suggested, though.
> > I chose it to be per thread for efficiency reasons. 
> 
> OK, I am totally onboard with that -- zero-cost abstractions.
> 
> But maybe we can have assert()s that ensure that users are always switching to a stream's device before touching that stream?  If we have to have an `ifndef NDEBUG` variable or two inside the stream class to support this, that's fine by me.
> 
> We also need to have comments explaining this weird requirement.
I don't think that `assert()` calls are useful to check this because the platform libraries themselves (CUDA, OpenCL, etc.) will emit errors for these cases.

I added comments to `Stream` to explain its associated device index and caution about using it when the active device index doesn't match, and I added a function to query the device index for a Stream.

================
Comment at: acxxel/examples/opencl_example.cpp:56
+  std::array<float, 3> Y = {{3.f, 4.f, 5.f}};
+  std::array<float, 3> Expected = {{3.f, 6.f, 9.f}};
+  saxpy(A, X, Y);
----------------
jlebar wrote:
> Nit, we can get away with single braces here, which I think is more natural.
I blame `-Wall -Wextra` for this. I changed it to `-Wall -Wextra -Wno-missing-braces` to get rid of the warning for this case.

================
Comment at: acxxel/examples/opencl_example.cpp:59
+  for (int I = 0; I < 3; ++I)
+    assert(X[I] == Expected[I]);
+}
----------------
jlebar wrote:
> If we're using assert(), we should #error if we're built with NDEBUG, or alternatively #undef NDEBUG as the first thing in this file (before any #includes).
I changed assert to "check and exit" to prevent having to deal with this.

================
Comment at: acxxel/opencl_acxxel.cpp:24
+namespace acxxel {
+namespace opencl {
+
----------------
jlebar wrote:
> I wonder if we need this extra namespace (same for cuda).  We don't actually expose anything other than the make-platform function in it.  Do we plan on exposing anything else?
I mostly got rid of these namespaces, but, as part of another design change, I moved the platform creation (now getter) functions into them.

================
Comment at: acxxel/opencl_acxxel.cpp:39
+struct OpenCLEvent {
+  OpenCLEvent(cl_event Event) : Event(Event) {}
+
----------------
jlebar wrote:
> Do you want this to be explicit?
Yes, I just forgot to mark it. Thanks for catching that.

================
Comment at: acxxel/opencl_acxxel.cpp:535
+  OpenCLEvent *CLEvent = static_cast<OpenCLEvent *>(Event);
+  std::unique_lock<std::mutex> Lock(CLEvent->Mutex);
+  cl_event OldEvent = CLEvent->Event;
----------------
jlebar wrote:
> Did we decide to make streams single-threaded to avoid this overhead?
> 
> If so, in addition to getting rid of the mutex, we should get rid of the comment about concurrently enqueueing an event, since it's not valid to touch the stream concurrently at all.
Thanks for catching this. I got rid of the mutex and got rid of the whole `CLEvent` struct, since it is now just a `cl_event*`.

I also got rid of some comments for `addCallback` that talked about accessing the same stream from multiple threads, but I didn't see a comment that talked about `Event` objects being enqueued on streams concurrently. Please let me know if I didn't get rid of a comment I should have.

================
Comment at: acxxel/span.h:10
+///
+/// This file defines the span utility class.
+///
----------------
jlebar wrote:
> I really don't like these top-level comments that say nothing you couldn't have guessed from the filename, even though we have them all over LLVM.
> 
> Does having this really help doxygen?  If so, would it make doxygen look awful if we moved the comment currently on the span class into this block, so that it actually provided some value?
I don't like them either. They don't help with Doxygen as far as I know. I just put them in because I thought they were required for LLVM style. I've removed most of them now, but I did leave a few that I thought were actually serving a purpose.

================
Comment at: acxxel/span.h:59
+    if (Count < 0 || (!Ptr && Count))
+      std::terminate();
+  }
----------------
jlebar wrote:
> I am really surprised that this is what they suggest in the proposal.  It's fine for now, if it becomes a problem we can turn it into an assert().
Sounds good. Like you say, it should be easy to change.

================
Comment at: acxxel/span.h:94
+           !std::is_base_of<SpanBase, Container>::value &&
+           std::is_convertible<decltype(&Cont[0]), pointer>::value>::type * =
+           nullptr)
----------------
jlebar wrote:
> Nit, could we just use `data()` in the decltype?  We're going to use it in the constructor below anyway.  Or is this as specified and the spec is just weird?
I did this to try to make sure that the container supported `operator[]` (the proposed spec suggested I should check for this). Do you think there is a cleaner way to do that?

================
Comment at: acxxel/span.h:195
+
+  pointer data() const noexcept { return Data; }
+
----------------
jlebar wrote:
> I cannot figure out the reason that some of these functions are noexcept but not others.
I don't understand it either. In the class comment I added a link the spec document I used. That's where I got all these signatures.

================
Comment at: acxxel/status.h:22
+#ifdef __clang__
+#define ACXXEL_WARN_UNUSED_RESULT __attribute__((warn_unused_result))
+#else
----------------
jlebar wrote:
> gcc seems to support this, at least at head: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
Unfortunately gcc only seems to support this as a function attribute (and that seems consistent with the documentation you cited). When I tried to use it as a class attribute, I got a compilation error.

================
Comment at: acxxel/status.h:34
+public:
+  /// Creates an Status representing success.
+  Status() : HasMessage(false) {}
----------------
jlebar wrote:
> s/an/a/, and below.
Oops. `Status` used to be called `Error`. 

================
Comment at: acxxel/status.h:57
+    return *this;
+  }
+
----------------
jlebar wrote:
> Can we not `=default` these four?
I learned something sad while trying to do this. `Status` has a `std::string` field, and it seems that the move assignment operator for `std::string` is not marked `noexcept` in my standard library. This means that the default move assignment operator for `Status` is not `noexcept` and I get a compile error if I try to declare it `noexcept` and `default`.

To deal with this, I chose to retain the `noexcept` and to re-implement the default move assignment operator. I was able to default the other 3 with my standard library, so I did that.

================
Comment at: acxxel/status.h:94
+  // Intentionally implicit.
+  Expected(T &&Value) : TheState(State::SUCCESS), TheValue(std::move(Value)) {}
+
----------------
jlebar wrote:
> We can do the same thing with Value -- just take it by value and std::move it.
I think I understood you correctly here. I collapsed the `const T &Value` and `T &&Value` constructors into one constructor taking the argument by value and moving it into `TheValue`. Let me know if I was only supposed to replace one of these constructors that way.

================
Comment at: acxxel/tests/acxxel_test.cpp:362
+  // Event0 can only occur after GoFlag and MarkerFlag are set.
+  EXPECT_FALSE(Stream0.enqueueEvent(Event0).takeStatus().isError());
+
----------------
jlebar wrote:
> Wait...we have EXPECT*?  Can we use this in the tests where we're currently using `assert()`s?
I couldn't find the places I am using `assert()` in tests. I think you might be referring to the stuff in the `examples` directory. That stuff doesn't depend on gtest, so it uses `assert()` instead.

================
Comment at: acxxel/tests/span_test.cpp:37
+TEST(Span, PtrSizeConstruction) {
+  int ZeroSize = 0;
+  acxxel::Span<int> Span0(nullptr, ZeroSize);
----------------
jlebar wrote:
> Nit, not sure why we have a variable for this.  If you just want to make it clear what the second arg to the Span constructor means, you could write
> 
>   Span(nullptr, /* size = */ 0);
> 
> I believe we even have a clang-tidy check to check comments in this style.
This is a fun one. If I pass in a literal `0` it tells me the call is ambiguous because `Span<int>` also has a constructor with the signature `Span<int>(int *FirstElem, int *LastElem)`. It seems that C++ is helpfully willing to convert a literal `0` to an `int *` (I'm sure it has to do with NULL and C compatibility).

I still thought it was a useful test, though, because the size argument could be stored in a variable as I have done in my test.

================
Comment at: acxxel/tests/span_test.cpp:79
+TEST(Span, ArrayConstruction) {
+  int Array[3] = {0, 1, 2};
+  acxxel::Span<int> Span(Array);
----------------
jlebar wrote:
> Nit, should be consistent wrt we have an explicit length in the array or not.  I kind of prefer not.  (This appears in other files as well.)
I think I got all the places where I was doing this.

https://reviews.llvm.org/D25701