[PATCH] D146389: [clang-repl][CUDA] Initial interactive CUDA support for clang-repl

Wed Apr 5 01:41:23 PDT 2023

argentite added a comment.

In D146389#4243900 <https://reviews.llvm.org/D146389#4243900>, @tra wrote:

>> Initial interactive CUDA support for clang-repl
>
> What should a user expect to be supported, functionality wise? I assume it should cover parsing and compilation. I'm not so sure about the execution. Should it be expected to actually launch kernels, or will that come in a future patch?

With this patch alone, we can launch kernels with the usual syntax. The `__device__` functions need to be `inline` for now. We plan to automate that in the future.

================
Comment at: clang/include/clang/Interpreter/Interpreter.h:43-44
 public:
   static llvm::Expected<std::unique_ptr<CompilerInstance>>
   create(std::vector<const char *> &ClangArgv);
+  static llvm::Expected<std::unique_ptr<CompilerInstance>>
----------------
Hahnfeld wrote:
> If I understand the change correctly, the "old" `create` function on its own is not sufficient anymore to create a fully working `CompilerInstance` - should we make it `private`? (and unify the two `Builder` classes into one?)
> 
> Alternatively, we could keep the current function working and move the bulk of the work to an `createInternal` function. What do you think?
Yes the old `create` should probably be private. I was also thinking we could merge `IncrementalCudaCompilerBuilder` with `IncrementalCompilerBuilder` and make it stateful with CUDA SDK path for example. Then we could do something like:
```
IncrementalCompilerBuilder Builder;
Builder.setCudaPath(...);
auto DeviceCI = Builder.createCudaDevice();
auto HostCI = Builder.createCudaHost();
```

================
Comment at: clang/lib/CodeGen/CodeGenAction.cpp:296
       }
+      LinkModules.clear();
       return false; // success
----------------
Hahnfeld wrote:
> This looks like a change that has implications beyond support for CUDA. Would it make sense to break this out into a separate review, ie does this change something in the current setup?
It actually was a separate patch: D146388 Should I submit that for review?
It seems to be required because we call `LinkInModules()` once for every interpreter iteration.

================
Comment at: clang/lib/Interpreter/IncrementalParser.cpp:125

+static CodeGenerator *getCodeGen(FrontendAction *Act);
+
----------------
Hahnfeld wrote:
> Is it possible to move the `static` function here instead of forward declaring? Or does it depend on other functions in this file?
We can probably move this. I just wanted to preserve the history,

================
Comment at: clang/lib/Interpreter/IncrementalParser.cpp:143-156
+  // An initial PTU is needed as CUDA includes some headers automatically
+  auto PTU = ParseOrWrapTopLevelDecl();
+  if (auto E = PTU.takeError()) {
+    consumeError(std::move(E)); // FIXME
+    return;                     // PTU.takeError();
+  }
+
----------------
Hahnfeld wrote:
> Should this be done as part of the "generic" `IncrementalParser` constructor, or only in the CUDA case by something else? Maybe `Interpreter::createWithCUDA`?
That seems safer. Although I did not notice any side effects.

================
Comment at: clang/lib/Interpreter/Interpreter.cpp:270-274
+  if (DeviceParser) {
+    auto DevicePTU = DeviceParser->Parse(Code);
+    if (auto E = DevicePTU.takeError())
+      return E;
+  }
----------------
Hahnfeld wrote:
> Just wondering about the order here: Do we have to parse the device side first? Does it make a difference for diagnostics? Maybe you can add a comment about the choice here...
The fatbinary from the device side is used in the host pipeline.

================
Comment at: clang/lib/Interpreter/Offload.cpp:1
+//===-------------- Offload.cpp - CUDA Offloading ---------------*- C++ -*-===//
+//
----------------
Hahnfeld wrote:
> v.g.vassilev wrote:
> > How about `DeviceOffload.cpp`?
> Or `IncrementalCUDADeviceParser.cpp` for the moment - not sure what other classes will be added in the future, and if they should be in the same TU.
I wanted to avoid "CUDA" in case we use it later for HIP.

================
Comment at: clang/tools/clang-repl/ClangRepl.cpp:137
+
+    ExitOnErr(Interp->LoadDynamicLibrary("libcudart.so"));
+  } else
----------------
v.g.vassilev wrote:
> tra wrote:
> > v.g.vassilev wrote:
> > > tra wrote:
> > > > Is there any doc describing the big picture approach to CUDA REPL implementation and how all the pieces tie together?
> > > > 
> > > > From the patch I see that we will compile GPU side of the code to PTX, pack it into fatbinary, but it's not clear now do we get from there to actually launching the kernels. Loading libcudart.so here also does not appear to be tied to anything else. I do not see any direct API calls, and the host-side compilation appears to be done w.o passing the GPU binary to it, which would normally trigger generation of the glue code to register the kernels with CUDA runtime. I may be missing something, too.
> > > > 
> > > > I assume the gaps will be filled in in future patches, but I'm still curious about the overall plan.
> > > > 
> > > > 
> > > Hi @tra, thanks for asking. Our reference implementation was done in Cling a while ago by @SimeonEhrig. One of his talks which I think describes well the big picture could be found here: https://compiler-research.org/meetings/#caas_04Mar2021
> > Cling does ring the bell. The slides from the link above do look OK.
> > 
> There is also a video.
> I do not see any direct API calls, and the host-side compilation appears to be done w.o passing the GPU binary to it, which would normally trigger generation of the glue code to register the kernels with CUDA runtime.

We do pass the generated fatbinary to the host side. The device code compilation happens before host side.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146389/new/

https://reviews.llvm.org/D146389