[PATCH] D35703: [GPGPU] Add support for NVIDIA libdevice

Mon Jul 24 14:49:23 PDT 2017

grosser marked an inline comment as done.
grosser added a comment.

Thank you for all the good reviews. I tried to address all of them.

Best,
Tobias

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:107

+static cl::opt<std::string> LibDevice(
+    "polly-acc-libdevice", cl::desc("Path to CUDA libdevice"), cl::Hidden,
----------------
singam-sanjay wrote:
> tra wrote:
> > This is something that is useful for all NVPTX users and should probably live there and it should not have any hardcoded path -- it's too easy to end up silently picking wrong library otherwise.
> > 
> > Hardcoded compute_20 is also problematic because it should depend on particular GPU arch we're compiling for.
> > 
> > Considering that LLVM has no idea about CUDA SDL location, this is sommething that should always be explicitly specified. Either base path + libdevice name derived from GPU arch, or complete path to specific libdevice variant (i.e. it's completely up to the user to provide correct libdevice).
> Would it be better to call this CUDALibDevice or CULibDevice  instead ? since this applies only to NVPTX
I use CUDALibDevice

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:107-111
+static cl::opt<std::string> LibDevice(
+    "polly-acc-libdevice", cl::desc("Path to CUDA libdevice"), cl::Hidden,
+    cl::init("/usr/local/cuda-7.5/nvvm/libdevice/libdevice.compute_20.10.bc"),
+    cl::ZeroOrMore, cl::cat(PollyCategory));
+
----------------
grosser wrote:
> singam-sanjay wrote:
> > tra wrote:
> > > This is something that is useful for all NVPTX users and should probably live there and it should not have any hardcoded path -- it's too easy to end up silently picking wrong library otherwise.
> > > 
> > > Hardcoded compute_20 is also problematic because it should depend on particular GPU arch we're compiling for.
> > > 
> > > Considering that LLVM has no idea about CUDA SDL location, this is sommething that should always be explicitly specified. Either base path + libdevice name derived from GPU arch, or complete path to specific libdevice variant (i.e. it's completely up to the user to provide correct libdevice).
> > Would it be better to call this CUDALibDevice or CULibDevice  instead ? since this applies only to NVPTX
> I use CUDALibDevice
@tra: Thank you for your comment. This is the very first commit to introduce this feature. We currently are in early beta tests. The library location is supposed to be provided by the user by setting the path with polly-acc-libdevice. I set the option to a very basic default. For now I expect the user to adjust this default. In the future we can add some generic infrastructure to LLVM to derive this path automatically. If a specific fixed path is too confusing I can also use an empty default and always prompt for a path.

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:109
+    "polly-acc-libdevice", cl::desc("Path to CUDA libdevice"), cl::Hidden,
+    cl::init("/usr/local/cuda-7.5/nvvm/libdevice/libdevice.compute_20.10.bc"),
+    cl::ZeroOrMore, cl::cat(PollyCategory));
----------------
singam-sanjay wrote:
> Consider changing this to "/usr/local/**cuda**/nvvm/libdevice/libdevice.compute_20_10.bc". That would work on most Linux platforms by default.
> 
> I'm not sure if PTX code for a compute capability  2 device would run on any newer device. Is it possible to initialize this after figuring out the CC of device 0 ?
> 
> Also, I heard that CUDA SDK 8 would be the last to support CC 2.x. CUDA 9 supports all CCs from 3.0.
Changed. using /usr/local/cuda is indeed a good idea.

I would like to start with the oldest library. We can later add support for different library versions. I don't think we can query device 0, as we might compile on different platform as where we run the final code. However, we can make this depend on polly-acc-cuda-version.

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:609
+  /// Link with the NVIDIA libdevice library (if needed and available).
+  void addLibDevice();
+
----------------
singam-sanjay wrote:
> Would addCULibDevice be a better name ?
Changed.

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:1314
+/// A list of functions that are available in NVIDIA's libdevice.
+std::vector<std::string> LibDeviceFunctions = {"exp", "expf", "expl", "cos",
+                                               "cosf"};
----------------
bollu wrote:
> bollu wrote:
> > Consider changing to `SmallSet`? That feels correct in terms of semantics (a `set` of functions.)
> could you also please add `sqrt` to the list? this is present in the `COSMO` kernel.
Done.!

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:1314
+/// A list of functions that are available in NVIDIA's libdevice.
+std::vector<std::string> LibDeviceFunctions = {"exp", "expf", "expl", "cos",
+                                               "cosf"};
----------------
grosser wrote:
> bollu wrote:
> > bollu wrote:
> > > Consider changing to `SmallSet`? That feels correct in terms of semantics (a `set` of functions.)
> > could you also please add `sqrt` to the list? this is present in the `COSMO` kernel.
> Done.!
I use a std::set. That should be good enough for now.

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:2131
+    }
+  }
+
----------------
bollu wrote:
> consider moving computing `RequiresLibDevice` to a pure function? 
Done.

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:2138
+
+    if (!LibDeviceModule) {
+      BuildSuccessful = false;
----------------
bollu wrote:
> I believe we can `assert` at this point, since this is not a "mis-compile" in the strictest sense of the word?. 
I use report_fatal_error as suggested by Michael.

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:2148
+    // Set an nvptx64 target triple to avoid linker warnings. The original
+    // trible of the libdevice files are nvptx-unknown-unknown.
+    LibDeviceModule->setTargetTriple(Triple::normalize("nvptx64-nvidia-cuda"));
----------------
bollu wrote:
> nit: `trible` -> `triple`.
Done!

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:3020
   /// a function that we support in a kernel, return true.
-  bool containsInvalidKernelFunctionInBllock(const BasicBlock *BB) {
+  bool containsInvalidKernelFunctionInBllock(const BasicBlock *BB,
+                                             bool AllowLibDevice) {
----------------
tra wrote:
> Bllock -> Block
Fixed in r308715.

================
Comment at: test/GPGPU/libdevice-functions-copied-into-kernel.ll:38-69
+define void @f(float* %A, float* %B, i32 %N) {
+entry:
+  br label %entry.split
+
+entry.split:                                      ; preds = %entry
+  %cmp1 = icmp sgt i32 %N, 0
+  br i1 %cmp1, label %for.body.lr.ph, label %for.end
----------------
tra wrote:
> Can the test be reduced to just expf() call?
Unlikely. This is a test for Polly-ACC, where we auto-offload to CUDA. For this we need at least some parallelism, which means some loop.

================
Comment at: test/GPGPU/libdevice-functions-copied-into-kernel_libdevice.bc:1-6
+definelayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
+target triple = "nvptx64-nvidia-cuda"
+
+define float @__nv_expf(float %a) #1 {
+  return %a
+}
----------------
tra wrote:
> This file should be under Inputs/ directory (see NVPTX tests for example) and have .ll extension.
Very good idea. I adopted it.

https://reviews.llvm.org/D35703