[PATCH] D35703: [GPGPU] Add support for NVIDIA libdevice

Artem Belevich via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 20 16:01:49 PDT 2017


tra added inline comments.


================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:107-111
+static cl::opt<std::string> LibDevice(
+    "polly-acc-libdevice", cl::desc("Path to CUDA libdevice"), cl::Hidden,
+    cl::init("/usr/local/cuda-7.5/nvvm/libdevice/libdevice.compute_20.10.bc"),
+    cl::ZeroOrMore, cl::cat(PollyCategory));
+
----------------
This is useful for all NVPTX users and should probably live there, and it should not have any hardcoded path -- otherwise it's too easy to end up silently picking the wrong library.

The hardcoded compute_20 is also problematic, because it should depend on the particular GPU arch we're compiling for.

Considering that LLVM has no idea about the CUDA SDK location, this is something that should always be explicitly specified: either a base path plus a libdevice name derived from the GPU arch, or a complete path to a specific libdevice variant (i.e. it's completely up to the user to provide the correct libdevice).


================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:3020
   /// a function that we support in a kernel, return true.
-  bool containsInvalidKernelFunctionInBllock(const BasicBlock *BB) {
+  bool containsInvalidKernelFunctionInBllock(const BasicBlock *BB,
+                                             bool AllowLibDevice) {
----------------
Bllock -> Block


================
Comment at: test/GPGPU/libdevice-functions-copied-into-kernel.ll:38-69
+define void @f(float* %A, float* %B, i32 %N) {
+entry:
+  br label %entry.split
+
+entry.split:                                      ; preds = %entry
+  %cmp1 = icmp sgt i32 %N, 0
+  br i1 %cmp1, label %for.body.lr.ph, label %for.end
----------------
Can the test be reduced to just the expf() call?
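To sketch what a reduced test could look like: Polly presumably still needs a loop to form a SCoP, so a single-statement loop around the call is probably the minimum (CHECK lines omitted; the exact shape Polly accepts is an assumption):

```llvm
define void @f(float* %A, i32 %N) {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %ptr = getelementptr float, float* %A, i32 %i
  %v = load float, float* %ptr
  %e = call float @expf(float %v)
  store float %e, float* %ptr
  %i.next = add nuw nsw i32 %i, 1
  %cond = icmp slt i32 %i.next, %N
  br i1 %cond, label %loop, label %exit

exit:
  ret void
}

declare float @expf(float)
```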


================
Comment at: test/GPGPU/libdevice-functions-copied-into-kernel_libdevice.bc:1-6
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
+target triple = "nvptx64-nvidia-cuda"
+
+define float @__nv_expf(float %a) #1 {
+  ret float %a
+}
----------------
This file should be under an Inputs/ directory (see the NVPTX tests for an example) and have an .ll extension.


https://reviews.llvm.org/D35703
