[PATCH] D35703: [GPGPU] Add support for NVIDIA libdevice

Mon Jul 24 15:51:54 PDT 2017

tra added inline comments.

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:113
 static cl::opt<std::string>
     CudaVersion("polly-acc-cuda-version",
                 cl::desc("The CUDA version to compile for"), cl::Hidden,
----------------
The variable appears to be unused.

On a side note, please consider that there's already a way to specify GPU variant for NVPTX back-end. This option is either going to be redundant or you'll need a good explanation for what's supposed to happen when its value conflicts with whatever GPU variant NVPTX back-end thinks it's supposed to generate the code for.

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:118
 static cl::opt<int>
     MinCompute("polly-acc-mincompute",
                cl::desc("Minimal number of compute statements to run on GPU."),
----------------
This also appears to be unused. 
Please remove them from this patch and re-introduce them in subsequent patches that really need them.

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:2277
+    std::string CUDALibDeviceFunc = getCUDALibDeviceFuntion(&F);
+    if (CUDALibDeviceFunc.length() != 0) {
+      F.setName(CUDALibDeviceFunc);
----------------
!empty()

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:2288
+  if (Arch != GPUArch::NVPTX64)
+    return;
+
----------------
What's supposed to happen in this case? Consider adding a diagnostic message.

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:2293
+
+    errs() << CUDALibDevice << "\n";
+    auto LibDeviceModule =
----------------
But why are you still printing the libdevice name on stderr?

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:2323
     if (FailOnVerifyModuleFailure)
       llvm_unreachable("VerifyModule failed.");

----------------
I'm curious -- when would it be OK to proceed if verifier has failed? 

Should  this option be other way around -- and make LLVM fail by default on verifier failure? That would be my expectation of normal behavior. One could conceivably use this option for debugging purposes to force LLVM to proceed even when verifier has failed, but in general I believe that by default the errors should be reported as early as possible.

================
Comment at: lib/CodeGen/PPCGCodeGeneration.cpp:107-111
+static cl::opt<std::string> LibDevice(
+    "polly-acc-libdevice", cl::desc("Path to CUDA libdevice"), cl::Hidden,
+    cl::init("/usr/local/cuda-7.5/nvvm/libdevice/libdevice.compute_20.10.bc"),
+    cl::ZeroOrMore, cl::cat(PollyCategory));
+
----------------
grosser wrote:
> grosser wrote:
> > singam-sanjay wrote:
> > > tra wrote:
> > > > This is something that is useful for all NVPTX users and should probably live there and it should not have any hardcoded path -- it's too easy to end up silently picking wrong library otherwise.
> > > > 
> > > > Hardcoded compute_20 is also problematic because it should depend on particular GPU arch we're compiling for.
> > > > 
> > > > Considering that LLVM has no idea about CUDA SDL location, this is sommething that should always be explicitly specified. Either base path + libdevice name derived from GPU arch, or complete path to specific libdevice variant (i.e. it's completely up to the user to provide correct libdevice).
> > > Would it be better to call this CUDALibDevice or CULibDevice  instead ? since this applies only to NVPTX
> > I use CUDALibDevice
> @tra: Thank you for your comment. This is the very first commit to introduce this feature. We currently are in early beta tests. The library location is supposed to be provided by the user by setting the path with polly-acc-libdevice. I set the option to a very basic default. For now I expect the user to adjust this default. In the future we can add some generic infrastructure to LLVM to derive this path automatically. If a specific fixed path is too confusing I can also use an empty default and always prompt for a path.
I believe no default would be a better option in this case as it minimizes possibility for things to go wrong silently.

================
Comment at: test/GPGPU/libdevice-functions-copied-into-kernel.ll:38-69
+define void @f(float* %A, float* %B, i32 %N) {
+entry:
+  br label %entry.split
+
+entry.split:                                      ; preds = %entry
+  %cmp1 = icmp sgt i32 %N, 0
+  br i1 %cmp1, label %for.body.lr.ph, label %for.end
----------------
grosser wrote:
> tra wrote:
> > Can the test be reduced to just expf() call?
> Unlikely. This is a test for Polly-ACC, where we auto-offload to CUDA. For this we need at least some parallelism, which means some loop.
It may be worth reconsidering your approach and making this functionality generic to NVPTX so it can benefit all users of NVPTX back-end. The functionality is generic enough to benefit CUDA (and possibly OpenCL) which currently fail miserably if any of standard library functions sneak into IR. 

https://reviews.llvm.org/D35703