[cfe-dev] [llvm-dev] LLVM/CUDA generate LLVM IR

Wed Oct 26 14:05:10 PDT 2016

To close the loop, I found the change that introduced this crash and
pinged the author of the change.  Hopefully we can get this fixed
soon.

https://reviews.llvm.org/D18172#580276

On Thu, Oct 13, 2016 at 2:21 PM, Justin Lebar <jlebar at google.com> wrote:
> Thank you very much for the testcases -- I'll look into fixing the
> assertion failure.
>
>> I think --cuda-gpu-arch=sm_35 and --cuda-path=/usr/local/cuda/ should be included, as the resulting code might be optimized for that architecture.
>
> You want --cuda-gpu-arch=sm_35; otherwise we'll default to sm_20.
> That doesn't make a huge difference beyond affecting which intrinsics
> are available to you, but still.  You also want to pass sm_35 because
> it affects how we invoke ptxas: passing sm_35 causes us to use ptxas
> to generate GPU code specifically for sm_35.  If you don't pass it
> and then run on an sm_35 GPU, the GPU driver will have to generate
> code at runtime, and this can be very slow.
>
> --cuda-path is optional; it is only required if clang can't find the
> CUDA installation, or if you want to specify a different one than the
> one it finds by default.  You can see which one it finds by invoking
> clang -v.
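
A minimal sketch of the advice above, written as a dry run that only prints the command instead of executing it. The shell variables ARCH and CUDA_INSTALL are names chosen for this example (clang does not read them); the clang++ flags are the ones quoted in this thread.

```shell
#!/bin/sh
# Dry-run sketch: assemble a device-only compile command for axpy.cu.
# ARCH and CUDA_INSTALL are hypothetical variable names for illustration.
ARCH="sm_35"                       # GPU architecture to target
CUDA_INSTALL="${CUDA_INSTALL:-}"   # empty: let clang locate CUDA on its own

cmd="clang++ -O3 -emit-llvm -c axpy.cu -o axpy.bc --cuda-gpu-arch=$ARCH --cuda-device-only"

# Add --cuda-path only when an explicit installation was requested.
if [ -n "$CUDA_INSTALL" ]; then
  cmd="$cmd --cuda-path=$CUDA_INSTALL"
fi

# Print the command; run it yourself once it looks right.
echo "$cmd"
```

Setting CUDA_INSTALL=/usr/local/cuda/ before running the script would append the --cuda-path flag; leaving it unset relies on clang's default search.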
>
> On Thu, Oct 13, 2016 at 2:17 PM, Gurunath Kadam
> <gurunath.kadam at gmail.com> wrote:
>> Hi,
>>
>> Thank you Justin for your prompt reply. I was able to generate an LLVM IR.
>>
>> To help reproduce the error, I have listed below the commands that
>> worked and the one that did not.
>>
>> Works (I have not yet checked whether the files they generate are
>> identical):
>>
>>      clang++ -O3 -emit-llvm -c axpy.cu -o axpy.bc --cuda-gpu-arch=sm_35 \
>>          --cuda-path=/usr/local/cuda/ --cuda-device-only
>>
>>      clang++ -O3 -emit-llvm -c axpy.cu -o axpy.bc --cuda-device-only
>>
>> Does not work:
>>
>>       clang++ -O3 -emit-llvm -c axpy.cu --cuda-gpu-arch=sm_35 -o axpy.bc
>>
>> I think --cuda-gpu-arch=sm_35 and --cuda-path=/usr/local/cuda/ should be
>> included, as the resulting code might be optimized for that architecture. I
>> might be wrong though.
>>
>> Thank you again.
>>
>> -Guru
>>
>> On Thu, Oct 13, 2016 at 4:38 PM, Justin Lebar <jlebar at google.com> wrote:
>>>
>>> If you add -### to your original command, you'll see that for CUDA
>>> compilations, we invoke clang -cc1 twice: Once for the host, and once
>>> for the device.  We can't emit llvm or asm for both host and device at
>>> once, so you need to tell clang which one you want.
>>>
>>> The flag to do this is --cuda-device-only (or --cuda-host-only).
>>>
>>> Alternatively, you could compile with -save-temps to get everything.
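
The options described above can be sketched as the following dry run, which only prints the commands rather than executing them. The output file names (axpy-device.bc, axpy-host.bc) are assumptions for illustration; the flags are the ones named in this thread.

```shell
#!/bin/sh
# Dry-run sketch: one command per compilation side, plus the two
# inspection options mentioned above. Nothing is executed here.
SRC="axpy.cu"
ARCH="sm_35"

# Device-side LLVM bitcode only.
device_cmd="clang++ -O3 -emit-llvm -c $SRC -o axpy-device.bc --cuda-gpu-arch=$ARCH --cuda-device-only"

# Host-side LLVM bitcode only.
host_cmd="clang++ -O3 -emit-llvm -c $SRC -o axpy-host.bc --cuda-host-only"

# -### prints the sub-commands the driver would run, without running them.
inspect_cmd="clang++ -### $SRC -o axpy --cuda-gpu-arch=$ARCH"

# -save-temps keeps the intermediate files of a normal compilation.
temps_cmd="clang++ -save-temps -c $SRC --cuda-gpu-arch=$ARCH"

printf '%s\n' "$device_cmd" "$host_cmd" "$inspect_cmd" "$temps_cmd"
```

Requesting -emit-llvm without either --cuda-device-only or --cuda-host-only is the case that triggered the assertion reported in this thread.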
>>>
>>> Feel free to send me a patch adding this information to
>>> http://llvm.org/docs/CompileCudaWithLLVM.html so that we can help
>>> others avoid this hiccup.  The document lives in
>>> llvm/docs/CompileCudaWithLLVM.rst.
>>>
>>> > I tried adding -S -emit-llvm and changed the output file name, but I
>>> > keep getting the following error:
>>>
>>> That is a bug -- we should give you a meaningful error.  It looks like
>>> this bug was probably introduced by the generic offloading driver
>>> changes.
>>>
>>> I am having difficulty reproducing the assertion failure, however.
>>> Can you please provide concrete steps to reproduce?
>>>
>>> Regards,
>>> -Justin
>>>
>>> On Thu, Oct 13, 2016 at 1:28 PM, Reid Kleckner <rnk at google.com> wrote:
>>> > Moving to cfe-dev
>>> >
>>> > +Art and Justin
>>> >
>>> > On Thu, Oct 13, 2016 at 1:13 PM, Gurunath Kadam via llvm-dev
>>> > <llvm-dev at lists.llvm.org> wrote:
>>> >>
>>> >> For a C program we do:
>>> >>
>>> >>         clang -O3 -emit-llvm hello.c -c -o hello.bc
>>> >>
>>> >> But how do I generate LLVM IR when working with CUDA?
>>> >>
>>> >> For normal compilation:
>>> >>          clang++ axpy.cu -o axpy --cuda-gpu-arch=<GPU arch> \
>>> >>              -L<CUDA install path>/<lib64 or lib> \
>>> >>              -lcudart_static -ldl -lrt -pthread
>>> >>
>>> >> I tried adding -S -emit-llvm and changed the output file name, but I
>>> >> keep getting the following error:
>>> >>
>>> >> clang++:
>>> >> /stor/gakadam/llvm_projects/llvm/tools/clang/lib/Driver/Driver.cpp:1618:
>>> >> virtual
>>> >> {anonymous}::OffloadingActionBuilder::DeviceActionBuilder::ActionBuilderReturnCode
>>> >> {anonymous}::OffloadingActionBuilder::CudaActionBuilder::getDeviceDepences(clang::driver::OffloadAction::DeviceDependences&,
>>> >> clang::driver::phases::ID, clang::driver::phases::ID,
>>> >> {anonymous}::OffloadingActionBuilder::DeviceActionBuilder::PhasesTy&):
>>> >> Assertion `CurPhase < phases::Backend && "Generating single CUDA "
>>> >> "instructions should only occur " "before the backend phase!"' failed.
>>> >>
>>> >> I tried several combinations, but to no avail!
>>> >>
>>> >> Any suggestions?
>>> >>
>>> >> Thank you.
>>> >>
>>> >> Sincerely,
>>> >> Guru
>>> >>
>>> >> _______________________________________________
>>> >> LLVM Developers mailing list
>>> >> llvm-dev at lists.llvm.org
>>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> >>