[cfe-dev] [llvm-dev] LLVM/CUDA generate LLVM IR

Thu Oct 13 14:21:20 PDT 2016

Thank you very much for the testcases -- I'll look into fixing the
assertion failure.

> I think --cuda-gpu-arch=sm_35 and --cuda-path=/usr/local/cuda/ should be included, as the resulting code might be optimized for that architecture.

You want --cuda-gpu-arch=sm_35; otherwise we'll default to sm_20.
That doesn't make a huge difference beyond affecting which intrinsics
are available to you, but still.  You also want to pass sm_35 because
it affects how we invoke ptxas: passing sm_35 causes us to use ptxas
to generate GPU code specifically for sm_35.  If you don't pass this
but then run on an sm_35 GPU, the GPU driver will have to JIT-compile
the PTX at runtime, and this can be very slow.
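
A rough sketch of the above, reusing the axpy.cu command from this
thread.  The ARCH, SRC, OUT and CMD variables and the output file
naming are illustrative assumptions of mine, not anything clang
defines; the flags themselves are the ones already discussed here.

```shell
#!/bin/sh
# Sketch: emit device-side LLVM bitcode for an explicitly chosen GPU arch.
# ARCH, SRC, OUT and CMD are illustrative names, not clang settings.
ARCH="${ARCH:-sm_35}"
SRC="${SRC:-axpy.cu}"
OUT="${SRC%.cu}-${ARCH}.bc"

# Build the command as a string so it can be inspected before it runs.
CMD="clang++ -O3 -emit-llvm -c $SRC -o $OUT --cuda-gpu-arch=$ARCH --cuda-device-only"
echo "$CMD"

# Only run the compile when clang++ and the source file are present.
if command -v clang++ >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $CMD || echo "compile failed" >&2
fi
```

Running it with ARCH set to the architecture of the GPU you will
actually run on keeps the arch choice explicit instead of relying on
the default.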

--cuda-path is optional; it is only required if clang can't find the
CUDA installation, or if you want to use a different one than the one
it finds by default.  You can see which one it finds by invoking
clang -v.
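
For example, something along these lines should surface the relevant
line.  This is only a sketch: find_cuda_line is a hypothetical helper
name, and the exact wording clang prints for the installation line is
an assumption on my part, so the filter matches loosely.

```shell
#!/bin/sh
# Sketch: report which CUDA installation clang picks up by default.
# find_cuda_line is a hypothetical helper, not part of clang.
find_cuda_line() {
    # Reads "clang -v" style output on stdin and prints any line that
    # mentions a CUDA installation (case-insensitive match).
    grep -i "cuda installation"
}

if command -v clang++ >/dev/null 2>&1; then
    # clang -v writes its report to stderr, so merge it into stdout first.
    clang++ -v 2>&1 | find_cuda_line || echo "clang++ reported no CUDA installation"
else
    echo "clang++ not found on PATH"
fi
```

If nothing is reported, or the wrong installation is, that is the case
where passing --cuda-path explicitly makes sense.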

On Thu, Oct 13, 2016 at 2:17 PM, Gurunath Kadam
<gurunath.kadam at gmail.com> wrote:
> Hi,
>
> Thank you Justin for your prompt reply. I was able to generate an LLVM IR.
>
> For the error reproduction purposes, I have listed below all the commands
> which worked and which did not work.
>
> Works (I have not yet checked whether the files generated by all of them
> are the same):
>
>      clang++ -O3 -emit-llvm -c axpy.cu -o axpy.bc --cuda-gpu-arch=sm_35 --cuda-path=/usr/local/cuda/ --cuda-device-only
>
>      clang++ -O3 -emit-llvm -c axpy.cu -o axpy.bc --cuda-device-only
>
> Does not work:
>
>       clang++ -O3 -emit-llvm -c axpy.cu --cuda-gpu-arch=sm_35 -o axpy.bc
>
> I think --cuda-gpu-arch=sm_35 and --cuda-path=/usr/local/cuda/ should be
> included, as the resulting code might be optimized for that architecture. I
> might be wrong though.
>
> Thank you again.
>
> -Guru
>
> On Thu, Oct 13, 2016 at 4:38 PM, Justin Lebar <jlebar at google.com> wrote:
>>
>> If you add -### to your original command, you'll see that for CUDA
>> compilations, we invoke clang -cc1 twice: Once for the host, and once
>> for the device.  We can't emit llvm or asm for both host and device at
>> once, so you need to tell clang which one you want.
>>
>> The flag to do this is --cuda-device-only (or --cuda-host-only).
>>
>> Alternatively, you could compile with -save-temps to get everything.
>>
>> Feel free to send me a patch adding this information to
>> http://llvm.org/docs/CompileCudaWithLLVM.html so that we can help
>> others avoid this hiccup.  The document lives in
>> llvm/docs/CompileCudaWithLLVM.rst.
>>
>> > I tried adding -S -emit-llvm and changed the output file name, but I
>> > keep getting following error:
>>
>> That is a bug -- we should give you a meaningful error.  It looks like
>> this bug was probably introduced by the generic offloading driver
>> changes.
>>
>> I am having difficulty reproducing the assertion failure, however.
>> Can you please provide concrete steps to reproduce?
>>
>> Regards,
>> -Justin
>>
>> On Thu, Oct 13, 2016 at 1:28 PM, Reid Kleckner <rnk at google.com> wrote:
>> > Moving to cfe-dev
>> >
>> > +Art and Justin
>> >
>> > On Thu, Oct 13, 2016 at 1:13 PM, Gurunath Kadam via llvm-dev
>> > <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> So for a C program we do:
>> >>
>> >>         clang -O3 -emit-llvm hello.c -c -o hello.bc
>> >>
>> >> But how do I generate LLVM IR when working with CUDA?
>> >>
>> >> For normal compilation:
>> >>          clang++ axpy.cu -o axpy --cuda-gpu-arch=<GPU arch> -L<CUDA install path>/<lib64 or lib> -lcudart_static -ldl -lrt -pthread
>> >>
>> >> I tried adding -S -emit-llvm and changed the output file name, but I
>> >> keep getting the following error:
>> >>
>> >> clang++:
>> >> /stor/gakadam/llvm_projects/llvm/tools/clang/lib/Driver/Driver.cpp:1618:
>> >> virtual
>> >> {anonymous}::OffloadingActionBuilder::DeviceActionBuilder::ActionBuilderReturnCode
>> >> {anonymous}::OffloadingActionBuilder::CudaActionBuilder::getDeviceDepences(clang::driver::OffloadAction::DeviceDependences&,
>> >> clang::driver::phases::ID, clang::driver::phases::ID,
>> >> {anonymous}::OffloadingActionBuilder::DeviceActionBuilder::PhasesTy&):
>> >> Assertion `CurPhase < phases::Backend && "Generating single CUDA "
>> >> "instructions should only occur " "before the backend phase!"' failed.
>> >>
>> >> I tried several combinations, but to no avail!
>> >>
>> >> Any suggestions?
>> >>
>> >> Thank you.
>> >>
>> >> Sincerely,
>> >> Guru
>> >>