[llvm-dev] instrumenting device code with gpucc
Jingyue Wu via llvm-dev
llvm-dev at lists.llvm.org
Tue Mar 15 10:09:07 PDT 2016
Including the fatbin into the host code should be done in the frontend.
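One way to do that (a sketch, untested; the flags are trimmed from the -cc1 command quoted below, and `-x ir` is assumed to accept the instrumented bitcode directly):

```shell
# Re-run the -cc1 host compilation, but feed it the instrumented host IR
# (axpy.bc) instead of the original .cu source, so CodeGen embeds the
# fatbin as before. Paths and file names are illustrative.
clang-3.9 -cc1 -triple x86_64-unknown-linux-gnu \
  -aux-triple nvptx64-nvidia-cuda -emit-obj -O3 \
  -main-file-name axpy.cu \
  -o axpy-host.o -x ir axpy.bc \
  -fcuda-include-gpubinary axpy-sm_30.fatbin
```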
On Mon, Mar 14, 2016 at 12:13 AM, Yuanfeng Peng <
yuanfeng.jack.peng at gmail.com> wrote:
> Hey Jingyue,
>
> Thanks for being so responsive! I finally figured out a way to resolve
> the issue: all I had to do was pass `-only-needed` when merging the
> device bitcode with llvm-link.
>
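> For reference, the working link step was simply this (file names as in my
> setup):
>
> ```shell
> # -only-needed imports from cuda_hooks-sm_30.bc only the definitions that
> # axpy-sm_30.bc actually references, avoiding duplicate symbols.
> llvm-link axpy-sm_30.bc cuda_hooks-sm_30.bc -only-needed -o inst_axpy-sm_30.bc
> ```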
> However, since we actually need to instrument the host code as well, I
> encountered another issue when I tried to glue the instrumented host code
> and the fatbin together. When I had only instrumented the device code, I
> used the following command:
>
> "/mnt/wtf/tools/bin/clang-3.9" "-cc1" "-triple" "x86_64-unknown-linux-gnu"
> "-aux-triple" "nvptx64-nvidia-cuda" "-fcuda-target-overloads"
> "-fcuda-disable-target-call-checks" "-emit-obj" "-disable-free"
> "-main-file-name" "axpy.cu" "-mrelocation-model" "static"
> "-mthread-model" "posix" "-fmath-errno" "-masm-verbose"
> "-mconstructor-aliases" "-munwind-tables" "-fuse-init-array" "-target-cpu"
> "x86-64" "-momit-leaf-frame-pointer" "-dwarf-column-info"
> "-debugger-tuning=gdb" "-resource-dir"
> "/mnt/wtf/tools/bin/../lib/clang/3.9.0" "-I"
> "/usr/local/cuda-7.0/samples/common/inc" "-internal-isystem"
> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8"
> "-internal-isystem"
> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8"
> "-internal-isystem"
> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8"
> "-internal-isystem"
> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward"
> "-internal-isystem" "/usr/local/include" "-internal-isystem"
> "/mnt/wtf/tools/bin/../lib/clang/3.9.0/include" "-internal-externc-isystem"
> "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include"
> "-internal-externc-isystem" "/usr/include" "-internal-isystem"
> "/usr/local/cuda/include" "-include" "__clang_cuda_runtime_wrapper.h" "-O3"
> "-fdeprecated-macro" "-fdebug-compilation-dir"
> "/mnt/wtf/workspace/cuda/gpu-race-detection" "-ferror-limit" "19"
> "-fmessage-length" "291" "-pthread" "-fobjc-runtime=gcc" "-fcxx-exceptions"
> "-fexceptions" "-fdiagnostics-show-option" "-vectorize-loops"
> "-vectorize-slp" "-o" "axpy-host.o" "-x" "cuda" "tests/axpy.cu"
> "-fcuda-include-gpubinary" "axpy-sm_30.fatbin"
>
> which, from my understanding, compiles the host code in tests/axpy.cu and
> links it with axpy-sm_30.fatbin. However, now that I have instrumented the
> IR of the host code (axpy.bc) and run `llc axpy.bc -o axpy.s`, which
> command should I use to link axpy.s with axpy-sm_30.fatbin? I tried to use
> -cc1as, but the flag '-fcuda-include-gpubinary' was not recognized.
>
> Thanks!
>
> yuanfeng
>
> On Sat, Mar 12, 2016 at 12:05 AM, Jingyue Wu <jingyue at google.com> wrote:
>
>> I've no idea. Without instrumentation, nvvm_reflect_anchor doesn't appear
>> in the final PTX, right? If that's the case, some pass in llc must have
>> deleted the anchor and you should be able to figure out which one.
>>
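>> A quick way to locate the offending pass (a sketch; the exact dump format
>> varies by LLVM version):
>>
>> ```shell
>> # Print the IR after every pass llc runs and look for the point where
>> # the anchor function disappears from the dumps.
>> llc inst_axpy-sm_30.bc -print-after-all -o axpy-sm_30.s 2> passes.log
>> grep -n "_ZL21__nvvm_reflect_anchorv" passes.log | tail
>> ```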
>> On Fri, Mar 11, 2016 at 4:56 PM, Yuanfeng Peng <
>> yuanfeng.jack.peng at gmail.com> wrote:
>>
>>> Hey Jingyue,
>>>
>>> Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect
>>> anchor didn't go away; ptxas is still complaining about the duplicate
>>> definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse
>>> the nvvm-reflect pass?
>>>
>>> Thanks!
>>> yuanfeng
>>>
>>> On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote:
>>>
>>>> According to the examples you sent, I believe the linking issue was
>>>> caused by nvvm reflection anchors. I haven't played with that, but I guess
>>>> running nvvm-reflect on an IR removes the nvvm reflect anchors. After that,
>>>> you can llvm-link the two bc/ll files.
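>>>> Something along these lines (untested; pass and tool names as in 3.9):
>>>>
>>>> ```shell
>>>> # Strip the reflect anchors from each module first, then link the
>>>> # results so only one copy of each definition survives.
>>>> opt -nvvm-reflect axpy-sm_30.bc -o axpy-sm_30.noanchor.bc
>>>> opt -nvvm-reflect cuda_hooks-sm_30.bc -o cuda_hooks-sm_30.noanchor.bc
>>>> llvm-link axpy-sm_30.noanchor.bc cuda_hooks-sm_30.noanchor.bc -o merged.bc
>>>> ```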
>>>>
>>>> Another potential issue is that your cuda_hooks-sm_30.ll is
>>>> unoptimized. This could cause the instrumented code to run super slow.
>>>>
>>>> On Fri, Mar 11, 2016 at 9:40 AM, Yuanfeng Peng <
>>>> yuanfeng.jack.peng at gmail.com> wrote:
>>>>
>>>>> Hey Jingyue,
>>>>>
>>>>> Attached are the .ll files. Thanks!
>>>>>
>>>>> yuanfeng
>>>>>
>>>>> On Fri, Mar 11, 2016 at 3:47 AM, Jingyue Wu <jingyue at google.com>
>>>>> wrote:
>>>>>
>>>>>> Looks like we are getting closer!
>>>>>>
>>>>>> On Thu, Mar 10, 2016 at 5:21 PM, Yuanfeng Peng <
>>>>>> yuanfeng.jack.peng at gmail.com> wrote:
>>>>>>
>>>>>>> Hi Jingyue,
>>>>>>>
>>>>>>> Thank you so much for the helpful response! I didn't know that PTX
>>>>>>> assembly cannot be linked; that's likely the reason for my issue.
>>>>>>>
>>>>>>> So I did the following as you suggested (axpy-sm_30.bc is the
>>>>>>> instrumented bitcode, and cuda_hooks-sm_30.bc contains the hook functions):
>>>>>>>
>>>>>>> llvm-link axpy-sm_30.bc cuda_hooks-sm_30.bc -o inst_axpy-sm_30.bc
>>>>>>>
>>>>>>> llc inst_axpy-sm_30.bc -o axpy-sm_30.s
>>>>>>>
>>>>>>> "/usr/local/cuda/bin/ptxas" "-m64" "-O3" -c "--gpu-name" "sm_30"
>>>>>>> "--output-file" axpy-sm_30.o axpy-sm_30.s
>>>>>>>
>>>>>>> However, I got the following error from ptxas:
>>>>>>>
>>>>>>> ptxas axpy-sm_30.s, line 106; error : Duplicate definition of
>>>>>>> function '_ZL21__nvvm_reflect_anchorv'
>>>>>>>
>>>>>>> ptxas axpy-sm_30.s, line 106; fatal : Parsing error near '.2':
>>>>>>> syntax error
>>>>>>>
>>>>>>> ptxas fatal : Ptx assembly aborted due to errors
>>>>>>>
>>>>>>> It looks like some CUDA function definitions appear in both bitcode
>>>>>>> files, which caused the duplicate definition... what am I supposed
>>>>>>> to do to resolve this issue?
>>>>>>>
>>>>>> Can you attach axpy-sm_30.ll and cuda_hooks-sm_30.ll? The duplication
>>>>>> may be caused by how nvvm reflection works, but I'd like to see a concrete
>>>>>> example.
>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> yuanfeng
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>