[cfe-dev] compiling CUDA w/ -fdebug-default-version=5 generates invalid PTX

David Blaikie via cfe-dev cfe-dev at lists.llvm.org
Thu Dec 3 12:39:12 PST 2020


On Thu, Dec 3, 2020 at 12:29 PM Artem Belevich via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

>
>
> On Thu, Dec 3, 2020 at 11:57 AM Alexey.Bataev <a.bataev at outlook.com>
> wrote:
>
>> Hi Artem, here is what I found about this.
>>
>> These labels are emitted only if DWARF 4 or 5 is used. They are required
>> for emission of the DW_AT_call_site attribute. The info about callsites
>> also emitted for lineinfo emission with debug info for profiling like in
>> your example (-O1 -gmlt).
>>
> Thank you for looking into this.
>
>
>> Call/CallUni instruction is treated as a separate instruction by the
>> debug info generator and it treats it as a separate call and emits labels
>> for it.
>>
> Would it help if we were to fold the whole call.uni into a single line? It
> appears that swarf injects the label after the call instruction line gets
> printed.
> What is the label supposed to point at? At the call instruction itself? Or
> at the return point?
>

I believe it's immediately after the call.

Testing an x86 example:

$ cat test.c

void f1();

void f2();

void f3() {

  f1();

  f2();

}

$ clang-tot -gdwarf-5 test.c -c -O3 && llvm-dwarfdump-tot test.o | grep
"DW_TAG\|_call_" && llvm-objdump -d test.o

0x0000000c: *DW_TAG*_compile_unit

0x00000023:   *DW_TAG*_subprogram

                DW_AT*_call_*all_calls    (true)

0x0000002e:     *DW_TAG_call_*site

                  DW_AT*_call_*origin     (0x0000003b)

                  DW_AT*_call_*return_pc  (0x0000000000000008)

0x00000034:     *DW_TAG_call_*site

                  DW_AT*_call_*origin     (0x00000041)

                  DW_AT*_call_*tail_call  (true)

                  DW_AT*_call_*pc (0x000000000000000b)

0x0000003b:   *DW_TAG*_subprogram

0x0000003f:     *DW_TAG*_unspecified_parameters

0x00000041:   *DW_TAG*_subprogram

0x00000045:     *DW_TAG*_unspecified_parameters


test.o: file format elf64-x86-64



Disassembly of section .text:


0000000000000000 <f3>:

       0: 50                            pushq   %rax

       1: 31 c0                         xorl    %eax, %eax

       3: e8 00 00 00 00                callq   0x8 <f3+0x8>

       8: 31 c0                         xorl    %eax, %eax

       a: 59                            popq    %rcx

       b: e9 00 00 00 00                jmp     0x10 <f3+0x10>


But the tail call uses the jump location, because it can't do anything else.

You can try to mark the CallUni instructions as hasDelaySlot = 1 or stop
>> treating it as a call instruction in NVPTXInstrInfo.td. Can't say which one
>> is better/correct. Looks like the representation for call/call.uni is not
>> quite compatible with the debug info
>>
> David suggested not allowing -fdebug-default-version=5 to override the
> DWARF version provided by NVPTX back-end. That would make sense,
> considering that we can't handle the newer DWARF versions anyways.
>
> Let me see if I can fix the lineinfo generation first.
>

Yeah, -gdwarf-5 and -fdebug-default-version=5 -g should behave the same. So
whatever it is that's disabling/downgrading to DWARFv2 for NVPTX when the
user uses -gdwarf-5 shoudl do the same for -fdebug-default-version=5


>
> --Artem
>
>> -------------
>> Best regards,
>> Alexey Bataev
>>
>> 12/2/2020 6:00 PM, Artem Belevich пишет:
>>
>> Hi, Alexey!
>>
>> I've ran into an odd case with debug info generation in NVPTX.
>>
>> Reproduction:
>> ------------------------
>> __device__ __attribute__((noinline)) void bar() { printf("Hi!"); }
>> __global__ void foo() { bar(); }
>> int main(){}
>> ------------------------
>>
>> $ clang++ -v --cuda-gpu-arch=sm_70 --cuda-device-only -fdebug-default-version=5
>> a.cu -gmlt -O1 -c
>>
>> Compilation fails due to a syntax error reported by ptxas.
>> The reason for the error is that clang generates a label in the middle of
>> a `call.uni` instuction. E.g:
>>
>>         { // callseq 1, 0
>>         .reg .b32 temp_param_reg;
>>         call.uni
>> Ltmp14:
>>         _Z3barv,
>>         (
>>         );
>>         } // callseq 1
>>
>> The odd part is that we're only generating line info and there is no
>> DWARF in the generated PTX.
>> It appears that this behavior is triggered by `-dwarf-version=5` passed
>> to cc1.
>> Looks like another case where PTX syntax breaks DWARF generator
>> assumptions.
>>
>> It's possible to work around it with an additional `-Xarch_device
>> -fdebug-default-version=2`,
>> but I'd appreciate it if you could take a look and see if that could be
>> fixed.
>>
>> --
>> --Artem Belevich
>>
>>
>
> --
> --Artem Belevich
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20201203/8f9c329c/attachment-0001.html>


More information about the cfe-dev mailing list