[llvm-dev] RFC: Debug info for Cuda

Tue Nov 7 08:04:35 PST 2017

Hi, Alexey,

Thanks a bunch for working on this. A couple quick questions below...

On 11/06/2017 12:37 PM, Alexey Bataev wrote:
>
> Hi everybody,
>
> As you know, the CUDA/NVPTX target has very limited debug info
> support in Clang/LLVM. Currently, LLVM only supports emitting
> line-number debug info.
>
> This is caused by limitations of the CUDA/NVPTX codegen. Clang/LLVM
> translates the source code to LLVM IR, which is then lowered to a PTX
> (parallel thread execution) intermediate file. This PTX file is a
> special kind of assembly code in text format, which contains the code
> itself plus (possibly) debug info. The PTX file is then compiled by
> the ptxas tool into the CUDA binary representation.
>
>
> Debug info representation in PTX file.
> ========================
>
> According to the "Debug Information" section of the PTX Writer's Guide
> to Interoperability
> (http://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#debug-information),
> debug information must be encoded in DWARF (Debug With Arbitrary
> Record Format). The responsibility for generating debug information is
> split between the PTX producer and the PTX-to-SASS backend. The PTX 
> producer is responsible for emitting binary DWARF into the PTX file, 
> using the .section and .b8-.b16-.b32-and-.b64 directives in PTX. This 
> should contain the .debug_info and .debug_abbrev sections, and 
> possibly optional sections .debug_pubnames and .debug_aranges. These 
> sections are standard DWARF2 sections that refer to labels and 
> registers in the PTX.
>
> The PTX-to-SASS backend is responsible for generating the .debug_line 
> section from the .file and .loc directives in the PTX file. This 
> section maps source lines to SASS addresses. The PTX-to-SASS backend 
> also generates the .debug_frame section.
>
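
To make the encoding described above concrete, here is a minimal
sketch of turning a buffer of raw DWARF bytes into a brace-enclosed PTX
section made of .b8 directives. The helper name emitPtxSection and the
eight-bytes-per-line layout are assumptions for illustration; this is
not an existing LLVM API, and the exact text ptxas accepts should be
checked against the PTX documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not an LLVM API): renders raw DWARF bytes as a
// brace-enclosed PTX section using .b8 directives, eight bytes per line.
std::string emitPtxSection(const std::string &Name,
                           const std::vector<std::uint8_t> &Bytes) {
  assert(!Name.empty() && Name[0] == '.' && "section names start with '.'");
  std::ostringstream OS;
  OS << ".section " << Name << "\n{\n";
  for (std::size_t I = 0; I < Bytes.size(); ++I) {
    if (I % 8 == 0)
      OS << ".b8 "; // start a new directive every eight bytes
    OS << static_cast<unsigned>(Bytes[I]);
    const bool EndOfLine = (I % 8 == 7) || (I + 1 == Bytes.size());
    OS << (EndOfLine ? "\n" : ", ");
  }
  OS << "}\n";
  return OS.str();
}
```

Calling emitPtxSection(".debug_abbrev", {1, 17, 1}) yields a
".section .debug_abbrev" line, an opening brace, one ".b8 1, 17, 1"
line, and a closing brace.
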
> LLVM is able to emit debug info in DWARF, but the ptxas compiler has
> some limitations that make it hard to adapt LLVM to emit correct
> debug info in PTX files.
>
>
> Limitations/features of the PTX format/ptxas compiler.
> ==================================
>
> a) Supports DWARF-2 only.
> b) Labels are allowed only in the code section (only in functions).
> c) Does not support label arithmetic in DWARF sections.
>     “.b32 L1 - L2” as the size of the section is not allowed, so the
> section sizes must be calculated explicitly.
> d) Debug info must point to the sections, not to labels inside these 
> sections.
>     “.b32 .debug_abbrev”
> e) Sections themselves must be enclosed in braces
>     “.section .debug_info {…}”
> f) Frame info is non-register based
>     It is based on the function-local “__local_depot” array, which
> represents the stack frame.
> g) All variables must have a non-standard DW_AT_address_class attribute
> so that the debugger knows the address class of the variable (global
> or local). The DWARF standard does support this attribute, but it
> can be applied to pointer/reference types only, not to variables.
> h) The first label in the function must follow the debug location
> macro. In LLVM, the order is currently reversed: the label comes
> first and the debug location macro follows it.
> i) The .debug_frame section is emitted by the ptxas compiler.
>     DW_AT_frame_base must be set, as dwarf::DW_FORM_data1, to the
> dwarf::DW_OP_call_frame_cfa value.
> j) Strings cannot be referenced by labels; instead, they must be
> inlined in the sections in the form of an array of chars.
>
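
Limitations c) and j) can be illustrated with a short sketch: the
length field is computed from the already-built unit body rather than
written as a label difference, and strings are appended as inline
NUL-terminated bytes. The helper names appendInlineString and
withUnitLength are hypothetical, and the little-endian byte order is an
assumption on my part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: appends a string inline as NUL-terminated bytes,
// instead of emitting a reference to a label in a string section.
void appendInlineString(std::vector<std::uint8_t> &Out, const std::string &S) {
  assert(S.find('\0') == std::string::npos && "embedded NUL would truncate");
  Out.insert(Out.end(), S.begin(), S.end());
  Out.push_back(0);
}

// Hypothetical helper: prefixes a unit body with its explicit 32-bit
// length (assumed little-endian). This replaces a ".b32 L1 - L2" style
// label difference; the length does not count the length field itself.
std::vector<std::uint8_t> withUnitLength(const std::vector<std::uint8_t> &Body) {
  assert(Body.size() <= 0xFFFFFFFFu && "length must fit in 32 bits");
  const std::uint32_t Len = static_cast<std::uint32_t>(Body.size());
  std::vector<std::uint8_t> Unit;
  Unit.reserve(Body.size() + 4);
  for (int Shift = 0; Shift < 32; Shift += 8)
    Unit.push_back(static_cast<std::uint8_t>((Len >> Shift) & 0xFF));
  Unit.insert(Unit.end(), Body.begin(), Body.end());
  return Unit;
}
```

The resulting byte vector can then be written out with .b8 directives,
so no label arithmetic or string label is needed in the section.
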
> Some changes in LLVM are required to support all of these
> limitations/features in the output PTX files.
>
> Required changes in LLVM.
> ==================
>
> •include/llvm/CodeGen/AsmPrinter.h.
>     •Add “virtual MCSymbol *getFunctionFrameSymbol(const 
> MachineFunction *MF) const” for non-register-based frame info.
>     •Override it in “NVPTXMCAsmPrinter.cpp” to return the name of the
> “__local_depot” frame storage.
> •Add “cuda-gdb” specific tuning.
>     •Inlined strings must be used in sections, not string references.
>     •Label arithmetic is replaced by the absolute section size evaluation.
>     •Use “AsmPrinter::doInitialization()” instead of NVPTX-custom 
> manual initialization.
>     •Local variable addresses are emitted as “__local_depot” + <var offset>.
> •Add NVPTX specific “NVPTXMCAsmStreamer” class.
>     •Requires moving the “MCAsmStreamer” class declaration into the
> includes.
>     •Overrides emission of the labels (names of the sections are
> emitted instead).
>     •Overrides emission of the sections (emit braces)
>     •Overrides string emission (as sequence of bytes, not as strings)
>     •Overrides emission of files/locations debug info
>
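
For readers who want to picture the proposed frame-symbol hook, here is
a minimal sketch using stand-in types. FunctionInfo, PrinterBase and
PtxPrinter are hypothetical and are not the LLVM classes; the numeric
suffix on “__local_depot” and the "symbol+offset" spelling are also
assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Stand-in for a machine function; illustration only, not an LLVM type.
struct FunctionInfo {
  unsigned Number; // per-module function index (assumed)
};

// Stand-in for the generic printer with the proposed virtual hook.
class PrinterBase {
public:
  virtual ~PrinterBase() = default;

  // Default: no frame symbol, i.e. the target uses a frame register.
  virtual std::string getFunctionFrameSymbol(const FunctionInfo &) const {
    return "";
  }

  // Local variable address expressed as frame symbol + offset.
  std::string localVariableAddress(const FunctionInfo &F,
                                   unsigned Offset) const {
    const std::string Sym = getFunctionFrameSymbol(F);
    assert(!Sym.empty() && "target has no symbol-based frame");
    return Sym + "+" + std::to_string(Offset);
  }
};

// Stand-in for the NVPTX printer: the frame lives in a local depot array.
class PtxPrinter : public PrinterBase {
public:
  std::string getFunctionFrameSymbol(const FunctionInfo &F) const override {
    return "__local_depot" + std::to_string(F.Number);
  }
};
```

With these stand-ins, a PtxPrinter asked for offset 16 in function 3
produces "__local_depot3+16", while other targets keep the default.
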
> Required changes in Clang.
> =================
>
> •Add option “-gcuda-gdb” to the driver.
>     •Emit cuda-gdb compatible debug info (DWARF-2 by default + CudaGDB
> tuning).
> •Add options “-g --dont-merge-basicblocks --return-at-end” to the
> “ptxas” call.
>

Is this a change? It looks like this is already the behavior of the 
driver's CUDA target code:

   if (Args.hasFlag(options::OPT_cuda_noopt_device_debug,
                    options::OPT_no_cuda_noopt_device_debug, false)) {
     // ptxas does not accept -g option if optimization is enabled, so
     // we ignore the compiler's -O* options if we want debug info.
     CmdArgs.push_back("-g");
     CmdArgs.push_back("--dont-merge-basicblocks");
     CmdArgs.push_back("--return-at-end");
   } else if (Arg *A = Args.getLastArg(options::OPT_O_Group)) {
     // Map the -O we received to -O{0,1,2,3}.

>     •ptxas is able to translate debug information only if the -O0
> optimization level is used. This means that we can use an optimization
> level above O0 in LLVM, but still have to use O0 when calling the
> ptxas compiler.
>

Can you clarify what "unable to translate" means? Does it refuse to 
compile the code, drop all debug info, drop all debug info except for 
line-table information, or something else?

Thanks again,
Hal

>
> This approach was implemented in https://github.com/clang-ykt to
> support debug info emission for the NVPTX target when generating code
> for OpenMP offloading constructs. You can try it out.
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
