[llvm-dev] [RFC] llvm-diva - Debug Information Visual Analyzer

Eric Christopher via llvm-dev llvm-dev at lists.llvm.org
Mon Aug 10 00:05:44 PDT 2020


Hi Carlos,

The tool sounds very interesting. I appreciate the use case documentation -
I think I'm going to want to take some closer looks at it to really get a
feel for the output. That said, I have no objections to this being accepted
into the project pending review.

Couple of questions:

a) Can you follow up to this email with the actual Phab links? I think they
didn't make it into the email.
b) Are you open to renaming? I'd prefer to avoid names that, while clever,
map to particular words if possible.

Thanks! I'm really excited by this work.

-eric

On Sun, Aug 9, 2020 at 9:51 PM Enciso, Carlos via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> llvm-diva - Debug Information Visual Analyzer
>
> Carlos Alberto Enciso, Sony Interactive Entertainment
>
>
>
> LLVM supports multiple debug information formats (namely DWARF and
> CodeView)
>
> in different binary formats (e.g. ELF, PDB, Mach-O). Understanding the
> mappings
>
> between source code and debug information can be complex, and it's a
> problem
>
> we've commonly encountered when triaging debug information issues.
>
>
>
> The output from tools such as llvm-dwarfdump or llvm-readobj uses a close
> representation of the internal debug information format, and in our
> experience understanding that output requires a good knowledge of those
> formats, which limits who can triage and address such issues quickly. Even
> for the experts, it can sometimes take a lot of time and effort to triage
> issues due to the inherent complexity.
>
>
>
> =========
>
> llvm-diva
>
> =========
>
>
>
> At Sony, we've been developing an LLVM-based debug information analysis
> tool which we've called llvm-diva (short for LLVM debug information visual
> analyzer), designed to visualize these mappings. It's based entirely on the
> existing LLVM libraries for debug info parsing, target support, etc., and at
> this stage we believe that it has proven its worth internally to the point
> where we would like to propose upstreaming it as part of the mainline LLVM
> project alongside existing tools such as llvm-dwarfdump.
>
>
>
> llvm-diva is a command line tool that processes the debug info contained in
> a binary file and produces a debug-information-format-agnostic "Logical
> View", which is a high-level semantic representation of the debug info,
> independent of the low-level format.
>
>
>
> The logical view is composed of the traditional programming elements:
> scopes, types, symbols and lines. These elements can display additional
> information, such as variable coverage factor, lexical block level,
> disassembly code, code ranges, etc.
>
>
>
> The variety of llvm-diva command line options enables the creation of very
> rich logical views that include more low-level debug information:
> disassembly code associated with the debug lines, variable runtime
> locations and coverage, internal offsets for the elements within the binary
> file, etc.
>
>
>
> With llvm-diva, we aim to address the following points:
>
>
>
> * Which variables are dropped due to optimization?
>
> * Why can't I stop at a particular line?
>
> * Which lines are associated with a specific code range?
>
> * Does the debug information represent the original source?
>
> * What is the semantic difference between the debug info generated by
>   different toolchain versions?
>
>
>
> =============
>
> Printing Mode
>
> =============
>
>
>
> In this mode llvm-diva prints the logical view or portions of it, based on
>
> criteria patterns (including regular expressions) to select the kind of
> logical
>
> elements to be included in the output.
>
>
>
> The example below is used to show the different outputs generated by
> llvm-diva. We compiled it for an x86 ELF target with a recent version of
> clang (-O0 -g):
>
>
>
> 1  using INTPTR = const int *;
>
> 2  int foo(INTPTR ParamPtr, unsigned ParamUnsigned, bool ParamBool) {
>
> 3    if (ParamBool) {
>
> 4      typedef int INTEGER;
>
> 5      const INTEGER CONSTANT = 7;
>
> 6      return CONSTANT;
>
> 7    }
>
> 8    return ParamUnsigned;
>
> 9  }
>
>
>
> Print basic details
>
> -------------------
>
>
>
> The following command prints basic details for all the logical elements,
> sorted by their internal offset in the debug information, and includes
> their lexical level. Each row represents an element that is present within
> the debug information. The first column is the scope level, followed by the
> associated line number (if any), and finally the description of the element.
>
>
>
> llvm-diva --sort=offset
>
>           --attribute=level
>
>           --print=scopes,symbols,types,lines
>
>           test.o
>
>
>
> Logical View:
>
>
>
> [000]           {File} 'test.o'
>
> [001]             {CompileUnit} 'test.cpp'
>
> [002]     2         {Function} extern not_inlined 'foo' -> 'int'
>
> [003]     2           {Parameter} 'ParamPtr' -> 'INTPTR'
>
> [003]     2           {Parameter} 'ParamUnsigned' -> 'unsigned int'
>
> [003]     2           {Parameter} 'ParamBool' -> 'bool'
>
> [003]                 {Block}
>
> [004]     5             {Variable} 'CONSTANT' -> 'const INTEGER'
>
> [004]     5             {Line}
>
> [004]     6             {Line}
>
> [003]     4           {TypeAlias} 'INTEGER' -> 'int'
>
> [003]     2           {Line}
>
> [003]     3           {Line}
>
> [003]     8           {Line}
>
> [003]     8           {Line}
>
> [003]     9           {Line}
>
> [002]     1         {TypeAlias} 'INTPTR' -> '* const int'
>
> [002]     9         {Line}
>
>
>
> Looking at the output we can see that it shows the semantics of the debug
>
> information but decoupled from the underlying DWARF representation.
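
A minimal sketch of how the row layout described above (scope level, optional
line number, element description) could be read back by a script. It assumes
C++17 and that the text output keeps exactly the shape shown in this email;
ViewRow and parseViewRow are hypothetical names used only for illustration
and are not part of llvm-diva.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical record for one row of the text output shown above.
struct ViewRow {
  int level = 0;             // lexical level, e.g. 3 for "[003]"
  std::optional<int> line;   // source line; absent for rows such as {Block}
  std::string kind;          // element kind, e.g. "TypeAlias"
  std::string description;   // remaining text, e.g. "'INTEGER' -> 'int'"
};

// Parses rows such as "[003]     4           {TypeAlias} 'INTEGER' -> 'int'".
// A leading '-' or '+' (as printed in comparison reports) is accepted and
// ignored. Returns std::nullopt for text that does not follow that layout.
std::optional<ViewRow> parseViewRow(const std::string &text) {
  static const std::regex pattern(
      R"(^\s*[-+]?\[(\d+)\]\s+(\d+)?\s*\{(\w+)\}\s*(.*)$)");
  std::smatch match;
  if (!std::regex_match(text, match, pattern))
    return std::nullopt;

  ViewRow row;
  row.level = std::stoi(match[1].str());
  if (match[2].matched)
    row.line = std::stoi(match[2].str());
  row.kind = match[3].str();
  row.description = match[4].str();
  return row;
}
```

With rows parsed this way, a check such as "is this TypeAlias at the same
level as the {Block} that should contain it?" becomes a comparison of two
integers.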
>
>
>
> On closer inspection, we can see what could be a potential debug issue:
>
>
>
> [003]                 {Block}
>
> [003]     4           {TypeAlias} 'INTEGER' -> 'int'
>
>
>
> The 'INTEGER' definition is at level [003], the same lexical scope as the
>
> anonymous {Block} ('true' branch for the 'if' statement) whereas in the
>
> original source code the typedef statement is clearly inside that block,
> so the
>
> 'INTEGER' definition should also be at level [004] inside the block.
>
>
>
> Select logical elements
>
> -----------------------
>
>
>
> This feature allows selecting specific logical elements; the patterns used
> as criteria can include regular expressions. The output layout is controlled
> by the '--report' option, which can produce a tabular report, a tree view
> showing the parent hierarchy for each logical element that matches the
> criteria, or just a summary with the number of occurrences.
>
>
>
> The following prints all symbols and types that contain 'inte' in their
> names or types, using a tabular layout and giving the number of matches.
>
>
>
> llvm-diva --select-nocase --select-regex --report=details,summary
>
>           --select=INTe
>
>           --attribute=level --print=symbols,types,instructions
>
>           test.o
>
>
>
> Logical View:
>
>
>
> [000]           {File} 'test.o'
>
> [003]     4     {TypeAlias} 'INTEGER' -> 'int'
>
> [004]     5     {Variable} 'CONSTANT' -> 'const INTEGER'
>
>
>
> -----------------------------
>
> Element      Total      Found
>
> -----------------------------
>
> Scopes           4          0
>
> Symbols          4          1
>
> Types            2          1
>
> Lines           16          0
>
> -----------------------------
>
> Total           26          2
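
The selection above can be pictured as a case-insensitive regular expression
search over each element's name and type. The sketch below is only an
illustration of that idea using std::regex; Element and selectElements are
hypothetical names, and the way llvm-diva implements '--select' internally is
not described in this email.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical, simplified element: only the fields needed for selection.
struct Element {
  std::string kind;  // "Symbol" or "Type"
  std::string name;  // e.g. "CONSTANT"
  std::string type;  // e.g. "const INTEGER"
};

// Returns the elements whose name or type contains a match for `pattern`.
// `ignoreCase` plays the role of a "no case" switch for the search.
std::vector<Element> selectElements(const std::vector<Element> &elements,
                                    const std::string &pattern,
                                    bool ignoreCase) {
  std::regex_constants::syntax_option_type flags =
      std::regex_constants::ECMAScript;
  if (ignoreCase)
    flags |= std::regex_constants::icase;
  const std::regex re(pattern, flags);

  std::vector<Element> found;
  for (const Element &element : elements)
    if (std::regex_search(element.name, re) ||
        std::regex_search(element.type, re))
      found.push_back(element);
  return found;
}
```

Applied to the four symbols and two types of the example with the pattern
"INTe" and case ignored, only 'CONSTANT' (through its type) and 'INTEGER'
(through its name) are selected, which agrees with the "Found" column above.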
>
>
>
> ===============
>
> Comparison Mode
>
> ===============
>
>
>
> In this mode llvm-diva compares logical views to produce a report with the
> logical elements that are missing or added. We've found this a very powerful
> aid in finding semantic differences in the debug information produced by
> different toolchain versions or even completely different toolchains (for
> example, a compiler producing DWARF can be directly compared against a
> completely different compiler that produces CodeView).
>
>
>
> There are 2 comparison methods: logical view and logical elements. The
> first one compares the logical view as a whole unit; for a match, each
> compared logical element must have the same parents and children. The
> second one compares individual logical elements without considering whether
> their parents are the same. For both comparison methods, the equality
> criteria include the name, source code location, type and lexical scope
> level.
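
The second method (comparing individual logical elements) can be sketched as
two set differences over records keyed by the criteria listed above. This is
an illustration under that assumption only, not the tool's implementation;
LogicalElement, Difference and compareElements are hypothetical names.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record holding the fields used as equality criteria.
struct LogicalElement {
  int level;         // lexical scope level
  int line;          // source line number
  std::string kind;  // e.g. "TypeAlias"
  std::string name;  // e.g. "INTEGER"
  std::string type;  // e.g. "int"

  // Strict ordering over every field, so two elements are equivalent only
  // when all of the criteria agree.
  bool operator<(const LogicalElement &other) const {
    return std::tie(level, line, kind, name, type) <
           std::tie(other.level, other.line, other.kind, other.name,
                    other.type);
  }
};

struct Difference {
  std::vector<LogicalElement> missing;  // present only in the reference
  std::vector<LogicalElement> added;    // present only in the target
};

// Elements are compared on their own; parents and children are ignored.
Difference compareElements(const std::set<LogicalElement> &reference,
                           const std::set<LogicalElement> &target) {
  Difference diff;
  std::set_difference(reference.begin(), reference.end(), target.begin(),
                      target.end(), std::back_inserter(diff.missing));
  std::set_difference(target.begin(), target.end(), reference.begin(),
                      reference.end(), std::back_inserter(diff.added));
  return diff;
}
```

Because the lexical level is part of the key, a typedef that differs only in
its level between two objects is reported once as missing and once as added.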
>
>
>
> In our previous example, we found the debug information issue described
> above (the invalid scope location for 'typedef int INTEGER') by comparing
> against another compiler.
>
>
>
> 1  using INTPTR = const int *;
>
> 2  int foo(INTPTR ParamPtr, unsigned ParamUnsigned, bool ParamBool) {
>
> 3    if (ParamBool) {
>
> 4      typedef int INTEGER;
>
> 5      const INTEGER CONSTANT = 7;
>
> 6      return CONSTANT;
>
> 7    }
>
> 8    return ParamUnsigned;
>
> 9  }
>
>
>
> Using GCC to generate test-gcc.o, we can apply a selection pattern with the
>
> printing mode to obtain the following output.
>
>
>
> llvm-diva --select-regex --select-nocase --report=details
>
>           --select=INTe
>
>           --attribute=level
>
>           --print=symbols,types
>
>           test.o test-gcc.o
>
>
>
> Logical View:
>
> [000]           {File} 'test.o'
>
> [003]     4     {TypeAlias} 'INTEGER' -> 'int'
>
> [004]     5     {Variable} 'CONSTANT' -> 'const INTEGER'
>
>
>
> Logical View:
>
> [000]           {File} 'test-gcc.o'
>
> [004]     4     {TypeAlias} 'INTEGER' -> 'int'
>
> [004]     5     {Variable} 'CONSTANT' -> 'const INTEGER'
>
>
>
> The output shows that both objects contain the same elements, but the
> 'typedef INTEGER' is located at a different scope level. The GCC-generated
> object shows '4', which is the correct value.
>
>
>
> Note that there is no requirement that GCC must produce identical or
> similar
>
> DWARF to clang in this case to allow the comparison. We're only comparing
> the
>
> semantics.
>
>
>
> Using the llvm-diva comparison functionality, that issue can be seen in a
> more global context that can include the logical view.
>
>
>
> llvm-diva --compare=types --report=details,summary
>
>           --attribute=level
>
>           --print=symbols,types
>
>           test.o test-gcc.o
>
>
>
> Reference: 'test.o'
>
> Target:    'test-gcc.o'
>
>
>
> (1) Missing Types:
>
> -[003]     4     {TypeAlias} 'INTEGER' -> 'int'
>
>
>
> (1) Added Types:
>
> +[004]     4     {TypeAlias} 'INTEGER' -> 'int'
>
>
>
> ----------------------------------------
>
> Element   Expected    Missing      Added
>
> ----------------------------------------
>
> Scopes           4          0          0
>
> Symbols          0          0          0
>
> Types            2          1          1
>
> Lines            0          0          0
>
> ----------------------------------------
>
> Total            6          1          1
>
>
>
> The output shows the missing (-) and added (+) elements in tabular form,
> giving more context by swapping the reference and target object files.
>
>
>
> llvm-diva --compare=types --report=view
>
>           --attribute=level
>
>           --print=symbols,types
>
>           test.o test-gcc.o
>
>
>
> Reference: 'test.o'
>
> Target:    'test-gcc.o'
>
>
>
> Logical View:
>
> [000]           {File} 'test.o'
>
> [001]             {CompileUnit} 'test.cpp'
>
> [002]     1         {TypeAlias} 'INTPTR' -> '* const int'
>
> [002]     2         {Function} extern not_inlined 'foo' -> 'int'
>
> [003]                 {Block}
>
> [004]     5             {Variable} 'CONSTANT' -> 'const INTEGER'
>
> +[004]     4             {TypeAlias} 'INTEGER' -> 'int'
>
> [003]     2           {Parameter} 'ParamBool' -> 'bool'
>
> [003]     2           {Parameter} 'ParamPtr' -> 'INTPTR'
>
> [003]     2           {Parameter} 'ParamUnsigned' -> 'unsigned int'
>
> -[003]     4           {TypeAlias} 'INTEGER' -> 'int'
>
>
>
> The output shows the merging view path (reference and target) with the
> missing
>
> and added elements.
>
>
>
> Comparing toolchains
>
> --------------------
>
>
>
> In the previous section, we compared GCC and Clang. The current
> implementation of llvm-diva has sufficient support for the CodeView format
> to make comparison between the MSVC and Clang compilers possible.
>
>
>
> -----------------------------------------------------------------------
>
> pr_44884.cpp
>
> -----------------------------------------------------------------------
>
>  1  int bar(float Input) { return (int)Input; }
>  2
>  3  unsigned foo(char Param) {
>  4    typedef int INT;                      // ** Definition for INT **
>  5    INT Value = Param;
>  6    {
>  7      typedef float FLOAT;                // ** Definition for FLOAT **
>  8      {
>  9        FLOAT Added = Value + Param;
> 10        Value = bar(Added);
> 11      }
> 12    }
> 13    return Value + Param;
> 14  }
>
>
>
> The above test (from PR44884) illustrates a scope issue found in the Clang
> compiler.
>
>
>
> See: https://bugs.llvm.org/show_bug.cgi?id=44884
>
>
>
> Lines 4 and 7 contain 2 typedefs, defined at different lexical scopes.
>
> 4    typedef int INT;
>
> 7      typedef float FLOAT;
>
>
>
> These are the logical views that llvm-diva generates for 3 different
> compilers
>
> (MSVC, Clang and GCC), emitting different debug info formats (CodeView,
> DWARF)
>
> on different platforms.
>
>
>
> -----------------------------------------------------------------------
>
> pr_44884_dw.o - Compiled with Clang (DWARF format).
>
> -----------------------------------------------------------------------
>
> Logical View:
>
> [000]           {File} 'pr_44884_dw.o' -> elf64-x86-64
>
> [001]             {CompileUnit} 'pr_44884.cpp'
>
> [002]               {Producer} 'clang version 11.0.0
>
> [002]     7         {Function} extern not_inlined 'bar' -> 'int'
>
> [003]     7           {Parameter} 'Input' -> 'float'
>
> [002]     9         {Function} extern not_inlined 'foo' -> 'unsigned int'
>
> [003]                 {Block}
>
> [004]    15             {Variable} 'Added' -> 'FLOAT'
>
> [003]     9           {Parameter} 'Param' -> 'char'
>
> [003]    13           {TypeAlias} 'FLOAT' -> 'float'
>
> [003]    10           {TypeAlias} 'INT' -> 'int'
>
> [003]    11           {Variable} 'Value' -> 'INT'
>
>
>
> -----------------------------------------------------------------------
>
> pr_44884_gc.o - Compiled with GCC (DWARF Format).
>
> -----------------------------------------------------------------------
>
> Logical View:
>
> [000]           {File} 'pr_44884_gc.o' -> elf64-x86-64
>
> [001]             {CompileUnit} 'pr_44884.cpp'
>
> [002]               {Producer} 'GNU C++ 5.5.0 20171010'
>
> [002]     7         {Function} extern not_inlined 'bar' -> 'int'
>
> [003]     7           {Parameter} 'Input' -> 'float'
>
> [002]     9         {Function} extern not_inlined 'foo' -> 'unsigned int'
>
> [003]                 {Block}
>
> [004]                   {Block}
>
> [005]    15               {Variable} 'Added' -> 'FLOAT'
>
> [004]    13             {TypeAlias} 'FLOAT' -> 'float'
>
> [003]     9           {Parameter} 'Param' -> 'char'
>
> [003]    10           {TypeAlias} 'INT' -> 'int'
>
> [003]    11           {Variable} 'Value' -> 'INT'
>
>
>
> -----------------------------------------------------------------------
>
> pr_44884_cv.o - Compiled with Clang (CodeView format).
>
> -----------------------------------------------------------------------
>
> Logical View:
>
> [000]           {File} 'pr_44884_cv.o' -> COFF-x86-64
>
> [001]             {CompileUnit} 'pr_44884.cpp'
>
> [002]               {Producer} 'clang version 11.0.0
>
> [002]               {Function} extern not_inlined 'bar' -> 'int'
>
> [003]                 {Parameter} 'Input' -> 'float'
>
> [002]               {Function} extern not_inlined 'foo' -> 'unsigned'
>
> [003]                 {Block}
>
> [004]                   {Variable} 'Added' -> 'float'
>
> [003]                 {Parameter} 'Param' -> 'char'
>
> [003]                 {TypeAlias} 'FLOAT' -> 'float'
>
> [003]                 {TypeAlias} 'INT' -> 'int'
>
> [003]                 {Variable} 'Value' -> 'int'
>
>
>
> -----------------------------------------------------------------------
>
> pr_44884_ms.o - Compiled with MSVC (CodeView Format).
>
> -----------------------------------------------------------------------
>
> Logical View:
>
> [000]           {File} 'pr_44884_ms.o' -> COFF-i386
>
> [001]             {CompileUnit} 'pr_44884.cpp'
>
> [002]               {Producer} 'Microsoft (R) Optimizing Compiler'
>
> [002]               {Function} extern not_inlined 'bar' -> 'int'
>
> [003]                 {Parameter} 'Input' -> 'float'
>
> [002]               {Function} extern not_inlined 'foo' -> 'unsigned'
>
> [003]                 {Block}
>
> [004]                   {Block}
>
> [005]                     {Variable} 'Added' -> 'float'
>
> [004]                   {TypeAlias} 'FLOAT' -> 'float'
>
> [003]                 {Parameter} 'Param' -> 'char'
>
> [003]                 {TypeAlias} 'INT' -> 'int'
>
> [003]                 {Variable} 'Value' -> 'int'
>
>
>
> From the previous logical views, we can see that the Clang compiler emits
> both typedefs at the same lexical scope (3), which is wrong, while GCC and
> MSVC emit the correct lexical scope for both typedefs.
>
>
>
>
> ---------+----------+--------------------------------------------------------
> Compiler | Format   | Lexical Scope
> ---------+----------+--------------------------------------------------------
> Clang    | DWARF    | [003]    13           {TypeAlias} 'FLOAT' -> 'float'
>          |          | [003]    10           {TypeAlias} 'INT' -> 'int'
> ---------+----------+--------------------------------------------------------
> GCC      | DWARF    | [004]    13             {TypeAlias} 'FLOAT' -> 'float'
>          |          | [003]    10           {TypeAlias} 'INT' -> 'int'
> ---------+----------+--------------------------------------------------------
> Clang    | CodeView | [003]                 {TypeAlias} 'FLOAT' -> 'float'
>          |          | [003]                 {TypeAlias} 'INT' -> 'int'
> ---------+----------+--------------------------------------------------------
> MSVC     | CodeView | [004]                   {TypeAlias} 'FLOAT' -> 'float'
>          |          | [003]                 {TypeAlias} 'INT' -> 'int'
> ---------+----------+--------------------------------------------------------
>
>
>
> Note: One of the main limitations while processing CodeView debug info is
> the reduced line information emitted for types and symbols, which makes it
> difficult to use the comparison feature within llvm-diva, as the line
> numbers are one of the criteria for a logical element match. In the
> meantime, any graphical comparison tool is able to compare and show the
> logical view differences.
>
>
>
> The above table shows the omitted line numbers for the referenced typedefs.
>
>
>
> ==============
>
> Current Status
>
> ==============
>
>
>
> Generates complete logical views for DWARF including:
>
> - Scopes, symbols, types, lines.
>
> - Variable location, coverage and location gaps.
>
> - Disassembly of text sections associated with .debug_line records.
>
> - Emission of warnings for invalid ranges, lines with line zero.
>
> - Comparison: logical views and elements.
>
>
>
> Generates partial logical views for COFF/CodeView (objects and PDB),
> including:
>
> - Scopes, symbols, types, lines.
>
> - Comparison: logical views and elements.
>
>
>
> During the development of llvm-diva, we have found the following LLVM debug
>
> issues:
>
>
>
> - PR43860 - COFF Debug info shows variable at the wrong lexical scope
>
> - PR43905 - COFF Debug info missing nested enumeration
>
> - PR44884 - Debug information shows incorrect lexical scope for typedef
>
> - PR46361 - [CodeView] Omitted class member function declaration for lambda
>
> - PR46394 - [CodeView] Missing LF_NESTTYPE with nested templates
>
>
>
> ==============
>
> Work remaining
>
> ==============
>
>
>
> The following are the main tasks that need to be finished:
>
> - Logical View in JSON format. Currently it uses free form text style.
>
> - COFF/CodeView disassembly text sections.
>
> - COFF/CodeView ranges and locations.
>
>
>
> ==========
>
> Conclusion
>
> ==========
>
>
>
> The source code has been uploaded for review on Phabricator at this link:
>
>
>
> https://reviews.llvm.org/Dxxxx.
>
>
>
> The review covers two patches:
>
>
>
> A first patch with an IntervalTree data structure implementation, which is
> required by llvm-diva.
>
>
>
> A second patch with the actual tool (in llvm/tools/llvm-diva).
>
>
>
> Once these first two patches are committed, the plan is to keep working on
>
> llvm-diva with the help of the community to address current limitations and
>
> find good solutions/fixes for any design issues.
>
>
>
> We hope the community will find llvm-diva as useful as we have.
>
>
>
> Special thanks to Orlando Cazalet-Hyams for testing the tool and to Greg
> Bedwell, Phillip Power and Paul Robinson for suggesting improvements and
> reviewing the tool documentation.
>
>
>
> Thanks for your time.
>
>
>
> -Carlos
>
>
>