[llvm-dev] RFC: Add a way to interleave source code in assembler output

John Reagan via llvm-dev llvm-dev at lists.llvm.org
Fri Feb 3 11:03:09 PST 2017


I want to jump in on this too.  As we port the OpenVMS compilers
(BASIC, COBOL, Pascal, Fortran, C, VAX Macro assembler, BLISS) to
LLVM for our x86 port, we want to provide some scheme for our
traditional OpenVMS "listing files", which can optionally include generated
machine code, cross reference information, command line summary, frontend
generated messages interspersed with the source listing, display of header
file contents, expansion of macros, optimization information, inlining
heuristic results, etc.

My plan was to come back here with an RFC later this year after we have our
early cross-compilers in place for the OS team to do their porting effort.

I have some resources that could help with such an effort.

Besides using these listings for debugging efforts, we also use them for
system archive purposes.  Over the last 30+ years of doing this, having a full
machine code listing from an older release has been invaluable for debugging
system crashes, etc.  With such a listing file, you can see EXACTLY the
source that was compiled and the exact generated code all in a single file.
Our generated code contains symbolized variable and routine names along with
line number information.  And by having a qualifier summary at the tail end
of the listing file, you can always tell exactly which command line options
were specified by the make files/command files/etc. that built the
software.

What I envision is some listing manager that is fed from various places in
the compiler.  The frontend (not just clang but any frontend) could provide
file information, source line information, error message information, command
line processing, etc.  The backend can provide optimization data, inline
decisions, generated code, diagnostics for uninitialized variables, unreachable
code, etc.  Then the listing manager would collect, sort, etc. and generate the
single listing file.  You might be able to cobble something together inside a
driver if you pick apart the output from various tools, but it wouldn't have the
look and feel from a single generated file.
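
To make the idea a bit more concrete, here is a minimal Python sketch of such a
listing manager.  Everything in it (the ListingManager class, the section names,
the add/render methods, the placeholder frontend message) is made up for
illustration; it is not an existing LLVM interface, just the collect/sort/emit
shape I have in mind.

```python
from dataclasses import dataclass, field

# Hypothetical section order; a real listing would have more sections.
SECTION_ORDER = ["source", "machine_code", "symbol_map", "qualifiers"]


@dataclass
class ListingManager:
    """Collects records from any compiler phase and emits one listing."""
    records: list = field(default_factory=list)

    def add(self, section, line, text):
        # 'section' names the part of the listing; 'line' is the source
        # line the record belongs to (0 when it has no source position).
        self.records.append((SECTION_ORDER.index(section), line, text))

    def render(self):
        out = []
        current = None
        # sorted() is stable, so records with the same section and line
        # keep the order in which they were submitted.
        for section, line, text in sorted(self.records, key=lambda r: r[:2]):
            if current is not None and section != current:
                out.append("\f")  # form feed between sections
            current = section
            out.append(text)
        return "\n".join(out)


mgr = ListingManager()
# The backend reports first here, to show that arrival order does not matter.
mgr.add("machine_code", 1616, "__MAIN:    // 001616")
mgr.add("source", 1618, '1618   printf("hello world\\n");')
mgr.add("source", 1616, "1616 main () {")
mgr.add("source", 1618, "(hypothetical frontend message for line 1618)")
mgr.add("qualifiers", 0, "CC/LIST/MACH/SHOW=SYMBOLS HW")
listing = mgr.render()
```

The point of the sketch is only that the phases submit records independently
and a single component decides ordering and layout.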

As for interspersed machine code, that is often only useful with O0 or O1 compilations.
Once you get to O2 and higher, it is often just best to have the machine code
following the source code with line numbers (either as ".loc" directives or just with
"end of line" comments).  In many cases, we use the comment area to denote "interesting"
instructions in the prologue/epilogue that correspond to unwind information.
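
As a rough illustration of the "end of line" comment style, the following Python
sketch tags an instruction with its source line number whenever the line changes.
The annotate function and the line-to-instruction assignment in the sample data
are assumptions for the example, not output from our compilers.

```python
def annotate(instructions, width=40):
    """Append '// NNNNNN' to the first instruction of each source line.

    'instructions' is a list of (source_line, text) pairs in emission order.
    """
    out = []
    previous = None
    for line, text in instructions:
        if line != previous:
            # Pad the instruction text, then add the six-digit line number.
            out.append(f"{text:<{width}} // {line:06d}")
            previous = line
        else:
            out.append(text)
    return out


# Hypothetical mapping of instructions to source lines.
annotated = annotate([
    (1616, "alloc   r39 = rspfs, 6, 3, 8, 0"),
    (1616, "mov     r14 = 80"),
    (1618, "mov     r40 = gp ;;"),
])
```

With optimized code the line numbers jump around, which is exactly why a
trailing comment works better there than trying to interleave the source text.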

The traditional VMS listings are 132 columns wide with "^L" form feeds between
sections since they were originally designed to be printed on greenbar printer
paper (Google it if you don't know what I'm talking about :) )

Here's a little abridged example for the traditional "hello world" C program.

              1 #include <stdio.h>
       X   1612 #if 0
       X   1613 An X is placed at the left to show this is eXcluded
       X   1614 #endif
           1615 #define m(p) int p;
      1    1616 main () {
      1    1617   m(ii);
       E           int ii ;
      1    1618   printf("hello world\n");
      1    1619 }
^L
                                Machine Code Listing             3-FEB-2017 14:01:37  VSI C V7.4-001-50L7J              Page 2
                                                                 3-FEB-2017 14:00:56  WORK20:[JREAGAN]HW.C;17

                                      .psect $CODE$, CON, LCL, SHR, EXE, NOWRT, NOVEC, NOSHORT
                                      .proc   __MAIN
                                      .align 32
                                      .global  __MAIN
                                      .personality  DECC$$SHELL_HANDLER
                                      .handlerdata  -8
                        __MAIN:                                                                                            // 001616
                           { .mii
002C009229C0     0000                 alloc   r39 = rspfs, 6, 3, 8, 0
0120000A0380     0001                 mov     r14 = 80
010800100A00     0002                 mov     r40 = gp ;;                       // r40 = r1
                           }
                           { .mib
010028E183C0     0010                 sub     r15 = sp, r14                     // r15 = r12, r14
000188000980     0011                 mov     r38 = rp                          // r38 = br0
004000000000     0012                 nop.b   0 ;;
                           }
.....
                                        +------------+
                                        | SYMBOL MAP |
                                        +------------+
Identifier name                 Line    Size    Aligned Storage Cl. Type
_______________                 ____    ____    _______ ___________ ____

DECC$RECORD_READ                1433    4       long    Extern      Function returning signed int
DECC$RECORD_WRITE               1434    4       long    Extern      Function returning signed int
FILE                            646     4       long                Typedef: short pointer to struct _iobuf
__FILE                          496     4       long                Typedef: short pointer to struct _iobuf
__FILE_ptr32                    497     4       long                Typedef: short pointer to short pointer to struct _iobuf
__caddr_t                       469     4       long                Typedef: short pointer to char
__char_ptr32                    499     4       long                Typedef: short pointer to char
__char_ptr64                    551     4       long                Typedef: short pointer to char
__char_ptr_const_ptr32          515     4       long                Typedef: short pointer to const short pointer to char
__char_ptr_const_ptr64          555     4       long                Typedef: short pointer to const short pointer to char
__char_ptr_ptr32                514     4       long                Typedef: short pointer to short pointer to char
__char_ptr_ptr64                554     4       long                Typedef: short pointer to short pointer to char
.....
^L
                                Source Listing                   3-FEB-2017 13:49:45  VSI C V7.4-001-50L7J              Page 10
                                                                 3-FEB-2017 13:49:28  WORK20:[JREAGAN]HW.C;16

CC/LIST/MACH/SHOW=SYMBOLS HW

Hardware: /ARCHITECTURE=GENERIC /OPTIMIZE=TUNE=GENERIC

These macros are in effect at the start of the compilation.
----- ------ --- -- ------ -- --- ----- -- --- ------------

 __G_FLOAT=0  __DECC=1  vms=1  VMS=1  __32BITS=1  __PRAGMA_ENVIRONMENT=1
 __ia64__=1  __CRTL_VER=80400000  __vms_version="V8.4-2  "  CC$gfloat=0
 __X_FLOAT=1  vms_version="V8.4-2  "  __DATE__="Feb  3 2017"
 __STDC_VERSION__=199901L  __DECC_MODE_RELAXED=1  __DECC_VER=70490001
 __VMS=1  VMS_VERSION="V8.4-2  "  __IEEE_FLOAT=1  __VMS_VERSION="V8.4-2  "
 __TIME__="13:49:45"  __ia64=1  __VMS_VER=80420022  __BIASED_FLT_ROUNDS=2
 __INITIAL_POINTER_SIZE=0  __STDC__=2  _IEEE_FP=1  __LANGUAGE_C__=1  __vms=1
 __D_FLOAT=0



> Message: 7
> Date: Fri, 3 Feb 2017 16:31:13 +0000
> From: Roger Ferrer Ibanez via llvm-dev <llvm-dev at lists.llvm.org>
> To: "cfe-dev at lists.llvm.org" <cfe-dev at lists.llvm.org>, llvm-dev
> 	<llvm-dev at lists.llvm.org>
> Cc: nd <nd at arm.com>
> Subject: [llvm-dev] RFC: Add a way to interleave source code in
> 	assembler	output
> Message-ID:
> 	<DB6PR0802MB2534F3C7B3B6A9FDF7C1E631874F0 at DB6PR0802MB2534.eurprd0
> 8.prod.outlook.com>
> 
> Content-Type: text/plain; charset="us-ascii"
> 
> Dear llvm/clang community,
> 
> I'm interested in adding a way to emit source code interleaved in the
> output of the assembler.
> 
> - Introduction
> 
> A feature that several compilers have and clang/llvm is missing is the
> possibility of interleaving source code in the assembler output (e.g.
> when using -S).
> 
> This feature is useful to users who are concerned with the quality of
> the code, code size, debugging, and inspection or analysis of the
> generated assembler.
> 
> An essential requirement of this feature is having location information
> at the point where the assembler code is emitted. Location information
> is currently not part of the instruction representation itself but
> instead is encoded as part of the debug information. This means that to
> have location information we need to make sure the FE is emitting some
> minimal amount of debugging information containing location. This is
> currently possible in clang using -gline-tables-only but other FE's
> might choose to emit this information under some other conditions.
> 
> I made an implementation which shows that the impact on the existing
> codebase is low.
> 
> - Rationale
> 
> Closing the gap between input source code and the generated
> instructions is important for users that are concerned about the
> correctness and quality of the generated code. This feature would help
> to reduce this gap by providing better context to the emitted
> instructions. Incidentally it can also help debugging wrong code.
> 
> - Related work
> 
> This is a feature commonly available in production compilers
> [1][2][3][4].
> 
> [1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0472m/chr1359124927770.html
> [2] https://gcc.gnu.org/gcc-7/changes.html (see "Other significant
> improvements" by the end of the document)
> [3] https://software.intel.com/en-us/node/523027
> [4] https://msdn.microsoft.com/en-us/library/367y26c6.aspx
> 
> https://llvm.org/bugs/show_bug.cgi?id=16647
> https://llvm.org/bugs/show_bug.cgi?id=17465 suggests some workarounds.
> A comment also points to a patch that I could not retrieve.
> 
> - Proposal
> 
> This proposal currently spans LLVM and clang.
> 
> -- clang/FE changes
> 
> For clang it would simply mean adding a flag like -fsource-asm or maybe
> extending the meaning of -fverbose-asm (as will happen in GCC 7, but
> see some further comments below). This flag would make sure that the
> minimal amount of debug information is generated. Currently this means
> enabling -gline-tables-only in the absence of any other debugging flag.
> A flag -masm-source for communication between the driver and cc1
> will be added as well.
> 
> Other FE's can provide other specific mechanisms to enable source
> interleave.
> 
> -- llvm changes
> 
> For llvm I suggest creating a new AsmPrinterHandler called,
> tentatively, SourceInterleave that would be responsible for printing the
> lines related to the instructions. SourceInterleave would take care of
> loading the files and making sure the source code lines are emitted as
> comments.
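
(Inline note: to check my reading of the above, this is roughly the behaviour I
understand, written as a small Python sketch.  It is not the AsmPrinterHandler
API; the interleave function, its parameters and the sample assembly are all
assumptions for illustration.)

```python
def interleave(source_lines, instructions, comment="#"):
    """Emit each source line as a comment before the first instruction for it.

    source_lines: the file contents as a list (index 0 is line 1).
    instructions: (line_number, asm_text) pairs in emission order; a line
    number of 0 stands for "no location".
    """
    out = []
    previous = 0
    for line, asm in instructions:
        # Only print the source text when the line changes and is valid.
        if line and line != previous and 1 <= line <= len(source_lines):
            out.append(f"{comment} {line}: {source_lines[line - 1].rstrip()}")
            previous = line
        out.append("\t" + asm)
    return out


source = ["int main() {", '  puts("hi");', "  return 0;", "}"]
# Illustrative instructions only; not real compiler output.
asm = interleave(source, [
    (1, "pushq %rbp"),
    (2, "callq puts"),
    (2, "nop"),
    (0, "nop"),
    (3, "xorl %eax, %eax"),
    (3, "retq"),
])
```
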
> 
> This handler would be enabled through MCOptions (similar to what
> happens with AsmVerbose). The current option is tentatively called
> AsmSource.
> 
> Currently the AsmPrinterHandler mechanism looks slightly geared towards
> debug information, but it is also used for EH. So I think using it for
> printing interleaved source is a good fit.
> 
> - Discussion
> 
> In case this proposal is positively received I would like to gather
> some feedback on the following items.
> 
> -- The name of the flag itself for clang
> 
> My current implementation uses -fsource-asm but maybe we want to
> integrate this feature in -fverbose-asm for this (as gcc 7 will do). I
> have no strong preference, but maybe overloading -fverbose-asm may have
> some undesirable consequences: recall that we need to enable some, even
> if minimal, debugging information in clang for this feature to be
> useable.
> 
> -- Enabling debug information causes debug information also to be
> emitted
> 
> This currently makes the output unnecessarily hard to read, mainly
> because of the .loc directives.
> 
> Currently my implementation uses "-masm-source=1" and "-masm-source=2"
> for cc1 which is then communicated to the MCOption AsmSource. When
> AsmSource is not 1, debug is emitted as usual, otherwise only
> SourceInterleave is used.
> 
> This way
>   "clang -fsource-asm" would pass "-masm-source=1". So only interleaved
> source would be printed, without the extra debug directives.
>   "clang -fsource-asm -g" (or any other debug enabling flag) would pass
> "-masm-source=2" extending the current behaviour of emitting debug
> information with interleaved source.
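
(Inline note: the mapping described here, as I read it, in a few lines of
Python.  The function names and the returned dictionary are made up for the
sketch, and treating every "-g..." flag except "-g0" as debug-enabling is my
simplification, not something the proposal states.)

```python
def asm_source_mode(driver_args):
    """Return the cc1 settings implied by the driver flags, or None."""
    if "-fsource-asm" not in driver_args:
        return None
    # Simplification: any "-g..." flag other than "-g0" counts as a
    # debug-enabling flag.  A real driver has many more cases.
    wants_debug = any(a.startswith("-g") and a != "-g0" for a in driver_args)
    if wants_debug:
        return {"cc1_flag": "-masm-source=2", "implicit_line_tables": False}
    # No debug flag given: line tables are enabled only to feed the
    # interleaved source, and their directives are not printed.
    return {"cc1_flag": "-masm-source=1", "implicit_line_tables": True}


def emits_debug_directives(asm_source):
    """AsmSource == 1 suppresses the usual debug output; other values keep it."""
    return asm_source != 1
```
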
> 
> I think this is OK but maybe there is some subtlety regarding "having
> debug information around but not generating its directives" as it would
> happen under AsmSource==1.
> 
> Also -masm-source=1/-masm-source=2 are just stand-ins. Something a bit
> more explanatory like -masm-source=nodebug and -masm-source=debug can
> be used instead.
> 
> -- Would it make sense to map the "/FAs" flag of clang-cl to this
> feature as well?
> 
> I can't really answer this question because I am not sure what the
> expectations of clang-cl users are in terms of closeness to VS's cl.exe
> behaviour.
> 
> Looking forward to your feedback. I can put the patches for my current
> implementation on Phabricator if this helps the discussion.
> 
> Kind regards,
> Roger




