[llvm-dev] RFC: Add a way to interleave source code in assembler output
Roger Ferrer Ibanez via llvm-dev
llvm-dev at lists.llvm.org
Fri Feb 3 08:31:13 PST 2017
Dear llvm/clang community,
I'm interested in adding a way to emit source code interleaved in the output of the assembler.
A feature that several compilers have and clang/llvm is missing is the possibility of interleaving source code in the assembler output (e.g. when using -S).
This feature is useful for users who are concerned with the quality or size of the generated code, and for debugging, inspecting or analysing the generated assembler.
An essential requirement of this feature is having location information at the point where the assembler code is emitted. Location information is currently not part of the instruction representation itself but is instead encoded as part of the debug information. This means that, to have location information, we need to make sure the front end (FE) emits at least a minimal amount of debug information that contains locations. This is currently possible in clang using -gline-tables-only, but other FEs might choose to emit this information under other conditions.
I made an implementation which shows that the impact on the existing codebase is low.
Closing the gap between the input source code and the generated instructions is important for users who are concerned about the correctness and quality of the generated code. This feature would help to reduce this gap by giving better context to the emitted instructions. Incidentally, it can also help when debugging wrong code.
- Related work
This is a feature commonly available in production compilers.
https://gcc.gnu.org/gcc-7/changes.html (see "Other significant improvements" near the end of the document)
https://llvm.org/bugs/show_bug.cgi?id=17465 suggests some workarounds. A comment also points to a patch that I could not retrieve.
This proposal currently spans LLVM and clang.
-- clang/FE changes
For clang this would simply mean adding a flag like -fsource-asm, or maybe extending the meaning of -fverbose-asm (as will happen in GCC 7, but see some further comments below). This flag would make sure that the minimal amount of debug information is generated. Currently this means enabling -gline-tables-only when no other debugging flag is specified. A flag -masm-source, used by the driver to communicate with cc1, will be added as well.
Other FEs can provide their own mechanisms to enable source interleaving.
-- llvm changes
For llvm I suggest creating a new AsmPrinterHandler called, tentatively, SourceInterleave that would be responsible for printing the source lines related to the instructions. SourceInterleave would take care of loading the files and making sure the source code lines are emitted as comments.
This handler would be enabled through MCOptions (similar to what happens with AsmVerbose). The current option is tentatively called AsmSource.
Currently the AsmPrinterHandler mechanism looks slightly geared towards debug information, but it is also used for EH. So I think using it for printing interleaved source is a good fit.
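To make the idea more concrete, here is a minimal, self-contained sketch of the logic such a handler might perform: load each source file once, then turn an instruction's location into a comment line. It deliberately does not use any LLVM API. SourceLineCache, SourceInterleaveSketch and beginInstruction are hypothetical names chosen for illustration, and treating line 0 as "no location" is an assumption of the sketch, not a description of the actual patch.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not an LLVM class): caches the lines of each source
// file so that a file is read from disk at most once.
class SourceLineCache {
public:
  // Registers file contents directly, bypassing the filesystem.
  void addBuffer(const std::string &Path, const std::string &Contents) {
    std::vector<std::string> Lines;
    std::istringstream In(Contents);
    for (std::string Line; std::getline(In, Line);)
      Lines.push_back(Line);
    Files[Path] = Lines;
  }

  // Stores the 1-based line `Line` of `Path` in `Text`. Returns false if the
  // file cannot be read or the line does not exist.
  bool getLine(const std::string &Path, unsigned Line, std::string &Text) {
    auto It = Files.find(Path);
    if (It == Files.end()) {
      std::ifstream In(Path);
      std::ostringstream Buf;
      Buf << In.rdbuf(); // Buf stays empty if the file could not be opened.
      addBuffer(Path, Buf.str());
      It = Files.find(Path);
    }
    if (Line == 0 || Line > It->second.size())
      return false;
    Text = It->second[Line - 1];
    return true;
  }

private:
  std::map<std::string, std::vector<std::string>> Files;
};

// Hypothetical stand-in for the proposed handler. It only builds the comment
// text; a real AsmPrinterHandler would write it to the assembler stream.
class SourceInterleaveSketch {
public:
  SourceInterleaveSketch(SourceLineCache &LineCache, const std::string &Prefix)
      : Cache(LineCache), CommentPrefix(Prefix), LastLine(0) {}

  // Returns the comment to print before an instruction located at Path:Line.
  // Returns an empty string when nothing should be printed: no location
  // (Line == 0, an assumption of this sketch), same location as the previous
  // instruction, or source line not available.
  std::string beginInstruction(const std::string &Path, unsigned Line) {
    if (Line == 0 || (Path == LastPath && Line == LastLine))
      return "";
    LastPath = Path;
    LastLine = Line;
    std::string Text;
    if (!Cache.getLine(Path, Line, Text))
      return "";
    return CommentPrefix + " " + Path + ":" + std::to_string(Line) + ": " +
           Text;
  }

private:
  SourceLineCache &Cache;
  std::string CommentPrefix; // Target comment string, e.g. "#".
  std::string LastPath;
  unsigned LastLine;
};
```

With "#" as the comment string, two consecutive instructions that both come from line 2 of t.c would produce a single comment of the form "# t.c:2: <text of that line>" before the first of them. Remembering the last printed location is a design choice of the sketch to avoid repeating the same source line for every instruction.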
In case this proposal is positively received I would like to gather some feedback on the following items.
-- The name of the flag itself for clang
My current implementation uses -fsource-asm, but maybe we want to fold this feature into -fverbose-asm (as GCC 7 will do). I have no strong preference, but overloading -fverbose-asm may have some undesirable consequences: recall that we need to enable some debugging information in clang, even if minimal, for this feature to be usable.
-- Enabling debug information causes debug information also to be emitted
This currently makes the output unnecessarily hard to read, mainly because of the .loc directives.
Currently my implementation uses "-masm-source=1" and "-masm-source=2" for cc1, and the value is then communicated to the MCOption AsmSource. When AsmSource is not 1, debug information is emitted as usual; otherwise only SourceInterleave is used.
"clang -fsource-asm" would pass "-masm-source=1". So only interleaved source would be printed, without the extra debug directives.
"clang -fsource-asm -g" (or any other debug enabling flag) would pass "-masm-source=2" extending the current behaviour of emitting debug information with interleaved source.
I think this is OK but maybe there is some subtlety regarding "having debug information around but not generating its directives" as it would happen under AsmSource==1.
Also -masm-source=1/-masm-source=2 are just stand-ins. Something a bit more explanatory like -masm-source=nodebug and -masm-source=debug can be used instead.
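The mapping described above can be summarised with a small sketch. The enum and function names below are hypothetical and only mirror the two stand-in levels; they are not taken from the actual patch, and the real driver logic may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical names; the values mirror the -masm-source levels in the text.
enum class AsmSourceMode { Off = 0, SourceOnly = 1, SourceAndDebug = 2 };

// Driver side: choose the level from -fsource-asm and any debug-enabling flag.
AsmSourceMode selectAsmSourceMode(bool SourceAsm, bool DebugRequested) {
  if (!SourceAsm)
    return AsmSourceMode::Off;
  return DebugRequested ? AsmSourceMode::SourceAndDebug
                        : AsmSourceMode::SourceOnly;
}

// cc1 flag the driver would pass, or an empty string when the feature is off.
std::string cc1Flag(AsmSourceMode Mode) {
  if (Mode == AsmSourceMode::Off)
    return "";
  return "-masm-source=" + std::to_string(static_cast<int>(Mode));
}

// Backend side: debug directives are emitted as usual unless the level is 1.
// ModuleHasDebugInfo is true under level 1 too, because line tables are
// needed to locate the source lines.
bool emitsDebugDirectives(AsmSourceMode Mode, bool ModuleHasDebugInfo) {
  return ModuleHasDebugInfo && Mode != AsmSourceMode::SourceOnly;
}

// Backend side: interleaved source is printed for both non-zero levels.
bool emitsInterleavedSource(AsmSourceMode Mode) {
  return Mode != AsmSourceMode::Off;
}
```

The point of keeping the two decisions separate is the subtlety mentioned above: under level 1 the module still carries debug information, but its directives are suppressed in the output.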
-- Would it make sense to map the "/FAs" flag of clang-cl to this feature as well?
I can't really answer this question because I am not sure what clang-cl users expect in terms of closeness to the behaviour of VS's cl.exe.
Looking forward to your feedback. I can put the patches for my current implementation on Phabricator if this helps the discussion.