[PATCH] Emit Clang version information into .comment section (LLVM's part of implementation) [PART 2]

Katya Romanova Katya_Romanova at playstation.sony.com
Tue Sep 24 17:48:02 PDT 2013


kromanova added you to the CC list for the revision "Emit Clang version information into .comment section (LLVM's part of implementation) [PART 2]".

GCC (as well as many other compilers) records its own version information in the object file. However, Clang does not currently do this.

Compiler version information emitted into an object file can be useful for a variety of reasons. For example: a developer says that he is using the compiler with the latest fix, but claims that the bug is still not fixed. After many hours of investigation you discover that the developer used an older version of the compiler to build some of the files in the project, because he forgot to change the value of an environment variable in one of his scripts. Both you and the developer have lost several hours of work. With all the version information recorded in the final executable file, it would have been very easy to check which compilers were used to build it. There are many other cases where compilation information emitted into the object file could be very handy.


Some time ago there was a discussion on the mailing list about embedding a compilation database into an object file.
http://clang-developers.42468.n3.nabble.com/RFC-Embedding-compilation-database-info-in-object-files-tt4033300.html
It's an excellent idea, but it requires a user to use a special tool to extract this information.

The implementation provided here is a "lightweight" version for embedding compilation information into the object file. This information is very easy to extract by using any standard object file dumper tool (readelf, objdump, etc).
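
For readers who want to consume this information programmatically rather than through a dumper tool (for example readelf -p .comment), here is a minimal sketch of splitting the raw bytes of a .comment section into its individual strings. It assumes the layout described later in this message for the ELF streamer: a leading NUL byte followed by NUL-terminated strings. The function name splitCommentSection is hypothetical and not part of the patch, and obtaining the section bytes from the object file is left out.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the patch): split the raw contents of a
// .comment section into its entries. Assumes the section is a sequence of
// NUL-terminated strings, possibly preceded by a single leading NUL byte.
std::vector<std::string> splitCommentSection(const std::string &Bytes) {
  std::vector<std::string> Entries;
  std::string Current;
  for (char C : Bytes) {
    if (C == '\0') {
      // A NUL ends the current entry; empty runs (such as the leading NUL)
      // produce no entry.
      if (!Current.empty())
        Entries.push_back(Current);
      Current.clear();
    } else {
      Current += C;
    }
  }
  // Tolerate a final entry that is missing its terminating NUL.
  if (!Current.empty())
    Entries.push_back(Current);
  return Entries;
}
```

Each returned entry corresponds to one .ident string, so a project built with mixed compilers would show more than one distinct producer string here.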

In the implementation provided here, only compiler version information is embedded in the object (and assembly) file. This could easily be expanded: any other compilation information (e.g. the compilation command line) could be fitted into the provided framework.


Two different solutions were considered for emitting compilation information into the object file.

=============
    The first solution: 
=============

 Use module-level asm to generate .ident directives.
This is based on how DragonEgg generates .ident entries.
It outputs module-level asm statement for the .ident directive.
E.g.: module asm "\09.ident\09\22GCC: (GNU) ..."

void CodeGenModule::Release() {
+  EmitVersionIdent(); 
 EmitDeferred();
 EmitCXXGlobalInitFunc();

+ void CodeGenModule::EmitVersionIdent() {
+   std::string Version = getClangFullVersion();
+   const char *ident_asm_op = "\t.ident\t";
+   std::string Directive(ident_asm_op);
+   Directive += "\"" + Version + "\"";
+   TheModule.setModuleInlineAsm(Directive);
+ }

As you can see from the very small patch above, the appeal of this solution is that it is very easy to implement. However, it's not clear how this will work with non-ELF targets. Also, this solution doesn't provide a flexible framework for extensions (e.g. implementation of the #ident directive).

=================
    The second solution: 
=================
Emit Clang's version as metadata (see the main patch). 
This solution seems to be much better than using module-level asm. There are many advantages:
 (1) The llvm.ident metadata could be useful for other consumers (anything that reads LLVM IR will be able to identify the producer of that IR),
 (2) It makes it easier/cleaner to handle targets that don't support .ident/.comment*. 
 (3) The llvm.ident metadata solution provides a clean interface for properly supporting the #ident directive. Right now Clang will just preprocess and ignore #ident, so solution #2 provides a nice base to actually implement this.

===========================================

Here is some additional information about the implementation of the second solution (using metadata):

Our goal is to get the Clang version into the .comment section. The assembler can already handle putting things into the .comment section via the .ident directive.
This is how GCC outputs its version string into the .comment section:
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"

We should use the same general approach. (Thus compiling with clang -S would produce an appropriate .ident entry in the assembly listing.)

There are 2 parts of this solution:

(1) Emit Clang's version as metadata.
(2) Teach the backend to output this metadata in an .ident directive and .comment section.

This patch covers only part #2 (i.e. the LLVM part). The Clang changes (part #1) were previously provided in a separate patch (see "Emit Clang version information into .comment section (Clang's part of implementation) [PART 1]").
 
===========
LLVM changes:
===========

(1)    Added a method EmitIdent() to the MCStreamer class.
       - This gets implemented by the MCAsmStreamer class to output the .ident directive when generating assembly code.
       - This gets implemented by the MCELFStreamer class to generate the ".comment" section in object files. The implementation of this function is modeled on existing code to parse asm ident directives in lib/MC/MCParser/ELFAsmParser.cpp. It uses an instance variable "SeenIdent" to track whether it is the first .ident directive seen or not. This is to handle the leading NULL-byte that is expected in the .comment section.
       - All of the other subclass implementations have empty or unreachable bodies. In these cases, the conventions of the existing code in each subclass were followed. This is why some are empty, some use llvm_unreachable, some use report_fatal_error, etc. There might be interest in supporting the ".comment" section for COFF targets (which is legal). I've left a "TODO" comment in WinCOFFStreamer.cpp for someone who cares to support it, but I haven't actually implemented it.
       
(2)    Added a private method EmitModuleIdents() to the AsmPrinter class. This method checks whether the target's MCAsmInfo has the IdentDirective and, if it is present, calls EmitIdent for each llvm.ident metadata entry. The code allows for an arbitrary number of named llvm.ident entries of arbitrary sizes. For our purposes, there should only be 1 llvm.ident entry of size 1, but it was easy to handle an arbitrary number of entries so I did (which is useful for eventually supporting GCC #ident directives).

(3)    Added hasIdentDirective() to MCAsmInfo. This returns true if the .ident directive is supported on the target. It is set to true for ELF targets and to false for all other targets. COFF targets might want to support this directive; I left a "TODO" comment in MCAsmInfoCOFF.cpp for someone who wants to support it. I followed the pattern of some other true-for-ELF target directives (like HasSingleParameterDotFile), which is to set the flag to true by default and then disable it for non-ELF targets.

(4)    Changed MC/MCParser/ELFAsmParser.cpp::ParseDirectiveIdent() to call getStreamer().EmitIdent(). This does exactly the same thing as before, except that the code that creates the .comment section now lives in MCELFStreamer.

(5) Tests:
The following new test was added:
- llvm/test/CodeGen/X86/sce-ident-metadata.ll 
It verifies that (1) .ident directives are generated for all llvm.ident metadata entries, and (2) .comment sections are generated from the llvm.ident metadata entries.
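
To make the control flow of changes (1) to (3) easier to follow, here is a self-contained sketch that models it with plain standard-library types. It is not the code from the patch: AsmInfoModel, AsmStreamerModel, ELFStreamerModel and emitModuleIdents are hypothetical stand-ins for MCAsmInfo, MCAsmStreamer, MCELFStreamer and AsmPrinter::EmitModuleIdents(), and representing the llvm.ident entries as nested vectors of strings (one .ident per string) is an assumption made for illustration.

```cpp
#include <string>
#include <vector>

// Stand-in for MCAsmInfo: only the flag discussed in (3). True by default,
// to be cleared by targets that do not support .ident.
struct AsmInfoModel {
  bool HasIdentDirective = true;
};

// Stand-in for the assembly streamer: prints one .ident line per string.
struct AsmStreamerModel {
  std::string Out;
  void EmitIdent(const std::string &S) {
    Out += "\t.ident\t\"" + S + "\"\n";
  }
};

// Stand-in for the ELF object streamer: appends to the .comment contents.
struct ELFStreamerModel {
  std::string Comment;    // raw bytes of the modelled .comment section
  bool SeenIdent = false; // tracks whether this is the first .ident
  void EmitIdent(const std::string &S) {
    if (!SeenIdent) {
      Comment.push_back('\0'); // leading NUL byte, emitted only once
      SeenIdent = true;
    }
    Comment += S;
    Comment.push_back('\0'); // each string is NUL-terminated
  }
};

// Model of EmitModuleIdents(): do nothing if the target lacks the
// directive, otherwise emit every string of every entry.
template <typename Streamer>
void emitModuleIdents(const AsmInfoModel &MAI,
                      const std::vector<std::vector<std::string>> &Entries,
                      Streamer &S) {
  if (!MAI.HasIdentDirective)
    return;
  for (const auto &Entry : Entries)
    for (const auto &Str : Entry)
      S.EmitIdent(Str);
}
```

With two single-string entries, the assembly model yields two .ident lines, while the ELF model yields a NUL byte followed by both strings, each NUL-terminated. A target with HasIdentDirective cleared produces no output in either case.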

=================================
Side notes about the GCC #ident directive:
=================================
GCC has a directive #ident which allows you to create your own .ident entries. Example:
#ident "Clang is cool"

Would result in GCC generating:
.ident "Clang is cool"
.ident "GCC: (GNU) 4.9.0 20130424 (experimental)"

Note that some versions of GCC print a message that #ident is deprecated. However, this seems to be a bug (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41632). At any rate, my svn version of GCC does not generate a deprecation warning.

Additionally, GCC has an option "-fno-ident". It does 2 things:
- Causes #ident to be ignored
- Disables the generation of an .ident directive with the GCC version string.
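
The interaction between #ident and -fno-ident described above can be summarised in a small sketch. This is only a model of the behaviour as stated in this message, not GCC's or Clang's implementation; the function identDirectives and its parameters are hypothetical, and the ordering (user strings first, then the compiler version) is taken from the example output shown earlier.

```cpp
#include <string>
#include <vector>

// Hypothetical model: compute the .ident lines a compiler would emit,
// given the strings from #ident directives in the source, the compiler's
// own version string, and whether -fno-ident was passed.
std::vector<std::string>
identDirectives(const std::vector<std::string> &UserIdents,
                const std::string &CompilerVersion, bool NoIdent) {
  std::vector<std::string> Out;
  if (NoIdent)
    return Out; // #ident is ignored and no version string is emitted
  for (const auto &S : UserIdents)
    Out.push_back(".ident \"" + S + "\"");
  Out.push_back(".ident \"" + CompilerVersion + "\"");
  return Out;
}
```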

Currently Clang does not have the -fno-ident option. It would be easy to add. Clang accepts but silently drops #ident directives. This patch provides a basis to fully implement #ident, but doesn't actually implement this yet.

As far as I can tell, #ident is somewhat of a historical extension. I have no clue whether anyone cares about it anymore.

This patch neither implements #ident nor -fno-ident. However, it does provide a basis for implementing them.

http://llvm-reviews.chandlerc.com/D1729

Files:
  test/CodeGen/X86/ident-metadata.ll
  include/llvm/MC/MCELFStreamer.h
  include/llvm/MC/MCStreamer.h
  include/llvm/MC/MCAsmInfo.h
  include/llvm/CodeGen/AsmPrinter.h
  tools/lto/LTOModule.cpp
  lib/Target/NVPTX/MCTargetDesc/NVPTXMCAsmInfo.cpp
  lib/MC/MCAsmStreamer.cpp
  lib/MC/MCAsmInfoCOFF.cpp
  lib/MC/MCAsmInfo.cpp
  lib/MC/MCAsmInfoDarwin.cpp
  lib/MC/MCELFStreamer.cpp
  lib/MC/MCNullStreamer.cpp
  lib/MC/WinCOFFStreamer.cpp
  lib/MC/MCPureStreamer.cpp
  lib/MC/MCMachOStreamer.cpp
  lib/MC/MCParser/ELFAsmParser.cpp
  lib/CodeGen/AsmPrinter/AsmPrinter.cpp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D1729.1.patch
Type: text/x-patch
Size: 14314 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130924/e3cf971f/attachment.bin>

