[PATCH] Initial commit for the llvm-dsymutil tool.

Wed Nov 12 19:17:54 PST 2014

Thanks for the initial review! Some explanations and further questions inline:

================
Comment at: tools/dsymutil/DebugMap.h:48
@@ +47,3 @@
+class DebugMapObject {
+  std::string Filename;
+  llvm::object::OwningBinary<llvm::object::ObjectFile> File;
----------------
kledzik wrote:
> It looks like "Filename" is the same as ObjectFile getFileName().
> 
> The existing debug map tracks the source (ex: /path/foo.c) path as well as the object (ex: /path/foo.o) path.  Does this DebugMap need both? or just the object path? 
> 
I think it only needs the object path, I haven't seen the source entries of the debug map being used except for detecting the end-of-scope of the current object file. The source file doesn't make much sense anyway as IIRC it is the parent of the object file scope and in an LTO scenario you end up with only one source file.

================
Comment at: tools/dsymutil/DebugMap.h:49
@@ +48,3 @@
+  std::string Filename;
+  llvm::object::OwningBinary<llvm::object::ObjectFile> File;
+
----------------
kledzik wrote:
> The current debug map also records the last-mod-time of the object file.  This allows whoever is processing the debug map to notice if the .o file has  changed out from under it.
I'll add that in a followup commit

================
Comment at: tools/dsymutil/DebugMap.h:52-56
@@ +51,7 @@
+  struct Symbol {
+    enum SymTy : uint8_t {
+      Unknown,
+      Variable,
+      Function,
+    };
+
----------------
kledzik wrote:
> Why is this needed?  
I'm not sure I'll use that actual type, but it seemed like some handy information to have (even if only for sanity checking that I'm not patch code addresses with data information). I actually use the Unknown type as an indicator that the Symbol hasn't been inserted in the map yet and I assert if a client tries to add 2 symbols with the same name in the same object file.

================
Comment at: tools/dsymutil/DebugMap.h:58-59
@@ +57,4 @@
+
+    uint64_t OrigAddress;
+    uint64_t Address;
+    SymTy Type;
----------------
kledzik wrote:
> Are these the address in the .o file and address in the final binary? Can the names be clearer.
> 
> What is the rule for tentative definitions which don't have an address in the .o file?  Maybe OrigAddress could be an llvm::Optional<>?
I agree that the names could be clearer.

What do you mean with "tentative definitions"? The OrigAddress gets filled with the value from the object file symbol table.

================
Comment at: tools/dsymutil/DebugMap.h:61
@@ +60,3 @@
+    SymTy Type;
+    bool Used;
+  };
----------------
kledzik wrote:
> What is "Used"? When would it be false?
As you point out bellow, I add every symbol from the object file symbol table to the list of symbols. Then the ones that are present in the binary's debug map are marked Used. 

I don't think libObject has a fast name -> symbol lookup, but I could have missed it. If such a feature exists, then I can populate the map with only symbols from the debug map and drop the used flag. An alternative would be to use a temporary StringMap with the whole object file symbol table in it, but this seemed a bit wasteful as I expect most of the object file symbol table to end up in the debug map anyway.

================
Comment at: tools/dsymutil/DebugMap.h:72-75
@@ +71,6 @@
+
+  void addSymbol(llvm::StringRef Name, Symbol::SymTy Type,
+                 uint64_t OrigAddress);
+
+  bool relocateSymbol(llvm::StringRef Name, uint64_t Address);
+
----------------
kledzik wrote:
> I'm trying to think how lld could build a DebugMapObject in memory.  lld would need to make an ObjectFile for each input, then for each atom in the output it would call addSymbol() and relocateSymbol()?  Can we add an optional final-address to addSymbol() so lld does not need to call relocateSymbol.
> 
Of course adding a addSymbol(name, objectAddress, binaryAddress) is fine. The current API requires the OwningBinary to be created by the client, and that is bad. As I pointed out in the review message, we need to move away from that scheme because it might simply not be possible to hold all the objects in memory together. I think the final API will just use strings as object file identifiers (allowing the lib.a(objectfile) syntax for archives), and some smart OwningBinary caching will be performed in the DebugMap object or in the DwarfLInker itself. 

================
Comment at: tools/dsymutil/DebugMap.h:80
@@ +79,3 @@
+private:
+  void createSymbols();
+};
----------------
kledzik wrote:
> Hmm.  It looks like you create a Symbol for every entry in the .o file even if it is not in the final linked image?  Why not start with the STABS debug map and only create those symbols, then look for them in the original .o file.
> 
> Also, lld would not want createSymbols() be be called because it will be populating the map.
I commented on that above. The map will be populated, but with unused symbols which won't be retained if they are not marked used by a relocateSymbol(). Or am I missing your point about the lld usecase?

http://reviews.llvm.org/D6242