[PATCH] D53379: GSYM symbolication format

Zachary Turner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 19 10:54:37 PDT 2018


zturner requested changes to this revision.
zturner added a comment.
This revision now requires changes to proceed.

I'll try to review more later.

The big one that will probably require a lot of chrun is that all the variable names need to be fixed to conform to LLVM style.  no `m_`, no underscores, everything is `CamelCase`.  Locals, function params, member variables, etc.



================
Comment at: include/llvm/DebugInfo/GSYM/DataRef.h:43
+//----------------------------------------------------------------------
+class DataRef {
+  StringRef Data;
----------------
This looks very similar to the `BinaryStreamRef` / `BinaryStreamReader` abstraction I created.  The way it works is that you would construct a `BinaryByteStream` with your data, then create a `BinaryStreamReader` to read from it (you can also create a `BinaryStreamReader` directly from an `ArrayRef` to save a line of code).  This separation of responsibilities is useful, because it allows you to have your data backed by a file, by a memory buffer, or in PDB land by something like a scatter/gather buffer where the next byte might not be at the next offset.  But it has the same property where it doesnt actually copy.  So if you say:

```
BinaryStreamReader Reader(Data);
const Header *hdr = nullptr;
if (auto Err = Reader.readObject(hdr)) {
  // Not enough data, fail gracefully.
}
// Now hdr points into the buffer, no copy happened.
```

Could you try seeing if this abstraction works for your use case?  It is already extensively tested and would allow you to delete a bunch of code.


================
Comment at: include/llvm/DebugInfo/GSYM/DwarfTransformer.h:29
+  DwarfTransformer &setNumThreads(uint32_t num) {
+    m_numThreads = num;
+    return *this;
----------------
Make sure to use llvm naming conventions for everything.  Member variables are just `NumThreads`, local variables are the same.  e.g. `Num`


================
Comment at: include/llvm/DebugInfo/GSYM/DwarfTransformer.h:33
+
+  DwarfTransformer &setErrorBanner(const std::string &banner) {
+    m_errorBanner = banner;
----------------
Normally I would say `StringRef Banner` instead of a `const std::string&`, but you just assign it to a member variable which is a `std::string`.  So this is a good candidate for pass-by-value `std::string`.  This way you can just write this function as `ErrorBanner = std::move(Banner);`  With either a `StringRef` or `const std::string&` solution, you have a minimum of 1 copy to assign it to the member variable, but with a pass by value you  opens the possibility of a zero-copy solution, if the caller writes `Transformer.setErrorBanner(std::move(Banner));`

That said, see comment below about whether these member functions are necessary at all.


================
Comment at: include/llvm/DebugInfo/GSYM/DwarfTransformer.h:38-41
+  DwarfTransformer &setOutStream(llvm::raw_ostream *stream) {
+    m_outStream = stream;
+    return *this;
+  }
----------------
Is there any reason to want to set the out stream, error banner, number of threads, etc, on an alraedy constructed `DwarfTransformer`?  Perhaps they could just be passed to the constructor?


================
Comment at: include/llvm/DebugInfo/GSYM/DwarfTransformer.h:43
+
+  bool loadDwarf(llvm::object::ObjectFile &obj);
+  bool loadDwarf(const std::string &filename) {
----------------
zturner wrote:
> Instead of returning `bool, can we return `llvm::Error`?
I think this can be a `const&`


================
Comment at: include/llvm/DebugInfo/GSYM/DwarfTransformer.h:43-57
+  bool loadDwarf(llvm::object::ObjectFile &obj);
+  bool loadDwarf(const std::string &filename) {
+    if (auto binary = getObjectFile(filename)) {
+      return loadDwarf(*binary.getValue().getBinary());
+    }
+    return false;
+  }
----------------
Instead of returning `bool, can we return `llvm::Error`?


================
Comment at: include/llvm/DebugInfo/GSYM/DwarfTransformer.h:44
+  bool loadDwarf(llvm::object::ObjectFile &obj);
+  bool loadDwarf(const std::string &filename) {
+    if (auto binary = getObjectFile(filename)) {
----------------
`StringRef FileName`


================
Comment at: include/llvm/DebugInfo/GSYM/DwarfTransformer.h:64-67
+  void handleDie(std::shared_ptr<StringTableCreator> strtab,
+                 std::shared_ptr<FileTableCreator> filetab, CUInfo &cuInfo,
+                 llvm::raw_ostream *OS, llvm::DWARFDie die,
+                 std::function<void(FunctionInfo &&)> insertFunc);
----------------
It seems like we don't need these `shared_ptr`s at all, we can just use `StringTableCreator&` etc.  Similarly for the `raw_ostream`, can it be a reference?  I looked at the implementation and null doesn't seem like a possibility, so let's just enforce that by using references (plus, copying `shared_ptr` is expensive)

Also, can you use `llvm::unique_function<>` instead of `std::function<>`.  The functional difference is almost never important, and it performs slightly better.


================
Comment at: include/llvm/DebugInfo/GSYM/DwarfTransformer.h:70
+  bool parseLineTable(FileTableCreator &filetab, CUInfo &cuInfo,
+                      llvm::raw_ostream *OS, llvm::DWARFDie die,
+                      FunctionInfo &func);
----------------
`&OS`


================
Comment at: include/llvm/DebugInfo/GSYM/DwarfTransformer.h:73
+
+  llvm::Optional<llvm::object::OwningBinary<llvm::object::ObjectFile>>
+  getObjectFile(const std::string &filename);
----------------
You can get rid of all the `llvm::` (in this line, as well as in the rest of the file)


================
Comment at: include/llvm/DebugInfo/GSYM/DwarfTransformer.h:74
+  llvm::Optional<llvm::object::OwningBinary<llvm::object::ObjectFile>>
+  getObjectFile(const std::string &filename);
+
----------------
I think this member function can be marked `const`, and take a `StringRef`


================
Comment at: include/llvm/DebugInfo/GSYM/DwarfTransformer.h:76
+
+  void initDataFromObj(llvm::object::ObjectFile &obj);
+
----------------
The parameter here can be a `const&`


================
Comment at: include/llvm/DebugInfo/GSYM/FileEntry.h:24
+
+  FileEntry(uint32_t D = 0, uint32_t B = 0) : Dir(D), Base(B) {}
+
----------------
I think it  would be better to just remove this constructor, and use inline member initializers.  e.g.

```
uint32_t Dir = 0;
uint32_t Base = 0;
```

People can still use aggregate initializtion, e.g. `FileEntry FE{1, 2};`


================
Comment at: include/llvm/DebugInfo/GSYM/FileEntry.h:36-43
+namespace std {
+// implement std::hash so that FileEntry can be used as key in
+// unordered containers
+template <> struct hash<llvm::gsym::FileEntry> {
+  size_t operator()(const llvm::gsym::FileEntry &x) const {
+    return std::hash<uint32_t>()(x.Dir) ^ std::hash<uint32_t>()(x.Base);
+  }
----------------
I think it would be better to get rid of this and use `llvm::DenseMap`.  For small types like this it will be substantially more efficient.


================
Comment at: include/llvm/DebugInfo/GSYM/FileTableCreator.h:10-11
+
+#ifndef LLVM_DEBUGINFO_FILETABLECREATOR_H
+#define LLVM_DEBUGINFO_FILETABLECREATOR_H
+
----------------
All header include guards should have `GSYM` after `DEBUGINFO`


================
Comment at: include/llvm/DebugInfo/GSYM/FileTableCreator.h:27
+  std::vector<llvm::gsym::FileEntry> FileEntries;
+  std::shared_ptr<llvm::gsym::StringTableCreator> StringTable;
+
----------------
I'm verrrrrry wary of `std::shared_ptr` in general as I think it leads to confusing ownership semantics and ultimately introduces more problems than it solves.  Can we make this a `unique_ptr`?  Is there some reason we can't have some master entity responsible for "owning" the string table?


================
Comment at: include/llvm/DebugInfo/GSYM/FileWriter.h:13-41
+#include <iostream>
+#include <stddef.h>
+#include <stdint.h>
+#include <sys/types.h>
+
+namespace llvm {
+namespace gsym {
----------------
Two questions about this:  

1) Can we use `llvm::raw_fd_ostream` to write to a file instead of `std::ostream`?

2) Is it possible to compute the size of the output file in advance?  Most LLVM things like to compute sizes in advance, then map the output file into memory, then write to it using something like `FileOutputBuffer`.  This will give you much better performance, and if you're able to compute the size up front and use this approach, this entire class could go away as you could use `FileBufferByteStream` along with `BinaryStreamWriter` to give you all of the functions you've written here.


================
Comment at: include/llvm/DebugInfo/GSYM/FunctionInfo.h:32-33
+
+  FunctionInfo(uint64_t a = 0, uint32_t s = 0, uint32_t n = 0)
+      : Addr(a), Size(s), Name(n) {}
+
----------------
Similar to the above, I think you can remove this constructor and just inline initialize the variables to `0`


================
Comment at: include/llvm/DebugInfo/GSYM/GsymCreator.h:32-33
+struct FunctionData {
+  std::shared_ptr<StringTableCreator> StrTab;
+  std::shared_ptr<FileTableCreator> FileTab;
+  FunctionInfo FuncInfo;
----------------
Store by reference instead of by pointer.


================
Comment at: include/llvm/DebugInfo/GSYM/GsymReader.h:45
+struct Header {
+  uint32_t Magic;
+  uint16_t Version;
----------------
I know I've mentioned this before, but I'll put it in writing in case someone else reviewing this has thoughts.  I'm a huge anti-fan of using native endianness in binary formats.

It means that your only two options are to either write fixup code when you load the file and detect non-native endianness, or just give up and die on the file.  Neither of those are compelling.

In my view, big endian lost.  There's no reason not to standardize on little endian.  If someone ever tries to use this on a big endian system, all is not lost for them.  LLVM has proven with its endian-specific integral types that the code to handle this is not complicated.  They can still just map the file in and use it.  Instead of using a struct that uses types like `uint32_t`, however,  they would need to use a struct that uses types like `ulittle32_t`.  The benefit of this though is that it simplifies the code, and you never have to worry about a conformant GSYM implementation not supporting a particular file, because it only has to worry about one code path instead of 2, and it's the code path that 99% of the world is using anyway.


================
Comment at: include/llvm/DebugInfo/GSYM/LineTable.h:30-34
+    DBG_END_SEQUENCE = 0x00,  // End of the line table
+    DBG_SET_FILE = 0x01,      // Set LineTableRow.file_idx, don't push a row
+    DBG_ADVANCE_PC = 0x02,    // Increment LineTableRow.address, and push a row
+    DBG_ADVANCE_LINE = 0x03,  // Set LineTableRow.file_line, don't push a row
+    DBG_FIRST_SPECIAL = 0x04, // All special opcodes push a row
----------------
Naming convention for enum is either to use a global enum like

```
enum LineTableOpCode {
  LTOC_EndSequence,
  LTOC_SetFile,
  ...
};
```

or to use an enum class without the abbreviation prefix.

```
enum class LineTableOpCode {
  EndSequence,
  SetFile,
  // etc.
}
```


================
Comment at: include/llvm/DebugInfo/GSYM/LineTable.h:42
+  void parse(uint64_t base_addr,
+             std::function<bool(const LineEntry &row)> const &row_callback);
+
----------------
This is confusing me.  Why is it a const reference to a `std::function`.  Regardless though, consider `llvm::unique_function` again.


================
Comment at: include/llvm/DebugInfo/GSYM/LineTable.h:47
+
+  static bool write(FileWriter &out, const FunctionInfo &func_info);
+
----------------
Can you move this function to the cpp file, and mark it static and hide it in the implementation?


================
Comment at: include/llvm/DebugInfo/GSYM/LineTable.h:49
+
+  void parseAllEntries(std::vector<LineEntry> &line_table, uint64_t base_addr);
+  LineEntry lookup(uint64_t base_addr, uint64_t addr);
----------------
Can you make this function `const`, delete the first parameter, and have it return a `std::vector<LineEntry>`.  It's not actually inefficient to return large objects on the stack, due to C++ RVO / NRVO and copy elision.


================
Comment at: include/llvm/DebugInfo/GSYM/LookupResult.h:25-27
+  const char *Name;
+  const char *Dir;
+  const char *Base;
----------------
Can all of these be `StringRef` instead of `const char*`


================
Comment at: include/llvm/DebugInfo/GSYM/LookupResult.h:29-31
+  SourceLocation(const char *N = nullptr, const char *D = nullptr,
+                 const char *B = nullptr, uint32_t L = 0)
+      : Name(N), Dir(D), Base(B), Line(L) {}
----------------
Remove constructor, initialize members inline.


================
Comment at: include/llvm/DebugInfo/GSYM/LookupResult.h:37
+  std::vector<SourceLocation> Locations;
+  std::string getSourceFile(uint32_t idx) const {
+    std::string fullpath;
----------------
Can you move the implementation of these member functions to the cpp file?


================
Comment at: include/llvm/DebugInfo/GSYM/LookupResult.h:41-45
+        fullpath = Locations[idx].Dir;
+        if (Locations[idx].Base) {
+          fullpath.append(1, '/');
+          fullpath.append(Locations[idx].Base);
+        }
----------------
It looks like this could be using `llvm::sys::path::join`


================
Comment at: include/llvm/DebugInfo/GSYM/StringTable.h:39-41
+    std::string s(str);
+    s.append(1, 0); // Add a null terminator
+    auto pos = Data.getData().find(StringRef(s.data(), s.size()));
----------------
Can this function take a `StringRef` and do a `find` directly on the `StringRef` instead of making a copy?


================
Comment at: include/llvm/DebugInfo/GSYM/StringTableCreator.h:22
+class StringTableCreator {
+  std::unordered_map<std::string, uint32_t> Strings;
+  // Strings contains the backing string
----------------
`llvm::StringMap`


================
Comment at: include/llvm/DebugInfo/GSYM/StringTableCreator.h:24
+  // Strings contains the backing string
+  std::unordered_map<uint32_t, const char *> OffsetToString;
+  std::vector<const char *> OrderedStrings;
----------------
`DenseMap`


================
Comment at: include/llvm/DebugInfo/GSYM/StringTableCreator.h:25
+  std::unordered_map<uint32_t, const char *> OffsetToString;
+  std::vector<const char *> OrderedStrings;
+  uint32_t NextOffset;
----------------
`Vector<StringRef>


================
Comment at: include/llvm/DebugInfo/GSYM/StringTableCreator.h:31
+
+  uint32_t insert(std::string s) {
+    auto pos = Strings.find(s);
----------------
`insert(StringRef)


================
Comment at: include/llvm/DebugInfo/GSYM/StringTableCreator.h:32-39
+    auto pos = Strings.find(s);
+    if (pos != Strings.end())
+      return pos->second;
+    uint32_t offset = NextOffset;
+    NextOffset += s.size() + 1;
+    auto it = Strings.emplace(s, offset).first;
+    OffsetToString[offset] = it->first.c_str();
----------------
When you do a `find` followed by an `emplace` it can do up to 2 lookups.  Instead, just do a `try_emplace`, and use the return value to determine whether the insert happened.  It won't overwrite if it was already there.


Repository:
  rL LLVM

https://reviews.llvm.org/D53379





More information about the llvm-commits mailing list