[PATCH] D53051: [llvm-tapi] initial commit, supports ELF text stubs

Jake Ehrlich via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 29 14:54:50 PDT 2018


jakehehrlich added a comment.

Looks pretty close. My major concerns are with the currently chosen interface but I think we can shake that out later.



================
Comment at: llvm/include/llvm/TAPI/ELFStubFile.h:32
+enum ELFSymbolType {
+  no_type =     ELF::STT_NOTYPE,
+  object =      ELF::STT_OBJECT,
----------------
Check out the naming convention: https://llvm.org/docs/CodingStandards.html. These should be capitalized. You should be allowed to avoid the prefix by using a scoped enum as well.


================
Comment at: llvm/include/llvm/TAPI/ELFStubFile.h:38
+  // Type information is a nibble, so 16 is safely out of range.
+  unknown =   16,
+};
----------------
What is this for?


================
Comment at: llvm/include/llvm/TAPI/ELFStubFile.h:43
+struct ELFSymbol {
+  std::string   st_name;
+  unsigned int  st_size;
----------------
member names should be capitalized with no underscores. 


================
Comment at: llvm/include/llvm/TAPI/ELFStubFile.h:47
+  bool          st_undefined;
+  std::string   st_warning;
+};
----------------
Might be worth making this optional.


================
Comment at: llvm/include/llvm/TAPI/ELFStubFile.h:52
+struct ELFStub {
+  std::map<std::string, ELFSymbol> symbols;
+  ELFArch arch;
----------------
In general you want to use StringMap<T> instead of std::map<std::string, T> but this is an exception as we will soon see. Here it's nice if the strings are ordered however so this might make sense (but stay tooned because it doesn't). We can rule out StringMap based on ordering alone. One issue with std::map<std::string, ElfSymbol> is that the symbol name is duplicated. An obvious next thought would be to use std::map<StringRef, ElfSymbol>. This forces us to think about the ownership of the string. Presumably the only place you're storing the strings are in the symbols. This is tricky to get to work but wold be nice because you'd know that as long as the pair is in the map then you can store the StringRef as the key and avoid storing the string twice. As I mentioned this is tricky to get to work right. If you use `emplace` and you move the ELFSymbol into the map then it *should* all work out because st_name should never be deallocated. Strictly speaking though there is no such assurance however. When you move a string it might reallocate and then you're StringRef will be invalid. It turns out if you look at the programmers manual it gives a recommendation: http://llvm.org/docs/ProgrammersManual.html#map-like-containers-std-map-densemap-etc. The first recommendation is to use an std::vector! In your case you can use the symbol name as the key and then your ownership problem is solved. This also tends to be much faster than std::map or StringMap for these use cases.


================
Comment at: llvm/include/llvm/TAPI/ELFStubFile.h:61
+public:
+  ELFStubFile(FileType type) : File(type) {}
+  ELFStubFile(FileType type, ELFStub &stub) : File(type) {
----------------
best to make these sorts of constructors explict. You only really want implicit constructors for things like copy constructors, move constructors, or silent conversions (e.g. std::vector to ArrayRef).


================
Comment at: llvm/include/llvm/TAPI/ELFStubFile.h:114
+private:
+  std::vector<ELFSymbol> _symbols;
+  ELFArch _arch;
----------------
As talked about offline, this won't be in this patch but you don't want to follow the Google coding standard here. You want to follow LLVM's which states that these should be PascalCase functions and methods should be camelCase as you already have them. As another thought, if anything like this exists in the future, we should just store an ELFStub rather than duplicating the information.


================
Comment at: llvm/include/llvm/TAPI/File.h:1
+//===- File.h ---------------------------------------------------*- C++ -*-===//
+//
----------------
Explicitly stating that I'm ignoring this because it won't go in this patch.


================
Comment at: llvm/include/llvm/TAPI/File.h:41-42
+
+  void setFilePath(StringRef path) { _path = path; }
+  StringRef getFilePath() const { return _path; }
+  StringRef getFileName() const { return llvm::sys::path::filename(_path); }
----------------
If you have both a getter and a setter and there's no good reason to not just use the member directly, just make it a public member.


================
Comment at: llvm/include/llvm/TAPI/TBEHandlerV1.h:23
+
+LLVM_YAML_STRONG_TYPEDEF(uint16_t, ELFArchMapper);
+
----------------
This probably belongs in the file with the other YAML stuff in it.


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:28
+  static void enumeration(IO &io, ELFSymbolType &symbolType) {
+    io.enumCase(symbolType, "NO-TYPE", ELFSymbolType::no_type);
+    io.enumCase(symbolType, "FUNCTION", ELFSymbolType::function);
----------------
I'd prefer PascalCase for all keys and I'd prefer they match the ELF equivalent name as closely as possible. So NoType, Func (not Function), Object, etc.. TLS is actually fine the way it is. Please apply that to all keys. PascalCase is just the standard in llvm for YAML keys. The request that we be close to ELF names is just a personal request from me.


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:36
+
+template<> struct ScalarTraits<ELFArchMapper> {
+  static void output(const ELFArchMapper &value, void*,
----------------
Can you use this? https://github.com/llvm-mirror/llvm/blob/master/lib/ObjectYAML/ELFYAML.cpp#L608


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:37
+template<> struct ScalarTraits<ELFArchMapper> {
+  static void output(const ELFArchMapper &value, void*,
+                     llvm::raw_ostream &out) {
----------------
Same mapping comment and naming comments requested as below.


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:47
+
+  static StringRef input(StringRef scalar, void*, ELFArchMapper &value) {
+    if (scalar.equals("AARCH64"))
----------------
You should probably implement this with a map defined as a static constant inside this function.


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:47
+
+  static StringRef input(StringRef scalar, void*, ELFArchMapper &value) {
+    if (scalar.equals("AARCH64"))
----------------
jakehehrlich wrote:
> You should probably implement this with a map defined as a static constant inside this function.
Why does input return a StringRef? What is it used for? Please leave a comment explaining.


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:48
+  static StringRef input(StringRef scalar, void*, ELFArchMapper &value) {
+    if (scalar.equals("AARCH64"))
+      value = ELF::EM_AARCH64;
----------------
"AArch64" would be the norm.


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:50
+      value = ELF::EM_AARCH64;
+    else if (scalar.equals("X86_64"))
+      value = ELF::EM_X86_64;
----------------
"x86_64" would be the norm.


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:57
+
+  static QuotingType mustQuote(StringRef) { return QuotingType::None; }
+};
----------------
Can you leave a brief comment explaining what this does?


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:63
+    io.mapRequired("TYPE", symbol.st_type);
+    io.mapOptional("SIZE", symbol.st_size, 0u);
+    io.mapOptional("UNDEFINED", symbol.st_undefined, false);
----------------
I'd branch on the type here and make this required for object/tls, not possible for func, and optional for notype.


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:68
+    // clear size for functions
+    if (symbol.st_type == ELFSymbolType::function)
+      symbol.st_size = 0;
----------------
This is a kludge, remove it.


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:79-83
+    if (map.find(key) == map.end()) {
+      ELFSymbol elem;
+      elem.st_name = key;
+      map.insert(std::make_pair(key, elem));
+    }
----------------
Calling `map[key]` creates a reference for you so that this isn't an error. As pointed out before though std::map probably isn't the best representation. If you need to for the purposes of reading you can construct an map in order to have this be called and then traverse it to construct the vector.


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:93
+
+template <> struct MappingTraits<ELFStub> {
+  static void mapping(IO &io, ELFStub &stub) {
----------------
How would multiple version be maintained here? Isn't there only one MappingTraits then?


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:97
+    io.mapRequired("SONAME", stub.so_name);
+    io.mapRequired("ARCHITECTURE", (ELFArchMapper&) stub.arch);
+    if (io.outputting() && stub.symbols.size() > 0)
----------------
I think I'd prefer "Arch"


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:98-101
+    if (io.outputting() && stub.symbols.size() > 0)
+      io.mapRequired("SYMBOLS", stub.symbols);
+    else if (!io.outputting())
+      io.mapOptional("SYMBOLS", stub.symbols);
----------------
Why can't you just always use mapOptional?


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:113
+  StringRef str = memBufferRef.getBuffer().trim();
+  // startswith() and endswith() aren't options as they break testing via lit.
+  if (str.find("--- !tapi-tbe-v1\n") != StringLiteral::npos
----------------
This comment indicates this isn't the right way to do this. In general it is perfectly fine to attempt to read in a file and then handle the error without checking if it's going to work first.


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:136
+
+std::vector<std::string> TBEHandlerV1::getSupportedFileExtensions() const {
+  return {".tbe"};
----------------
I'm not sure if they have this in their patch or how it's used exactly, but it should only be used as a hint (perhaps to choose which format parsing should attempt first) and should never be required. I'm inclined to throw it out all together especially since binary formats should quickly error out if the MemoryBuffer doesn't contain the correct magic at the start.


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:140
+
+std::unique_ptr<File> TBEHandlerV1::readFile(
+      MemoryBufferRef memBufferRef) {
----------------
As discussed offline, just have a function that returns a stub and test it for now.


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:141
+std::unique_ptr<File> TBEHandlerV1::readFile(
+      MemoryBufferRef memBufferRef) {
+  if (!canRead(memBufferRef, FileType::All))
----------------
I'm not clear on what the correct input should be here but I'd go ahead and make it StringRef for now. It's supported just the same but is easier to test. I imagine it will be just as easy to use StringRef as MemoryBufferRef in the future (but I could be wrong).


================
Comment at: llvm/lib/TAPI/TBEHandlerV1.cpp:152
+
+Error TBEHandlerV1::writeFile(raw_ostream &os, const File *file) {
+  auto *inFile = dyn_cast_or_null<ELFStubFile>(file);
----------------
Looks like there's no good way to avoid using a raw_ostream but I'll leave you with this bit of advice to make sure the problem doesn't leak to the rest of the interfaces. When writing ELF files you'll want to use something like a Filebuffer and not an raw_ostream so whatever abstraction you land on for writing out the different formats needs to be generic to those two options. I'm not aware of a good abstraction to account for this at the moment but I'll think about it.


================
Comment at: llvm/tools/llvm-tapi/llvm-tapi.cpp:1
+// a simple binary to test tapi, will evolve into the tapi command line utility.
+#include <iostream>
----------------
Licence in whatever patch this goes into.


Repository:
  rL LLVM

https://reviews.llvm.org/D53051





More information about the llvm-commits mailing list