[PATCH] D61767: [llvm-elfabi] Emit ELF header and string table section

Thu Nov 12 01:52:31 PST 2020

jhenderson requested changes to this revision.
jhenderson added a comment.
This revision now requires changes to proceed.

Nothing grievously wrong here, but lots of small things that need improving. Requesting changes mostly to clear the "Accepted" state.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:52
+template <class ELFT>
+void initELFHeader(typename ELFT::Ehdr &ElfHeader, uint16_t Machine) {
+  memset(&ElfHeader, 0, sizeof(ElfHeader));
----------------
It's been a while since I looked at this code, but I wonder whether this is essentially a duplication of code elsewhere in LLVM. Certainly I'd expect tools like LLD, llvm-objcopy and llvm-mc to have functionality that emits an ELF Header of some description. Do we really need to repeat it here?

Also, I wonder if it would be a little more flexible/traditional to return a locally constructed ElfHeader rather than take it as an input?

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:70-71
+  ElfHeader.e_version = EV_CURRENT;
+  ElfHeader.e_entry = 0;
+  ElfHeader.e_flags = 0;
+  ElfHeader.e_ehsize = sizeof(typename ELFT::Ehdr);
----------------
Dead code? You use `memset` to zero earlier. Same goes for the EI_ABIVERSION setting earlier.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:118
+    ShStrTab.Align = 1;
+    StrTab.Name = ".dynstr";
+    StrTab.Align = 1;
----------------
I wonder if it would be less confusing if `StrTab` were to be called `Dynstr`? `.strtab` is traditionally the name of the static symbol string table.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:136-149
+    const OutputSection<ELFT> *Prev = nullptr;
+    uint64_t StartOffset = sizeof(Elf_Ehdr);
+    for (OutputSection<ELFT> *Sec : Sections) {
+      uint64_t Align = Sec->Align;
+      if (Prev == nullptr)
+        Sec->Offset = alignTo(StartOffset, Align);
+      else
----------------
If I follow this code correctly, this whole loop could be dramatically simplified, as suggested.

This appears to be setting the initial address to the end of the ELF header. It's certainly not the case that the ELF header has to always be assigned memory - if it doesn't appear in a program header in a linker script, it won't be for example, so the initial address would start from zero. I note also that the program doesn't appear to be emitting program headers at all, so the use of addresses seems a little confusing to me anyway.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:150
+    }
+    // Write Shdr of String tables
+    fillStrTabShdr(StrTab, SHF_ALLOC);
----------------

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:153
+    fillStrTabShdr(ShStrTab);
+    // Initilize ELFheader
+    initELFHeader<ELFT>(ElfHeader, Stub.Arch);
----------------
I suggest using a generic term rather than the variable name to avoid rotting comments should variables get renamed.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:165
+
+  void write(uint8_t *Data) const {
+    write(Data, ElfHeader);
----------------
I wonder if it would be easier for this function (with the help of its child functions) to update the offset as it goes along, rather than functions needing to precalculate where the thing should be written:
```
  void write(uint8_t *Data) const {
    write(Data, ElfHeader);
    Data += ShStrTab.Shdr.sh_offset;
    ShStrTab.Content.write(Data);
    Data += StrTab.Shdr.sh_offset;
    StrTab.Content.write(Data);
    Data = ElfHeader.e_shoff;
    writeShdr(Data, ShStrTab);
    writeShdr(Data, StrTab);
  }

  template <class T> static void write(uint8_t *&Data, const T &Value) {
    *reinterpret_cast<T *>(Data) = Value;
    Data += sizeof(Value);
  }

  void writeShdr(uint8_t *Data, const OutputSection<ELFT> &Sec) const {
    write(Data, Sec.Shdr);
  }
```
At that point, some of the helper methods might become completely unnecessary.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:167
+    write(Data, ElfHeader);
+    ShStrTab.Content.write(Data + ShStrTab.Shdr.sh_offset);
+    StrTab.Content.write(Data + StrTab.Shdr.sh_offset);
----------------
Here and below, would a loop over a list of Sections make more sense going forwards, like you already do above? I don't know if there are expected to be more sections added in the future, but if there are, such a loop would be more robust than having to remember to add everything in multiple places.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:170
+    writeShdr(Data, ShStrTab);
+    writeShdr(Data, StrTab);
+  }
----------------
It might not be important, but it seems more natural to have the SHF_ALLOC seection headers before the non SHF_ALLOC stuff in the section header table (same goes for the sections themselves).

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:179
+  template <class T> static void write(uint8_t *Data, const T &Value) {
+    *reinterpret_cast<T *>(Data) = Value;
+  }
----------------
This line makes me very nervous that this code will fail miserably where the host endianness doesn't match the target endianness with non-single byte `T` types.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:190
+    StrTab.Shdr.sh_size = StrTab.Size;
+    StrTab.Shdr.sh_name = this->ShStrTab.Content.getOffset(StrTab.Name);
+    StrTab.Shdr.sh_addralign = StrTab.Align;
----------------

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:384

-  return createStringError(errc::not_supported, "Unsupported binary format");
 }
----------------
I'm concerned that this has been changed to `llvm_unreachable`. Is this function only able to be called with ELF binaries? What happens if a user passes in e.g. an archive or COFF file? `createBinary` can return more than just ELF types, which would lead to an assertion here. This case will want testing too. Also, it seem unrelated to this patch?

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:589
+  return createStringError(errc::invalid_argument,
+                           "Invalid binary output target");
 }
----------------
The style guide says to use lower-case strings where possible for error messages.

I think this case SHOULD be an llvm_unreachable? It's not possible for a user to hit this code as far as I can tell (alternatively, write a test for it...)

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:148
+      if (Sec->NoBits)
+        Align = 1;
+      if (Prev == nullptr)
----------------
haowei wrote:
> grimar wrote:
> > Why nobits sections are special?
> > 
> > Also I don't understand:
> > 1) There are just 2 sections currently: `ShStrTab` and `StrTab`. Why do you need the code for `NoBits`?
> > 2) Why can't you set `Align = 1` from start for them instead of having `Sec->Align ? Sec->Align : 1;` logic?
> Most part of this change was authored by my previous co-workers. I don't know the specific reason why NoBits are handled here, but I could guess the code for NoBits are reserved in case the stub SO generated from elfabi would need any sections that has NoBits attribute. I agree it is not useful now as the following change do not use NoBits sections as well.
> 
> For align, the sh_addralign of .shstrtab and .dynstr was set to to 0 in this change. If I recall correctly both 0 and 1 are acceptable values for sh_addralign if the section does not have alignment constraints. Since alignTo function require "Align" to be non-zero, I guess that is why `Sec->Align ? Sec->Align : 1` exists. I have change the sh_addralign to 1 for the string tables to avoid it.
> 
> 
> 
I've forgotten any memory I have of the long-term goal with this tool, so this might be irrelevant, but is it possible that a user in the future will set an alignment of 0 explicitly and therefore this will fail?

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:609
+
+  return createStringError(errc::invalid_argument,
+                           "Invalid binary output target");
----------------
grimar wrote:
> this should be `llvm_unreachable` I think?
I think the wrong place was changed?

================
Comment at: llvm/test/tools/llvm-elfabi/binary-write-sheaders.test:1
+# RUN: llvm-elfabi %s --output-target=elf64-little %t
+# RUN: llvm-readobj -h %t | FileCheck %s --check-prefix=ELFHEADER
----------------
Please add a comment to the top of this test stating what this test is testing. Same goes for all tests.

This test is only testing one of the 4 ELF formats (64-bit LE). You should probably have the other three too?

================
Comment at: llvm/test/tools/llvm-elfabi/binary-write-sheaders.test:7
+
+
+--- !tapi-tbe
----------------
Nit: remove extra blank line.

================
Comment at: llvm/test/tools/llvm-elfabi/fail-file-write.test:15
+
+
+# ERR: Permission denied when trying to open
----------------
Nit: delete extra blank line.

================
Comment at: llvm/test/tools/llvm-elfabi/fail-file-write.test:16
+
+# ERR: Permission denied when trying to open
----------------
Please check the full error here, rather than a partial one. Additionally, I would be suspicious that the error message will be different depending on whether the host OS is Linux or Windows. Please could you check? If you don't have access to both systems, let me know, and I can test the one you can't.

================
Comment at: llvm/test/tools/llvm-elfabi/invalid-bin-target.test:1
+# RUN: not llvm-elfabi %s --output-target=nope %t 2>&1 | FileCheck %s
+
----------------
I'd rename this test to `invalid-output-target` or even `output-target-invalid` to match the option name.

================
Comment at: llvm/test/tools/llvm-elfabi/missing-bin-target.test:1
+# RUN: not llvm-elfabi %s %t 2>&1 | FileCheck %s
+
----------------
It would probably be easier to merge this and the above test together, so that you have all tests for `--output-target` in one place.

================
Comment at: llvm/test/tools/llvm-elfabi/missing-bin-target.test:10
+
+# CHECK: No binary output target specified.
----------------
This probably wants an additional bit of message. Something like "error: No binary..."

================
Comment at: llvm/test/tools/llvm-elfabi/write-elf32be-ehdr.test:1
+# RUN: llvm-elfabi %s --output-target=elf32-big %t
+# RUN: llvm-readobj --file-headers %t | FileCheck %s --match-full-lines
----------------
It's not clear to me why this test is separate to the `binary-write-sheaders` test? Can they be merged?

================
Comment at: llvm/test/tools/llvm-elfabi/write-elf32le-ehdr.test:1
+# RUN: llvm-elfabi %s --output-target=elf32-little %t
+# RUN: llvm-readobj --file-headers %t | FileCheck %s --match-full-lines
----------------
If it doesn't have it already, I'd recommend adding a --docnum option to llvm-elfabi, like yaml2obj, so that you can have multiple test variants within the same file (in a separate prerequisite patch). In this case, I think it would make a lot of sense for it to be in the same test as the big endian test. Same with the 64-bit variants.

================
Comment at: llvm/tools/llvm-elfabi/llvm-elfabi.cpp:138-140
+  // Change SoName before emitting stubs.
+  if (SOName.getNumOccurrences() == 1)
+    TargetStub->SoName = SOName;
----------------
This seems unrelated to your other changes in this patch? I certainly don't see any relevant test changes.

================
Comment at: llvm/tools/llvm-elfabi/llvm-elfabi.cpp:154
+    if (BinaryOutputTarget.getNumOccurrences() == 0) {
+      WithColor::error() << "No binary output target specified.\n";
+      exit(1);
----------------
I can't remember exactly the invocation (and am too lazy to check it right now), but `WithColor` has a default error handler function that can be passed an LLVM error I believe, so use that instead, I recommend. I'd probably also wrap it all in a local function that does the `exit(1)` too to reduce duplication.

Also, LLVM style guide says errors have no full stops and no leading capitalization.

================
Comment at: llvm/tools/llvm-elfabi/llvm-elfabi.cpp:160
+    if (BinaryWriteError) {
+      WithColor::error() << BinaryWriteError << "\n";
+      exit(1);
----------------
Same comment here re. default error handler function. You might want to change the `TBEWriteError` and other error bits similarly, in a separate patch.

================
Comment at: llvm/tools/llvm-elfabi/llvm-elfabi.cpp:48
+           cl::desc("Manually set the DT_SONAME entry of any emitted files"),
+           cl::value_desc("name"));
+cl::opt<ELFTarget> BinaryOutputTarget(
----------------
grimar wrote:
> haowei wrote:
> > grimar wrote:
> > > All above are unrelated formatting changes I think.
> > Yes. I just run clang-format on entire file. Can I keep them?
> It is better to avoid unnecessarily changes in patches. These changes are unrealated and the right way
> to do this would be to format the entry file first and commit as NFC, that should not need a review in this case.
As llvm-elfabi is a new tool, it would be better to do the reformatting early as @grimar suggests, rather than let things linger. That will make everybody's lives easier as you'll just be able to reformat the whole file when adding new code (and only the new bits will change).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61767/new/

https://reviews.llvm.org/D61767