[PATCH] D61767: [llvm-elfabi] Emit ELF header and string table section

Fri Nov 13 00:50:07 PST 2020

haowei added inline comments.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:52
+template <class ELFT>
+void initELFHeader(typename ELFT::Ehdr &ElfHeader, uint16_t Machine) {
+  memset(&ElfHeader, 0, sizeof(ElfHeader));
----------------
jhenderson wrote:
> It's been a while since I looked at this code, but I wonder whether this is essentially a duplication of code elsewhere in LLVM. Certainly I'd expect tools like LLD, llvm-objcopy and llvm-mc to have functionality that emits an ELF Header of some description. Do we really need to repeat it here?
> 
> Also, I wonder if it would be a little more flexible/traditional to return a locally constructed ElfHeader rather than take it as an input?
I checked writeEhdr() in obj-copy but it is not flexible enough to be used here. I agree it would be great to have a common function to write Ehdr into memory so it can be reused by these tools. But I feel it is out of scope of this change. I can draft a change to address this problem if you feel it is necessary to do so.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:118
+    ShStrTab.Align = 1;
+    StrTab.Name = ".dynstr";
+    StrTab.Align = 1;
----------------
jhenderson wrote:
> I wonder if it would be less confusing if `StrTab` were to be called `Dynstr`? `.strtab` is traditionally the name of the static symbol string table.
StrTab is renamed to DynStr.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:165
+
+  void write(uint8_t *Data) const {
+    write(Data, ElfHeader);
----------------
jhenderson wrote:
> I wonder if it would be easier for this function (with the help of its child functions) to update the offset as it goes along, rather than functions needing to precalculate where the thing should be written:
> ```
>   void write(uint8_t *Data) const {
>     write(Data, ElfHeader);
>     Data += ShStrTab.Shdr.sh_offset;
>     ShStrTab.Content.write(Data);
>     Data += StrTab.Shdr.sh_offset;
>     StrTab.Content.write(Data);
>     Data = ElfHeader.e_shoff;
>     writeShdr(Data, ShStrTab);
>     writeShdr(Data, StrTab);
>   }
> 
>   template <class T> static void write(uint8_t *&Data, const T &Value) {
>     *reinterpret_cast<T *>(Data) = Value;
>     Data += sizeof(Value);
>   }
> 
>   void writeShdr(uint8_t *Data, const OutputSection<ELFT> &Sec) const {
>     write(Data, Sec.Shdr);
>   }
> ```
> At that point, some of the helper methods might become completely unnecessary.
I prefer not to do that. Using write function to determine the offset would look very messy on D89432. 

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:167
+    write(Data, ElfHeader);
+    ShStrTab.Content.write(Data + ShStrTab.Shdr.sh_offset);
+    StrTab.Content.write(Data + StrTab.Shdr.sh_offset);
----------------
jhenderson wrote:
> Here and below, would a loop over a list of Sections make more sense going forwards, like you already do above? I don't know if there are expected to be more sections added in the future, but if there are, such a loop would be more robust than having to remember to add everything in multiple places.
Using loop would make more sense but it seems to be infeasible in this case. The "Section.Content.write" function is responsible for writing data to the MemoryBuffer but this function is not a virtual function derived by different classes (there are SymTab and Dynamic sections builder in D89432) so it would require some data structure changes to make it happen. Or if there is anything like "instanceof" that I can check and cast the types safely at run time, that would also work.

 As shown in D89432, we only need to write 4 sections in a stub elf file. So it is not very cost efficient to invent some new classes just to make the loop happen. If you have better solutions please let me know.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:170
+    writeShdr(Data, ShStrTab);
+    writeShdr(Data, StrTab);
+  }
----------------
jhenderson wrote:
> It might not be important, but it seems more natural to have the SHF_ALLOC seection headers before the non SHF_ALLOC stuff in the section header table (same goes for the sections themselves).
I changed the order in the new diff

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:179
+  template <class T> static void write(uint8_t *Data, const T &Value) {
+    *reinterpret_cast<T *>(Data) = Value;
+  }
----------------
grimar wrote:
> jhenderson wrote:
> > grimar wrote:
> > > jhenderson wrote:
> > > > This line makes me very nervous that this code will fail miserably where the host endianness doesn't match the target endianness with non-single byte `T` types.
> > > It is only used for writing `Elf_Shdr`, `Elf_Ehdr` etc types, what I think should work fine on any hosts.
> > > Is your concern that somebody might try to use `write` for regular types?
> > We're probably talking past each other, or I might be forgetting something, but the fields within `Elf_Shdr`, `Elf_Ehdr` etc will be in little/big endian order right depending on the host, right? This code here is essentially just doing a byte-wise copy, by my understanding.
> My understanding is the following:
> 
> E.g. `Elf_Shdr consists of `Elf_Word`/`Elf_Addr`/`Elf_Off` fields,
> which are of `support::detail::packed_endian_specific_integral` type:
> 
> 
> ```
> using Elf_Shdr_Base<ELFT>::sh_size;
>   
> template <endianness TargetEndianness>
> struct Elf_Shdr_Base<ELFType<TargetEndianness, false>> {
>   LLVM_ELF_IMPORT_TYPES(TargetEndianness, false)
>   Elf_Word sh_name;      // Section name (index into string table)
>   Elf_Word sh_type;      // Section type (SHT_*)
>   Elf_Word sh_flags;     // Section flags (SHF_*)
>   Elf_Addr sh_addr;      // Address where section is to be loaded
>   Elf_Off sh_offset;     // File offset of section data, in bytes
>   Elf_Word sh_size;      // Size of section, in bytes
>   Elf_Word sh_link;      // Section type-specific header table index link
>   Elf_Word sh_info;      // Section type-specific extra information
>   Elf_Word sh_addralign; // Section address alignment
>   Elf_Word sh_entsize;   // Size of records contained within the section
> };
> 
> #define LLVM_ELF_IMPORT_TYPES_ELFT(ELFT)                                       \
>   using Elf_Addr = typename ELFT::Addr;                                        \
>   using Elf_Off = typename ELFT::Off;                                          \
>   using Elf_Half = typename ELFT::Half;                                        \
>   using Elf_Word = typename ELFT::Word;                                        \
>   using Elf_Sword = typename ELFT::Sword;                                      \
>   using Elf_Xword = typename ELFT::Xword;                                      \
>   using Elf_Sxword = typename ELFT::Sxword;
> 
>   using Half = packed<uint16_t>;
>   using Word = packed<uint32_t>;
>   using Sword = packed<int32_t>;
>   using Xword = packed<uint64_t>;
>   using Sxword = packed<int64_t>;
>   using Addr = packed<uint>;
>   using Off = packed<uint>;
>  
>   template <typename Ty>
>   using packed = support::detail::packed_endian_specific_integral<Ty, E, 1>;
> ```
>  
> This type respects endianess on assign:
> 
> 
> ```
> struct packed_endian_specific_integral {
> ....
>   void operator=(value_type newValue) {
>     endian::write<value_type, endian, alignment>(
>       (void*)Value.buffer, newValue);
>   }
> 
> /// Write a value to memory with a particular endianness.
> template <typename value_type, std::size_t alignment>
> inline void write(void *memory, value_type value, endianness endian) {
>   value = byte_swap<value_type>(value, endian);
>   memcpy(LLVM_ASSUME_ALIGNED(
>              memory, (detail::PickAlignment<value_type, alignment>::value)),
>          &value, sizeof(value_type));
> }
> 
> template <typename value_type>
> inline value_type byte_swap(value_type value, endianness endian) {
>   if ((endian != native) && (endian != system_endianness()))
>     sys::swapByteOrder(value);
>   return value;
> }
> ```
Test does not fail when writing elfheader in be or le. If you have better solution to make it more robust, please let me know.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:589
+  return createStringError(errc::invalid_argument,
+                           "Invalid binary output target");
 }
----------------
jhenderson wrote:
> The style guide says to use lower-case strings where possible for error messages.
> 
> I think this case SHOULD be an llvm_unreachable? It's not possible for a user to hit this code as far as I can tell (alternatively, write a test for it...)
I changed it to llvm_unreachable.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:148
+      if (Sec->NoBits)
+        Align = 1;
+      if (Prev == nullptr)
----------------
jhenderson wrote:
> haowei wrote:
> > grimar wrote:
> > > Why nobits sections are special?
> > > 
> > > Also I don't understand:
> > > 1) There are just 2 sections currently: `ShStrTab` and `StrTab`. Why do you need the code for `NoBits`?
> > > 2) Why can't you set `Align = 1` from start for them instead of having `Sec->Align ? Sec->Align : 1;` logic?
> > Most part of this change was authored by my previous co-workers. I don't know the specific reason why NoBits are handled here, but I could guess the code for NoBits are reserved in case the stub SO generated from elfabi would need any sections that has NoBits attribute. I agree it is not useful now as the following change do not use NoBits sections as well.
> > 
> > For align, the sh_addralign of .shstrtab and .dynstr was set to to 0 in this change. If I recall correctly both 0 and 1 are acceptable values for sh_addralign if the section does not have alignment constraints. Since alignTo function require "Align" to be non-zero, I guess that is why `Sec->Align ? Sec->Align : 1` exists. I have change the sh_addralign to 1 for the string tables to avoid it.
> > 
> > 
> > 
> I've forgotten any memory I have of the long-term goal with this tool, so this might be irrelevant, but is it possible that a user in the future will set an alignment of 0 explicitly and therefore this will fail?
The section alignment is set in this builder function so I guess the case you discussed would not happen. I have no preference in using 0 or 1 here. Using 1 seems to make code simpler but less robust.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:609
+
+  return createStringError(errc::invalid_argument,
+                           "Invalid binary output target");
----------------
jhenderson wrote:
> grimar wrote:
> > this should be `llvm_unreachable` I think?
> I think the wrong place was changed?
I changed the wrong place, sorry about that.

================
Comment at: llvm/test/tools/llvm-elfabi/fail-file-write.test:16
+
+# ERR: Permission denied when trying to open
----------------
jhenderson wrote:
> jhenderson wrote:
> > Please check the full error here, rather than a partial one. Additionally, I would be suspicious that the error message will be different depending on whether the host OS is Linux or Windows. Please could you check? If you don't have access to both systems, let me know, and I can test the one you can't.
> Looking at the pre-merge checks this test is failing on Windows as-is.
I have a windows workstation but it does not have development environment set up for llvm. I will see if I can set it up tomorrow. For now, let me test it on bots first.

================
Comment at: llvm/test/tools/llvm-elfabi/missing-bin-target.test:1
+# RUN: not llvm-elfabi %s %t 2>&1 | FileCheck %s
+
----------------
jhenderson wrote:
> It would probably be easier to merge this and the above test together, so that you have all tests for `--output-target` in one place.
merged in latest diff.

================
Comment at: llvm/test/tools/llvm-elfabi/write-elf32be-ehdr.test:1
+# RUN: llvm-elfabi %s --output-target=elf32-big %t
+# RUN: llvm-readobj --file-headers %t | FileCheck %s --match-full-lines
----------------
jhenderson wrote:
> It's not clear to me why this test is separate to the `binary-write-sheaders` test? Can they be merged?
merged in latest diff.

================
Comment at: llvm/test/tools/llvm-elfabi/write-elf32le-ehdr.test:1
+# RUN: llvm-elfabi %s --output-target=elf32-little %t
+# RUN: llvm-readobj --file-headers %t | FileCheck %s --match-full-lines
----------------
jhenderson wrote:
> If it doesn't have it already, I'd recommend adding a --docnum option to llvm-elfabi, like yaml2obj, so that you can have multiple test variants within the same file (in a separate prerequisite patch). In this case, I think it would make a lot of sense for it to be in the same test as the big endian test. Same with the 64-bit variants.
We intended to merge elfabi tool into elf-ifs tool once the we feel ELFObjHandler is stable. So I prefer not to add additional flag(feature) to llvm-elfabi at this point.

================
Comment at: llvm/tools/llvm-elfabi/llvm-elfabi.cpp:138-140
+  // Change SoName before emitting stubs.
+  if (SOName.getNumOccurrences() == 1)
+    TargetStub->SoName = SOName;
----------------
jhenderson wrote:
> This seems unrelated to your other changes in this patch? I certainly don't see any relevant test changes.
writeBinaryStub also use SoName in TargetStub variable. This change moves this logic outside of the EmitTBE branch so the assignment at L140 can be written once but used by both L147 and L159.

================
Comment at: llvm/tools/llvm-elfabi/llvm-elfabi.cpp:154
+    if (BinaryOutputTarget.getNumOccurrences() == 0) {
+      WithColor::error() << "No binary output target specified.\n";
+      exit(1);
----------------
jhenderson wrote:
> I can't remember exactly the invocation (and am too lazy to check it right now), but `WithColor` has a default error handler function that can be passed an LLVM error I believe, so use that instead, I recommend. I'd probably also wrap it all in a local function that does the `exit(1)` too to reduce duplication.
> 
> Also, LLVM style guide says errors have no full stops and no leading capitalization.
I added a fatalError(Error err) function.

================
Comment at: llvm/tools/llvm-elfabi/llvm-elfabi.cpp:48
+           cl::desc("Manually set the DT_SONAME entry of any emitted files"),
+           cl::value_desc("name"));
+cl::opt<ELFTarget> BinaryOutputTarget(
----------------
jhenderson wrote:
> grimar wrote:
> > haowei wrote:
> > > grimar wrote:
> > > > All above are unrelated formatting changes I think.
> > > Yes. I just run clang-format on entire file. Can I keep them?
> > It is better to avoid unnecessarily changes in patches. These changes are unrealated and the right way
> > to do this would be to format the entry file first and commit as NFC, that should not need a review in this case.
> As llvm-elfabi is a new tool, it would be better to do the reformatting early as @grimar suggests, rather than let things linger. That will make everybody's lives easier as you'll just be able to reformat the whole file when adding new code (and only the new bits will change).
It's reformatted in https://github.com/llvm/llvm-project/commit/ad0da312c05d5264fac71dfe417d8a1e117b6a43

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61767/new/

https://reviews.llvm.org/D61767