[PATCH] D61767: [llvm-elfabi] Emit ELF header and string table section

Mon Nov 16 02:07:33 PST 2020

jhenderson added inline comments.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:561
   }
-
   return createStringError(errc::not_supported, "Unsupported binary format");
 }
----------------

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:153
+    fillStrTabShdr(ShStrTab);
+    // Initilize ELFheader
+    initELFHeader<ELFT>(ElfHeader, Stub.Arch);
----------------
jhenderson wrote:
> I suggest using a generic term rather than the variable name to avoid rotting comments should variables get renamed.
This comment still has most of the typos/grammar problems I fixed. Please review my suggested edit and fix them.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:167
+    write(Data, ElfHeader);
+    ShStrTab.Content.write(Data + ShStrTab.Shdr.sh_offset);
+    StrTab.Content.write(Data + StrTab.Shdr.sh_offset);
----------------
haowei wrote:
> jhenderson wrote:
> > Here and below, would a loop over a list of Sections make more sense going forwards, like you already do above? I don't know if there are expected to be more sections added in the future, but if there are, such a loop would be more robust than having to remember to add everything in multiple places.
> Using loop would make more sense but it seems to be infeasible in this case. The "Section.Content.write" function is responsible for writing data to the MemoryBuffer but this function is not a virtual function derived by different classes (there are SymTab and Dynamic sections builder in D89432) so it would require some data structure changes to make it happen. Or if there is anything like "instanceof" that I can check and cast the types safely at run time, that would also work.
> 
>  As shown in D89432, we only need to write 4 sections in a stub elf file. So it is not very cost efficient to invent some new classes just to make the loop happen. If you have better solutions please let me know.
FWIW, LLVM has a dyn_cast still mechanism that allows some degree of runtime type checking, but it only works within a class heirarchy (see any class that has the classof function, and how that works).

It looks like it would be straightforward to unify the section types somehow, with a virtual write function, but without actually trying it, I'm not certain it would be.

================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:179
+  template <class T> static void write(uint8_t *Data, const T &Value) {
+    *reinterpret_cast<T *>(Data) = Value;
+  }
----------------
haowei wrote:
> grimar wrote:
> > jhenderson wrote:
> > > grimar wrote:
> > > > jhenderson wrote:
> > > > > This line makes me very nervous that this code will fail miserably where the host endianness doesn't match the target endianness with non-single byte `T` types.
> > > > It is only used for writing `Elf_Shdr`, `Elf_Ehdr` etc types, what I think should work fine on any hosts.
> > > > Is your concern that somebody might try to use `write` for regular types?
> > > We're probably talking past each other, or I might be forgetting something, but the fields within `Elf_Shdr`, `Elf_Ehdr` etc will be in little/big endian order right depending on the host, right? This code here is essentially just doing a byte-wise copy, by my understanding.
> > My understanding is the following:
> > 
> > E.g. `Elf_Shdr consists of `Elf_Word`/`Elf_Addr`/`Elf_Off` fields,
> > which are of `support::detail::packed_endian_specific_integral` type:
> > 
> > 
> > ```
> > using Elf_Shdr_Base<ELFT>::sh_size;
> >   
> > template <endianness TargetEndianness>
> > struct Elf_Shdr_Base<ELFType<TargetEndianness, false>> {
> >   LLVM_ELF_IMPORT_TYPES(TargetEndianness, false)
> >   Elf_Word sh_name;      // Section name (index into string table)
> >   Elf_Word sh_type;      // Section type (SHT_*)
> >   Elf_Word sh_flags;     // Section flags (SHF_*)
> >   Elf_Addr sh_addr;      // Address where section is to be loaded
> >   Elf_Off sh_offset;     // File offset of section data, in bytes
> >   Elf_Word sh_size;      // Size of section, in bytes
> >   Elf_Word sh_link;      // Section type-specific header table index link
> >   Elf_Word sh_info;      // Section type-specific extra information
> >   Elf_Word sh_addralign; // Section address alignment
> >   Elf_Word sh_entsize;   // Size of records contained within the section
> > };
> > 
> > #define LLVM_ELF_IMPORT_TYPES_ELFT(ELFT)                                       \
> >   using Elf_Addr = typename ELFT::Addr;                                        \
> >   using Elf_Off = typename ELFT::Off;                                          \
> >   using Elf_Half = typename ELFT::Half;                                        \
> >   using Elf_Word = typename ELFT::Word;                                        \
> >   using Elf_Sword = typename ELFT::Sword;                                      \
> >   using Elf_Xword = typename ELFT::Xword;                                      \
> >   using Elf_Sxword = typename ELFT::Sxword;
> > 
> >   using Half = packed<uint16_t>;
> >   using Word = packed<uint32_t>;
> >   using Sword = packed<int32_t>;
> >   using Xword = packed<uint64_t>;
> >   using Sxword = packed<int64_t>;
> >   using Addr = packed<uint>;
> >   using Off = packed<uint>;
> >  
> >   template <typename Ty>
> >   using packed = support::detail::packed_endian_specific_integral<Ty, E, 1>;
> > ```
> >  
> > This type respects endianess on assign:
> > 
> > 
> > ```
> > struct packed_endian_specific_integral {
> > ....
> >   void operator=(value_type newValue) {
> >     endian::write<value_type, endian, alignment>(
> >       (void*)Value.buffer, newValue);
> >   }
> > 
> > /// Write a value to memory with a particular endianness.
> > template <typename value_type, std::size_t alignment>
> > inline void write(void *memory, value_type value, endianness endian) {
> >   value = byte_swap<value_type>(value, endian);
> >   memcpy(LLVM_ASSUME_ALIGNED(
> >              memory, (detail::PickAlignment<value_type, alignment>::value)),
> >          &value, sizeof(value_type));
> > }
> > 
> > template <typename value_type>
> > inline value_type byte_swap(value_type value, endianness endian) {
> >   if ((endian != native) && (endian != system_endianness()))
> >     sys::swapByteOrder(value);
> >   return value;
> > }
> > ```
> Test does not fail when writing elfheader in be or le. If you have better solution to make it more robust, please let me know.
It's fine. I forgot that the Elf_* types had endianness magic built into them.

================
Comment at: llvm/test/tools/llvm-elfabi/missing-or-invalid-output-target.test:1
+# RUN: not llvm-elfabi %s %t 2>&1 | FileCheck %s --check-prefix=MISSING
+# RUN: not llvm-elfabi %s --output-target=nope %t 2>&1 | FileCheck %s --check-prefix=INVALID
----------------
Add descriptive comment at top of test. This should be in place for all tests.

I'd also consider renaming it `output-target-error.test`, which is a little more concise.

================
Comment at: llvm/test/tools/llvm-elfabi/missing-or-invalid-output-target.test:14
+# INVALID: llvm-elfabi: for the --output-target option: Cannot find option named 'nope'!
\ No newline at end of file

----------------
Nit: no new line at EOF.

================
Comment at: llvm/test/tools/llvm-elfabi/write-elf32be-stub.test:1
+# Test writing 32bit big endinan stub elf with minimal sections.
+
----------------
Use '##' for comments, to make them stand out from other lit and FileCheck commands. This applies to all new comments you've added in this and other tests.

================
Comment at: llvm/test/tools/llvm-elfabi/write-elf32be-stub.test:4-7
+# RUN: llvm-readobj -h %t | FileCheck %s --check-prefix=ELFHEADER
+# RUN: llvm-readobj -S %t | FileCheck %s --check-prefix=SECTIONS
+# RUN: llvm-readobj --string-dump .shstrtab %t | FileCheck %s --check-prefix=SHSTRTAB
+# RUN: llvm-readobj --string-dump .dynstr %t | FileCheck %s --check-prefix=DYNSTR
----------------
These can all be combined into a single llvm-readobj execution, without the need for separate prefixes.

================
Comment at: llvm/test/tools/llvm-elfabi/write-elf32le-stub.test:1
+# Test writing 32bit little endinan stub elf with minimal sections.
+
----------------
It might make more sense to merge the ELF32/ELF64 and LE/BE tests all into a single test. Your test would look something like:

```
# RUN: llvm-elfabi %s --output-target=elf32-little %t32-little
# RUN: llvm-readobj -h -S --string-dump .shstrtab --string-dump .dynstr %t32-little | FileCheck %s -DCLASS="32-bit (0x1)" <other defines here for the other bits that are slightly different>
# RUN: llvm-elfabi %s --output-target=elf32-big %t32-b ig
# RUN: llvm-readobj -h -S --string-dump .shstrtab --string-dump .dynstr %t32-big | FileCheck %s -DCLASS="32-bit (0x1)" <other defines here for the other bits that are slightly different>
<same for 64-bit variants>
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61767/new/

https://reviews.llvm.org/D61767