[PATCH] D60270: [llvm-objcopy] Add support for Intel HEX input/output format

Wed Apr 24 10:42:33 PDT 2019

evgeny777 marked 9 inline comments as done.
evgeny777 added a comment.

> There's a lot of code to review here.

I've responded to some of the comments, meanwhile I'm splitting patch into writer (will go first) and reader (will go next). Will update the review soon

================
Comment at: tools/llvm-objcopy/ELF/Object.cpp:157
+
+template <class T> static T checkedGetHex(StringRef S) {
+  T Value;
----------------
jakehehrlich wrote:
> Unless an check that generates an error always proceeds this I think its best to return an error in this case, not assert fail. It would be better to roll this into an Expected function in that case I think anyway.
This function can't actually return error, because string has been previously validated (see `checkChars` for example). IMO, it's bad practice to implement runtime checks for one's own logical errors.

================
Comment at: tools/llvm-objcopy/ELF/Object.cpp:197
+  *Iter++ = ':';
+  Iter = utohexstr(Data.size(), Iter, 2);
+  Iter = utohexstr(Addr, Iter, 4);
----------------
jakehehrlich wrote:
> Maybe a raw_ostream would be useful here. We've generally avoided them but this format seems to lend itself to streams where as my opinion was the opposite before. You wouldn't need utohexstr since those formatting options are already supplied by the library I believe.
This function was optimized to not using any dynamic allocation (IHexLineData is actually SmallVector), because each line contains only 16 bytes of section data, so it's possible to have really huge number of lines. What are the benefits of using raw_ostream?

================
Comment at: tools/llvm-objcopy/ELF/Object.cpp:298
+
+static uint64_t sectionPhysicalAddr(const SectionBase *Sec) {
+  Segment *Seg = Sec->ParentSegment;
----------------
jakehehrlich wrote:
> Does this ever make sense if there is no segment?
It's a helper function which returns section VA if there is no segment. Any suggestion for better name?

================
Comment at: tools/llvm-objcopy/ELF/Object.cpp:310
+  const uint32_t ChunkSize = 16;
+  uint32_t Addr = sectionPhysicalAddr(Sec) & 0xFFFFFFFFU;
+  while (!Data.empty()) {
----------------
jakehehrlich wrote:
> Masking that like this seems redundant, in general the number of places we're converting from 64 to 32 in an unchecked way is really shocking. I'd feel a lot more comfortable if we encapsulated these checks more and made them more clear.
There is a checkSections method which check all sections to detect if any of them has 64-bit address. Bear in mind that implementation also supports sign extended 32-bit addresses, i.e 0xFFFFFFFF80000000 is a valid address, but 0x100000000 is not

================
Comment at: tools/llvm-objcopy/ELF/Object.cpp:313
+    uint64_t DataSize = std::min<uint64_t>(Data.size(), ChunkSize);
+    if (Addr > SegmentAddr + BaseAddr + 0xFFFFU) {
+      if (Addr > 0xFFFFFU) {
----------------
jakehehrlich wrote:
> Maybe we could split support for extended records out into a sperate patch and error out here for now?
I suggest splitting reader and writer. To me it looks like a more logical split compared to removal of certain record types.

================
Comment at: tools/llvm-objcopy/ELF/Object.h:266
+// calculation in IHexWriter::finalize.
+class IHexSectionWriterBase : public BinarySectionWriter {
+  // 20-bit segment address
----------------
jakehehrlich wrote:
> What's the point in splitting this into two classes?
> 
> Also does inheriting from BinarySectionWriter make sense? The same visitors will need to be implemented I would suppose the offsets and everything would be very different.
I inherited IHexSectionWriter from BinarySectionWriter in order to reuse visitors for RelocationSection, GnuDebugLinkSection, e.t.c which will never go to IHEX nor to binary output. Are you suggesting duplication?

================
Comment at: tools/llvm-objcopy/ELF/Object.h:296
+// Real IHEX section writer
+class IHexSectionWriter : public IHexSectionWriterBase {
+public:
----------------
jakehehrlich wrote:
> In general I think it might be worth considering weather there is a need to use sections at all. Originally with the binary writer we only used program headers. It turns out that people did a lot of stuff in a really odd way with GNU objcopy when using -O binary that required that we use sections as the primary basis for output. I would imagine that ihex users would not be doing the same sorts of odd tricks and that you could write the output strait from program headers. This would simplify the implementation greatly I think and harden the implementation against all sorts of odd corner cases.
AFAIU one can't do this with IHEX, because unlike binary IHEX is not a contiguous blob, e.g you can have a gap between sections which won't go to output file.

================
Comment at: tools/llvm-objcopy/ELF/Object.h:498-507
+  OwnedDataSection(const Twine &SecName, uint64_t SecAddr, uint64_t SecFlags,
+                   uint64_t SecOff) {
+    Name = SecName.str();
+    Type = ELF::SHT_PROGBITS;
+    Addr = SecAddr;
+    Flags = SecFlags;
+    OriginalOffset = SecOff;
----------------
jakehehrlich wrote:
> Why do we need this to output a new format?
This function takes a portion of hexadecimal data from IHexRecord and appends it in binary form to internal vector of OwnedDataSection

================
Comment at: tools/llvm-objcopy/ELF/Object.h:932

+class IHexReader : public Reader {
+  MemoryBuffer *MemBuf;
----------------
jakehehrlich wrote:
> We can probably split this into two changes to make things smaller, one for reading, and one for writing yeah?
I think so.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60270/new/

https://reviews.llvm.org/D60270