[PATCH] [lld][mach-o] binary reader and writer

kledzik at apple.com kledzik at apple.com
Tue Nov 5 14:09:16 PST 2013



================
Comment at: include/lld/ReaderWriter/MachOLinkingContext.h:123
@@ -118,2 +122,3 @@
   void setBundleLoader(StringRef loader) { _bundleLoader = loader; }
+  StringRef dyldPath() { return "/usr/lib/dyld"; }
 
----------------
Shankar Kalpathi Easwaran wrote:
> const. 
fixed

================
Comment at: include/lld/ReaderWriter/MachOLinkingContext.h:129
@@ -123,1 +128,3 @@
   static uint32_t cpuSubtypeFromArch(Arch arch);
+  static bool is64(Arch arch);
+  static bool isOtherEndian(Arch arch);
----------------
Rui Ueyama wrote:
> nit: is64BitArch or is64? Is this inconsistency intended?
One is static.  I've changed them both to be is64Bit.

================
Comment at: lib/ReaderWriter/MachO/MachOLinkingContext.cpp:75-87
@@ -72,13 +74,15 @@
 static ArchInfo archInfos[] = {
-  { "x86_64", MachOLinkingContext::arch_x86_64, llvm::MachO::CPU_TYPE_X86_64,
-    llvm::MachO::CPU_SUBTYPE_X86_64_ALL },
-  { "i386", MachOLinkingContext::arch_x86, llvm::MachO::CPU_TYPE_I386,
-    llvm::MachO::CPU_SUBTYPE_X86_ALL },
-  { "armv6", MachOLinkingContext::arch_armv6, llvm::MachO::CPU_TYPE_ARM,
-    llvm::MachO::CPU_SUBTYPE_ARM_V6 },
-  { "armv7", MachOLinkingContext::arch_armv7, llvm::MachO::CPU_TYPE_ARM,
-    llvm::MachO::CPU_SUBTYPE_ARM_V7 },
-  { "armv7s", MachOLinkingContext::arch_armv7s, llvm::MachO::CPU_TYPE_ARM,
-    llvm::MachO::CPU_SUBTYPE_ARM_V7S },
-  { StringRef(), MachOLinkingContext::arch_unknown, 0, 0 }
+  { "x86_64", MachOLinkingContext::arch_x86_64, true, 
+          llvm::MachO::CPU_TYPE_X86_64, llvm::MachO::CPU_SUBTYPE_X86_64_ALL },
+  { "i386", MachOLinkingContext::arch_x86,  true, 
+              llvm::MachO::CPU_TYPE_I386, llvm::MachO::CPU_SUBTYPE_X86_ALL },
+  { "ppc", MachOLinkingContext::arch_ppc,  false, 
+          llvm::MachO::CPU_TYPE_POWERPC, llvm::MachO::CPU_SUBTYPE_POWERPC_ALL },
+  { "armv6", MachOLinkingContext::arch_armv6,  true, 
+                  llvm::MachO::CPU_TYPE_ARM, llvm::MachO::CPU_SUBTYPE_ARM_V6 },
+  { "armv7", MachOLinkingContext::arch_armv7,  true, 
+                  llvm::MachO::CPU_TYPE_ARM, llvm::MachO::CPU_SUBTYPE_ARM_V7 },
+  { "armv7s", MachOLinkingContext::arch_armv7s,  true, 
+                  llvm::MachO::CPU_TYPE_ARM, llvm::MachO::CPU_SUBTYPE_ARM_V7S },
+  { StringRef(), MachOLinkingContext::arch_unknown, false, 0, 0 }
 };
----------------
Shankar Kalpathi Easwaran wrote:
> kledzik at apple.com wrote:
> > Michael Spencer wrote:
> > > Indentation is weird here.
> > Its a really wide table.  I'll make each row be three lines so things line up.
> Using llvm::MachO can solve a lot of indentation here.
I added the using, and moved the static table to be a member of MachOLinkingContext so I don't need the MachOLinkingContext:: prefix on the arches.  That makes all entries fit on on 80 column line.

================
Comment at: lib/ReaderWriter/MachO/MachOLinkingContext.cpp:91
@@ -86,3 +90,3 @@
 MachOLinkingContext::Arch
 MachOLinkingContext::archFromCpuType(uint32_t cputype, uint32_t cpusubtype) {
   for (ArchInfo *info = archInfos; !info->archName.empty(); ++info) {
----------------
Shankar Kalpathi Easwaran wrote:
> const ?
That method is static.

================
Comment at: lib/ReaderWriter/MachO/MachOLinkingContext.cpp:93-95
@@ -88,5 +92,5 @@
   for (ArchInfo *info = archInfos; !info->archName.empty(); ++info) {
     if ((info->cputype == cputype) && (info->cpusubtype == cpusubtype)) {
       return info->arch;
     }
   }
----------------
Shankar Kalpathi Easwaran wrote:
> single block if statement - no need of a brace.
fixed

================
Comment at: lib/ReaderWriter/MachO/MachOLinkingContext.cpp:101
@@ -96,3 +100,3 @@
 MachOLinkingContext::Arch
 MachOLinkingContext::archFromName(StringRef archName) {
   for (ArchInfo *info = archInfos; !info->archName.empty(); ++info) {
----------------
Shankar Kalpathi Easwaran wrote:
> const ?
method is static

================
Comment at: lib/ReaderWriter/MachO/MachOLinkingContext.cpp:110
@@ -105,3 +109,3 @@
 
 uint32_t MachOLinkingContext::cpuTypeFromArch(Arch arch) {
   assert(arch != arch_unknown);
----------------
Shankar Kalpathi Easwaran wrote:
> const.
I wish C++ allowed 'static' on the definition.

================
Comment at: lib/ReaderWriter/MachO/MachOLinkingContext.cpp:133
@@ -128,3 +132,3 @@
       _doNothing(false), _arch(arch_unknown), _os(OS::macOSX), _osMinVersion(0),
-      _pageZeroSize(0x1000), _compatibilityVersion(0), _currentVersion(0),
+      _pageZeroSize(UINT64_MAX), _compatibilityVersion(0), _currentVersion(0),
       _deadStrippableDylib(false), _kindHandler(nullptr) {}
----------------
Shankar Kalpathi Easwaran wrote:
> Why is _pageZeroSize set to UINT64_MAX ?
I've changed this to be clearer.  It is now initialized to 'unspecifiedPageZeroSize'.  Later, in validate() if the page zero size has not been set, it is set to the arch default.

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryUtils.h:199
@@ +198,3 @@
+bitFieldExtract(uint32_t value, bool isBigEndianBigField, 
+                                        uint8_t firstBit, uint8_t bitCount) {
+  const uint32_t mask = ((1<<bitCount)-1); 
----------------
Rui Ueyama wrote:
> I think it's not in LLVM coding style. Can you run clang-format on this patch?
I just tried running clang-format on this file and all it did was lose the columnar alignment of the right hand sides.

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:102-108
@@ +101,9 @@
+
+/// Utility class for writing a mach-o binary file given an in-memory
+/// normalized file.  
+class MachOFileLayout {
+public:
+              /// All layout computation is done in the constructor.
+              MachOFileLayout(const NormalizedFile &file);
+              
+  /// Returns the final file size as computed in the constructor.
----------------
Shankar Kalpathi Easwaran wrote:
> indentation is off here.
The method names all line up.  It is just that constructors don't have return types.

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:110
@@ +109,3 @@
+  /// Returns the final file size as computed in the constructor.
+  size_t      size();
+
----------------
Shankar Kalpathi Easwaran wrote:
> const ?
Sure.  There is nothing const in this class.

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:220-222
@@ +219,5 @@
+
+StringRef MachOFileLayout::dyldPath() {
+  return "/usr/lib/dyld";
+}
+
----------------
Shankar Kalpathi Easwaran wrote:
> looks like function is duplicated here, you could as well get the info from the context.
There is no LinkingContext here.  There is an obscure linker option to change the loader used by a program, but no one uses that option.  If I ever add support for that option in lld, I'll need to wire the two together somehow.

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:224-229
@@ +223,8 @@
+
+uint32_t MachOFileLayout::pointerAlign(uint32_t value) {
+  if (_is64)
+    return (value + 7) & -8;
+  else
+    return (value + 3) & -4;
+}
+
----------------
Shankar Kalpathi Easwaran wrote:
> use MathExtras. llvm::RoundUpToAlignment.
Thanks.  Much simpler:
     return llvm::RoundUpToAlignment(value, _is64 ? 8 : 4);


================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:241-268
@@ +240,30 @@
+
+void MachOFileLayout::ByteBuffer::append_uleb128(uint64_t value) {
+  uint8_t byte;
+  do {
+    byte = value & 0x7F;
+    value &= ~0x7F;
+    if ( value != 0 )
+      byte |= 0x80;
+    _bytes.push_back(byte);
+    value = value >> 7;
+  } while( byte >= 0x80 );
+}
+
+void MachOFileLayout::ByteBuffer::append_sleb128(int64_t value) {
+  bool isNeg = ( value < 0 );
+  uint8_t byte;
+  bool more;
+  do {
+    byte = value & 0x7F;
+    value = value >> 7;
+    if ( isNeg ) 
+      more = ( (value != -1) || ((byte & 0x40) == 0) );
+    else
+      more = ( (value != 0) || ((byte & 0x40) != 0) );
+    if ( more )
+      byte |= 0x80;
+    _bytes.push_back(byte);
+  } 
+  while( more );
+}
----------------
Shankar Kalpathi Easwaran wrote:
> Looks like this has been moved from WriterMachO.cpp ? Do you think it might be good to move this to to llvm/Support ? DWARF also has these types isnt it ? 
llvm/Support does have U/SLEB128 support.  But it is based on raw_ostream.  There are adapters like raw_svector_ostream which are very similar to ByteBuffer, but use signed char instead of uint8_t.  It probably works the same, but I worry that some sign extension will slip in and mess up the LEB128.

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:276-279
@@ +275,6 @@
+
+void MachOFileLayout::ByteBuffer::align(unsigned alignment) {
+  while ( (_bytes.size() % alignment) != 0 )
+    _bytes.push_back(0);
+}
+
----------------
Shankar Kalpathi Easwaran wrote:
> What about when you want to align text ? Does 0 correspond to nop ?
ByteBuffer is only used for constructing the __LINKEDIT rebase/binding info.  It is not used for section content.

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:322-323
@@ +321,4 @@
+    
+    // Align start of relocations.
+    _startOfRelocations = pointerAlign(_endOfSectionsContent);    
+    _startOfSymbols = _startOfRelocations + relocCount * 8;
----------------
Shankar Kalpathi Easwaran wrote:
> This is uninitialized in the else part. I believe these are only relevant to object files, but still worried on a lot of uninitialized stuff that may just fall through.
Yes.  Unfortunately, mach-o for object files and final linked images has many differences.  I'll look into splitting this up into three classes: the base class with stuff common to object and FLI, a subclass just for object files, and a subclass just for FLI.

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:456-461
@@ +455,8 @@
+  // Verify no segments overlap
+  for (const Segment &sg1 : _file.segments) {
+    for (const Segment &sg2 : _file.segments) {
+      if (&sg1 == &sg2)
+        continue;
+      if (overlaps(sg1,sg2)) {
+        _ec = llvm::make_error_code(llvm::errc::executable_format_error);
+        return;
----------------
Shankar Kalpathi Easwaran wrote:
> O(n^2) a sort of segments by their address might help. This part could be moved to verify ?
There is usually just two segments, so I'm not worried about performance here

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:460-463
@@ +459,6 @@
+        continue;
+      if (overlaps(sg1,sg2)) {
+        _ec = llvm::make_error_code(llvm::errc::executable_format_error);
+        return;
+      }
+    }
----------------
Shankar Kalpathi Easwaran wrote:
> should this function return error_code instead ?
All this happens in the constructor of MachOFileLayout.  If any problems are encountered _ec is set and later writeBinary() checks _ec before actually writing.    What should be improved is that an early error should stop later processing, otherwise it will be operating on incomplete data.

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:507
@@ +506,3 @@
+  for (const Segment &sg : _file.segments) {
+    _segInfo[&sg].fileOffset = (fileOffset + 4095) & (-4096);
+    if ((_seg1addr == INT64_MAX) && sg.access)
----------------
Shankar Kalpathi Easwaran wrote:
> use llvm::RoundUpToAlignment. May be you can add a function to LinkingContext pageSize().
I've switched the code to use RoundUpToAlignment().  The LinkingContext and pageSize() is not available here.  The code with created the normalize file (which does have access to LinkingContext) should have already taken pageSize() into account when creating the segments.  But, yes, 4096 should not be hard coded here.  I should somehow be infered from the NormalizedMachOFile.

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:526
@@ +525,3 @@
+
+size_t MachOFileLayout::size() {
+  return _endOfSymbolStrings;
----------------
Shankar Kalpathi Easwaran wrote:
> const.
fixed

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:530
@@ +529,3 @@
+
+void MachOFileLayout::writeMachHeader() {
+  mach_header *mh = reinterpret_cast<mach_header*>(_buffer);
----------------
Shankar Kalpathi Easwaran wrote:
> writeMachOHeader() ?
Believe it or not the struct for the start of mach-o files is called "mach_header".

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:543-544
@@ +542,4 @@
+
+uint32_t MachOFileLayout::indirectSymbolIndex(const Section &sect, 
+                                                            uint32_t &index) {
+  if (sect.indirectSymbols.empty())
----------------
Shankar Kalpathi Easwaran wrote:
> alignment.
fixed

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:571-572
@@ +570,4 @@
+  seg->filesize = seg->vmsize;
+  seg->maxprot = 7;
+  seg->initprot = 7;
+  seg->nsects = _file.sections.size();
----------------
Shankar Kalpathi Easwaran wrote:
> I think these are the initial protection flags for the segment right ? Why are they set to RWX (I thought 7 is coming from there).
Fixed to be VM_PROT_READ|VM_PROT_WRITE|VM_PROT_EXECUTE

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:648
@@ +647,3 @@
+
+error_code MachOFileLayout::writeSegment64LoadCommands(uint8_t *&lc) {
+  uint32_t indirectSymRunningIndex = 0;
----------------
Shankar Kalpathi Easwaran wrote:
> can writeSegment64LoadCommands and writeSegment32LoadCommands be combined ?
I hate this duplication of code too.  The problem is there is a whole bunch of types to switch: segment_command_64, section_64, LC_SEGMENT_64, etc.  I'll add a FIXME to see if these can be combined via a template that takes one of two classes which define all the types for each case. 

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileBinaryWriter.cpp:1079-1083
@@ +1078,7 @@
+
+void MachOFileLayout::writeLinkEditContent() {
+  if (_file.fileType == llvm::MachO::MH_OBJECT) {
+    writeRelocations();
+    writeSymbolTable();
+  }
+  else {
----------------
Shankar Kalpathi Easwaran wrote:
> wow looks like this patch handles partial linking too ?
Normalized mach-o supports object files and FLI.  Where I knew there was a difference, I handled the two cases.  But the object file write side has little testing.  Once I get more infrastructure in place, I can get start adding tests for it.


================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileFromAtoms.cpp:1
@@ +1,2 @@
+//===- lib/ReaderWriter/MachO/MachONormalizedFileFromAtoms.cpp ------------===//
+//
----------------
Rui Ueyama wrote:
> Hmm, it's hard to tell which code is just moved from MachOWriter and which is newly written.
Its all new.  MachOWriter went directly from Atoms to mach-o binary.  There is now two steps and goes through the intermediate normalized mach-o.  This file does the step from Atoms to normalized mach-o. 

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileFromAtoms.cpp:231-251
@@ +230,23 @@
+
+void Util::appendAtom(SectionInfo *sect, const DefinedAtom *atom) {
+  // Figure out offset for atom in this section given alignment constraints.
+  uint64_t offset = sect->size;
+  DefinedAtom::Alignment atomAlign = atom->alignment();
+  uint64_t align2 = 1 << atomAlign.powerOf2;
+  uint64_t requiredModulus = atomAlign.modulus;
+  uint64_t currentModulus = (offset % align2);
+  if ( currentModulus != requiredModulus ) {
+    if ( requiredModulus > currentModulus )
+      offset += requiredModulus-currentModulus;
+    else
+      offset += align2+requiredModulus-currentModulus;
+  }
+  // Record max alignment of any atom in this section.
+  if ( atomAlign.powerOf2 > sect->alignment )
+    sect->alignment = atomAlign.powerOf2;
+  // Assign atom to this section with this offset.
+  AtomInfo ai = {atom, offset};
+  sect->atomsAndOffsets.push_back(ai);
+  // Update section size to include this atom.
+  sect->size = offset + atom->size();
+}
----------------
Shankar Kalpathi Easwaran wrote:
> looks like even BSS atoms take up file space ? Is that something to be fixed later ?
I added a FIXME to appendSection() to handle the BSS optimization  

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileFromAtoms.cpp:346-349
@@ +345,6 @@
+
+uint64_t Util::alignTo(uint64_t value, uint8_t align2) {
+  uint64_t align = 1 << align2;
+  return ( (value + (align-1)) & (-align) );
+}
+
----------------
Shankar Kalpathi Easwaran wrote:
> llvm::RoundUpToAlignment.
Fixed to use llvm::RoundUpToAlignment().

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileFromAtoms.cpp:358
@@ +357,3 @@
+  }
+  seg->size = alignTo(addr - seg->address, 12);
+}
----------------
Shankar Kalpathi Easwaran wrote:
> Align at 12 byte boundaries ?
alignTo() takes a power of two.  So the 12 means page align (4096).  But this should not be hard coded.  I've fixed it to get the page size from the LinkingContext.

================
Comment at: lib/ReaderWriter/MachO/MachONormalizedFileFromAtoms.cpp:374
@@ +373,3 @@
+  while (padding < 0)
+    padding += 4096;
+  // Start assigning section address starting at padded offset.
----------------
Shankar Kalpathi Easwaran wrote:
> Do you want to use the context pagesize function here.
fixed


http://llvm-reviews.chandlerc.com/D2101

BRANCH
  svn

ARCANIST PROJECT
  lld



More information about the llvm-commits mailing list