[PATCH] D90614: [lld-macho] Handle paired relocs

Wed Dec 16 18:07:54 PST 2020

int3 accepted this revision.
int3 added a comment.
This revision is now accepted and ready to land.

> Nit: Commit title seems to indicate that we are adding support for paired relocs in this diff, but it seems like this diff is just paving the way for adding that support. Might be worth rephrasing... could also tag with [nfc] to make it explicit.

^ do remember to address that, as well as the const ref stuff. Otherwise lgtm

================
Comment at: lld/MachO/Arch/X86_64.cpp:73

-uint64_t X86_64::getImplicitAddend(MemoryBufferRef mb, const section_64 &sec,
-                                   const relocation_info &rel) const {
+bool X86_64::isPairedReloc(const relocation_info rel) const {
+  return (rel.r_type == X86_64_RELOC_SUBTRACTOR);
----------------
int3 wrote:
> pass by const ref?
doesn't look like the const ref comment was addressed

================
Comment at: lld/MachO/InputFiles.cpp:227-236
+    // Note: X86 does not use *_RELOC_ADDEND because it can embed an
+    // addend into the instruction stream. On X86, a relocatable address
+    // field always occupies an entire contiguous sequence of byte(s),
+    // so there is no need to merge opcode bits with address
+    // bits. Therefore, it's easy and convenient to store addends in the
+    // instruction-stream bytes that would otherwise contain zeroes. By
+    // contrast, RISC ISAs such as ARM64 mix opcode bits with with
----------------
love the explanation!

================
Comment at: lld/MachO/InputFiles.cpp:250
+    r.offset = relInfo.r_address;
+    // For unpaired relocs, pairdInfo (just a copy of relInfo) is ignored
+    uint64_t rawAddend = target->getAddend(mb, sec, relInfo, pairedInfo);
----------------

================
Comment at: lld/MachO/InputFiles.cpp:196
     } else {
-      if (relInfo.r_symbolnum == 0 || relInfo.r_symbolnum > subsections.size())
-        fatal("invalid section index in relocation for offset " +
-              std::to_string(r.offset) + " in section " + sec.sectname +
-              " of " + getName());
-
+      assert(relInfo.r_symbolnum && relInfo.r_symbolnum <= subsections.size());
       SubsectionMap &referentSubsecMap = subsections[relInfo.r_symbolnum - 1];
----------------
thakis wrote:
> int3 wrote:
> > gkm wrote:
> > > compnerd wrote:
> > > > This really isn't equivalent.  The `assert` may be compiled out, while the previous behaviour would have been preserved and reported an error to the user.
> > > I changed it to an assertion on the grounds that it is an internal "can't happen" error, rather than something a user might induce with malformed input.
> > A fuzzer could generate a file to trigger this error though, right?
> > 
> > (As an aside, we do need more compact ways of emitting error strings...)
> LLD's philosophy is that it assumes that inputs are generally valid: https://github.com/llvm/llvm-project/blob/main/lld/docs/NewLLD.rst "The current policy is that it is your responsibility to give trustworthy object files. The function is guaranteed to return as long as you do not pass corrupted or malicious object files. A corrupted file could cause a fatal error or SEGV. That being said, you don't need to worry too much about it if you create object files in the usual way and give them to the linker. It is naturally expected to work, or otherwise it's a linker's bug." So crashing on fuzzer input is fine, as long as object files generated by compilers or assemblers work.
oh that's a relief. thanks for pointing it out

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90614/new/

https://reviews.llvm.org/D90614