[PATCH] D128968: [JITLink][COFF] Initial COFF support.

Lang Hames via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 8 15:05:08 PDT 2022


lhames added a comment.

Really great work -- a fully separated `COFFLinkGraphBuilder` and `x86-64` backend, test infrastructure hooked up and tests already written!

This is a big step forward for ORC Windows support.

Comments below.



================
Comment at: llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp:97-99
+    if (!Sec) {
+      return Sec.takeError();
+    }
----------------
No braces needed here.


================
Comment at: llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp:123
+    // Get the section's memory protection flags.
+    MemProt Prot = {};
+    if ((*Sec)->Characteristics & COFF::IMAGE_SCN_MEM_EXECUTE)
----------------
`MemProt Prot;` will default-initialize -- no need to be explicit here. 


================
Comment at: llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp:145
+    else {
+      ArrayRef<uint8_t> Data{};
+      if (auto Err = Obj.getSectionContents(*Sec, Data))
----------------
No need for the brace initializers -- `ArrayRef<T>` will default initialize.


================
Comment at: llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp:184
+    // Get section name
+    StringRef SectionName;
+    if (SectionIndex == COFF::IMAGE_SYM_UNDEFINED) {
----------------
Is `SectionName` only used for debug logging? If so, could this be refactored into a separate function (e.g. as below)? That way it will naturally compile out when we build with `NDEBUG`.

```
std::string getSectionName(COFFSectionIndex SectionIndex, const object::coff_section *Sec) {
StringRef SectionName;
    if (SectionIndex == COFF::IMAGE_SYM_UNDEFINED) {
      if (Sym->getValue())
        return "(common)";
      else
        return "(external)";
    } else if (SectionIndex == COFF::IMAGE_SYM_ABSOLUTE)
      return "(absolute)";
    else if (SectionIndex == COFF::IMAGE_SYM_DEBUG)
      return "(debug)"; // Used with .file symbol
    else {
      // Non reserved regular section number
      if (Sec) {
        if (Expected<StringRef> SecNameOrErr = Obj.getSectionName(*SecOrErr))
          return *SecNameOrErr;
        else
          return "(section name unavailable due to error: " + toString(SecNameOrErr.takeError()) + ")";
      }
      return "(section unspecified)"
    }
}

...
    COFFSectionIndex SectionIndex = Sym->getSectionNumber();
    const object::coff_section *Sec = nullptr;

    switch (SectionIndex) {
      case COFF::IMAGE_SYM_UNDEFINED:
      case COFF::IMAGE_SYM_ABSOLUTE:
      case COFF::IMAGE_SYM_DEBUG:
        break;
      default: {
        Expected<const object::coff_section *> SecOrErr =
          Obj.getSection(SectionIndex);
        if (auto SecOrErr = Obj.getSection(SectionIndex)))
          Sec = *SecOrErr;
        else
          return make_error<JITLinkError>("Invalid COFF section number:" +
                                        formatv("{0:d}: ", SectionIndex) +
                                        " (" + toString(SecOrErr.takeError()) + ")");
    }
...
      dbgs() << "    " << SymIndex << ": Skipping FileRecord symbol \""
             << SymbolName << "\" in " << getSectionName(SectionIndex, Sec)
             << " (index: " << SectionIndex << ") \n";
      });
...
```


================
Comment at: llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp:199-201
+      if (SecOrErr.takeError())
+        return make_error<JITLinkError>("Invalid COFF section number:" +
+                                        formatv("{0:d}: ", SectionIndex));
----------------
```
      if (SecOrErr.takeError())
        return make_error<JITLinkError>("Invalid COFF section number:" +
                                        formatv("{0:d}: ", SectionIndex));
```

This idiom is unsafe: The `.takeError()` method will return an `llvm::Error`, and in the success case the fact that you're implicitly converting to bool will count as checking the error, but in the failure case you're not returning it, but instead creating a new Error.

Whenever you check an error you want it to be a named value, because you'll need to do something with that named value in the failure case.

Two alternative approaches are:

(1) Return the error directly (easy, maintains error type, but doesn't give context to the error):
```
  if (auto Err = SecOrErr.takeError())
    return std::move(Err);
```
(2) Build a new error (loses original type, but provides more context for error messages)
```
  if (auto Err = SecOrErr.takeError())
    return make_error<JITLinkError>("Invalid COFF section number:" +
                                    formatv("{0:d}: ", SectionIndex) +
                                    " (" + toString(std::move(Err)) + ")");      
```



================
Comment at: llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp:230
+             "IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY is not supported.");
+      WeakAliasRequests.push_back({SymIndex, TagIndex, SymbolName});
+    } else {
----------------
I'm not very familiar with COFF, but just took a look at https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#auxiliary-format-3-weak-externals.

It sounds like `IMAGE_SYM_CLASS_WEAK_EXTERNAL` behaves differently depending on the value of the `Characteristics` field:

* "`IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY` indicates that no library search for sym1 should be performed". That sounds a bit vague, but additional conversation (e.g. [[ https://sourceforge.net/p/mingw/mailman/mingw-users/thread/000a01c5a5e8%24b0737700%24de6d65da%40DANNY/ | here ]]) helps to clarify:
> The SVR4 ABI uses (and bfd currently supports)
> the equivalent of IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY semantics (a
> library member does not resolve a weak symbol unless a strong symbol
> cause the member to be linked in)

Some thoughts off the top of my head:

ORC can't currently support this. ORC lookups have the related concept of a `LookupKind` (static linking context, or a dynamic lookup context) which we use to decide whether to pull in definitions from static archives: -- static links pull in archive defs, dynamic lookups don't.

This would be something new: "don't pull this symbol in at all, but bind it if it's already present", and it's a per-symbol behavior, rather than per-lookup.

We could support this by adding a new "ReallyWeakLinkage" type to the LinkGraph, and `SymbolLookupFlags::ReallyWeaklyReferencedSymbol` value. 
 
It's worth filing a GitHub issue for this, but for now I think this patch is right to treat it as unsupported and return an error.

* "`IMAGE_WEAK_EXTERN_SEARCH_LIBRARY` indicates that a library search for sym1 should be performed". We can probably implement this as a plain weak external symbol in the LinkGraph. This should be added to COFFLinkGraphBuilder in this patch or a follow-up.

* "`IMAGE_WEAK_EXTERN_SEARCH_ALIAS` indicates that sym1 is an alias for sym2". We should add an explicit symbol alias representation to the LinkGraph -- MachO and ELF both need it too (though actual use is rare). For now the approach used in this patch is good.


================
Comment at: llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp:351-357
+      if (!CandSize)
+        LLVM_DEBUG({
+          dbgs() << "  Empty implicit symbol size generated for the following "
+                    "symbol:"
+                 << "\n"
+                 << "    " << *Symbol << "\n";
+        });
----------------
If `NDEBUG` is set this condition will apply to `Symbol->setSize(CandSize)` below -- is that intended?

If the `if (!CandSize)` check is for debugging purposes only it can be moved inside the `LLVM_DEBUG` macro.


================
Comment at: llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp:406
+    if (Definition->Selection == COFF::IMAGE_COMDAT_SELECT_ASSOCIATIVE) {
+      // FIXME: don't dead strip this when parent section is alive
+      return &G->addDefinedSymbol(
----------------
I guess we should be able to implement these semantics in the future with a new `LinkGraph` `Section` flags and a custom dead-stripping pass?


================
Comment at: llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp:428-430
+// Since two symbols always come in a specific order, we store the first
+// symbol to ComdatLeaders table and emits it as a external symbol when
+// we encounter the COMDAT symbol.
----------------
Does this mean that the second symbol in the sequence will always follow immediately after the first?
Do we need an array for `ComdatLeaders`, or just the most recent COMDAT leader symbol?


================
Comment at: llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.h:67
+  void setGraphBlock(COFFSectionIndex SecIndex, Block *B) {
+    assert(!GraphBlocks[SecIndex] && "Duplicate section at index");
+    GraphBlocks[SecIndex] = B;
----------------
Can we assert `!COFF::isReservedSectionNumber(SecIndex)` here? Or is that already checked elsewhere?


================
Comment at: llvm/lib/ExecutionEngine/JITLink/COFF_x86_64.cpp:47-49
+  enum COFFX86RelocationKind : Edge::Kind {
+    COFFAddr32NB = Edge::FirstRelocation,
+    COFFRel32,
----------------
This shouldn't extend `Edge::Kind` -- they're both enums, but there's a translation step to get from `COFFX86RelocationKind` to a valid `Edge::Kind`. I'd leave this as a plain enum with no explicit type specified. 


================
Comment at: llvm/lib/ExecutionEngine/JITLink/COFF_x86_64.cpp:153
+  /// Return the string name of the given COFF x86_64 edge kind.
+  const char *getCOFFX86RelocationKindName(Edge::Kind R) {
+    switch (R) {
----------------
This should take a `COFFX86RelocationKind`, rather than an `Edge::Kind`.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128968/new/

https://reviews.llvm.org/D128968



More information about the llvm-commits mailing list