[PATCH] D132988: Changes to use a string pool for symbols

Lang Hames via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 7 18:15:52 PDT 2022


lhames added a comment.

I love seeing all that `BlockDependenciesMap` code disappearing as a result of this change. :D

Lots of minor changes, but the broad direction of this is _great_! Once this patch lands the next big target should be `LinkGraph::LookupMap` -- we should be able to ditch the StringRefs and pass the `SymbolStringPtr`s right through to the context. That'll avoid a tonne of re-interning on the `ObjectLinkingLayer` side.

Very cool work!



================
Comment at: llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h:1100
   /// be undefined, in which case they are assigned a value of 0.
   Symbol &addExternalSymbol(StringRef Name, orc::ExecutorAddrDiff Size,
                             Linkage L) {
----------------
This and the other "add" methods can take `SymbolStringPtr`s now, rather than StringRefs.

This is a place where I wouldn't mind a convenience overload for the StringRef though. I.e.:

```
Symbol &addExternalSymbol(SymbolStringPtr Name, orc::ExecutorAddrDiff Size, Linkage L) {
  assert(llvm::count(ExternalSymbols, Name) == 0 && "Duplicate external symbol");
    auto &Sym = Symbol::constructExternal(
        Allocator, createAddressable(orc::ExecutorAddr(), false),
        std::move(Name), Size, L);
    ExternalSymbols.insert(&Sym);
    return Sym;
}

Symbol &addExternalSymbol(StringRef Name, orc::ExecutorAddrDiff Size, Linkage L) {
  return addExternalSymbol(SSP->intern(Name), Size, L);
}
```


================
Comment at: llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h:500
   /// Returns the name of this symbol (empty if the symbol is anonymous).
-  StringRef getName() const {
-    assert((!Name.empty() || getScope() == Scope::Local) &&
+  orc::SymbolStringPtr getName() const {
+    assert((Name || getScope() == Scope::Local) &&
----------------
dblaikie wrote:
> jaredwy wrote:
> > dblaikie wrote:
> > > This could return by const-ref (the same way a large member like a std::vector would return from an accessor by const ref) & then `isNamed` could be removed & the usage code kept the same? (I assume `isNamed` was added to avoid the cost of copying the `SymbolStringPtr` - and if it returns by const ref there's no added cost? Oh, I guess callers still have to add the `*` in to dereference? yeah, but still maybe better than adding this `isNamed` function that's a bit quirky - since it only allows the one operation (equality) without the `SymbolStringPtr` copy, which is a bit niche/special case/inflexible)
> > It was mostly added to aid in readability of the code and avoiding the unnecessary copy but I am not attached to it, I am new to these lands so if it is not the LLVM way to add utility functions i can remove
> Could you show/compare the readability issues you had in mind?
> 
> Would `getName` returning by const-ref address the unnecessary copy concerns?
I'd be in favor of having `getName()` return by const-ref and dropping `isNamed` -- this would be consistent with the existing getters/setters.


================
Comment at: llvm/include/llvm/ExecutionEngine/JITLink/TableManager.h:42
         dbgs() << "    Created" << impl().getSectionName() << "entry for "
-               << Target.getName() << ": " << Entry << "\n";
+               << *(Target.getName()) << ": " << Entry << "\n";
       });
----------------
Are the parens needed here? I think `*` binds tight enough that you can omit them.


================
Comment at: llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp:238-239
             *Sym, Sec);
-        GSym = &createDLLImportEntry(SymbolName, *ExternalSym);
+        auto SSPName = G->getSymbolStringPool()->intern(SymbolName);
+        GSym = &createDLLImportEntry(SSPName, *ExternalSym);
       } else
----------------
I'd include `intern` convenience methods on both the `LinkGraph` and the builder, and then use them to avoid naming `SymbolStringPtr`s that only have one use. I.e.
```
Gym = &createDLLImportEntry(intern(SymbolName), *ExternalSym);
```

OTOH it's always worth naming a SymbolStringPtr that gets used more than once (otherwise you'd have to intern it more than once).


================
Comment at: llvm/lib/ExecutionEngine/JITLink/EHFrameSupport.cpp:388
                << ((*PCBegin)->isExternal() ? "external" : "absolute")
-               << " symbol \"" << (*PCBegin)->getName()
+               << " symbol \"" << *((*PCBegin)->getName())
                << "\" -- FDE must be kept alive manually or it will be "
----------------
I think you can drop the outer parens here.


================
Comment at: llvm/lib/ExecutionEngine/JITLink/EHFrameSupport.cpp:541
         if (EdgeI->second.Target->hasName())
-          dbgs() << " (" << EdgeI->second.Target->getName() << ")";
+          dbgs() << " (" << *(EdgeI->second.Target->getName()) << ")";
         dbgs() << "\n";
----------------
I think you can drop them outer parens here too.


================
Comment at: llvm/lib/ExecutionEngine/JITLink/ELF_x86_64.cpp:280
             [&](LinkGraph &LG, Symbol &Sym) -> SectionRangeSymbolDesc {
-              if (Sym.getName() == ELFGOTSymbolName)
+              if (*Sym.getName() == ELFGOTSymbolName)
                 if (auto *GOTSection = G.findSectionByName(
----------------
Could lazily intern `ELFGOTSymbolName` into an `ELFJITLinker_x86_64` member variable to speed up repeat comparisons?

No need to do that for this patch, but maybe add a FIXME to follow up on the idea?


================
Comment at: llvm/lib/ExecutionEngine/JITLink/JITLinkGeneric.cpp:203
            "External has already been assigned an address");
-    assert(Sym->getName() != StringRef() && Sym->getName() != "" &&
+    assert(Sym->hasName() && *Sym->getName() != "" &&
            "Externals must be named");
----------------
I think you can now just `assert(Sym->getName() && "Externals must be named")`.


================
Comment at: llvm/lib/ExecutionEngine/JITLink/JITLinkGeneric.cpp:209
             : SymbolLookupFlags::RequiredSymbol;
-    UnresolvedExternals[Sym->getName()] = LookupFlags;
+    UnresolvedExternals[*Sym->getName()] = LookupFlags;
   }
----------------
We should key `UnresolvedExternals` on the `SymbolStringPtr` now, rather than the StringRef.


================
Comment at: llvm/lib/ExecutionEngine/JITLink/PerGraphGOTAndPLTStubsBuilder.h:73
 
-    auto GOTEntryI = GOTEntries.find(Target.getName());
+    auto GOTEntryI = GOTEntries.find(*(Target.getName()));
 
----------------
I think `GOTEntries` should be able to be keyed on the `SymbolStringPtr` too.


================
Comment at: llvm/lib/ExecutionEngine/JITLink/PerGraphGOTAndPLTStubsBuilder.h:95
            "External branch edge can not point to an anonymous target");
-    auto StubI = PLTStubs.find(Target.getName());
+    auto StubI = PLTStubs.find(*Target.getName());
 
----------------
PLTStubs should likewise be keyed on the `SymbolStringPtr`.


================
Comment at: llvm/lib/ExecutionEngine/Orc/COFFPlatform.cpp:67
     }
-
+    auto SSP = CP.getExecutionSession().getSymbolStringPool();
     auto G = std::make_unique<jitlink::LinkGraph>(
----------------
We need to use the `ExecutionSession`'s `SymbolStringPool` here. I'd update this to take the `ExecutionSession&` -- `COFFPlatform` has an ES member handy that you can pass in.


================
Comment at: llvm/lib/ExecutionEngine/Orc/COFFPlatform.cpp:733
   auto I = llvm::find_if(G.defined_symbols(), [this](jitlink::Symbol *Sym) {
-    return Sym->getName() == *CP.COFFHeaderStartSymbol;
+    return *Sym->getName() == *CP.COFFHeaderStartSymbol;
   });
----------------
Once everyone's sharing the same pool this can be a direct `SymbolStringPtr` value comparison.


================
Comment at: llvm/lib/ExecutionEngine/Orc/ELFNixPlatform.cpp:693
     auto I = llvm::find_if(G.defined_symbols(), [this](jitlink::Symbol *Sym) {
-      return Sym->getName() == *MP.DSOHandleSymbol;
+      return *Sym->getName() == *MP.DSOHandleSymbol;
     });
----------------
Can be a `SSP` comparison once the pool is shared.


================
Comment at: llvm/lib/ExecutionEngine/Orc/ELFNixPlatform.cpp:843-850
+    if (*Sym->getName() == "__tls_get_addr") {
+      auto TLSGetAddr =
+          MP.getExecutionSession().intern("___orc_rt_elfnix_tls_get_addr");
+      Sym->setName(std::move(TLSGetAddr));
+    } else if (*Sym->getName() == "__tlsdesc_resolver") {
+      auto TLSGetAddr =
+          MP.getExecutionSession().intern("___orc_rt_elfnix_tlsdesc_resolver");
----------------
These `tls` symbol strings should be interned and stored in the plugin -- that'll save re-interning them for each graph.


================
Comment at: llvm/lib/ExecutionEngine/Orc/ExecutionUtils.cpp:532-535
     auto NameCopy = G->allocateString(Twine(getImpPrefix()) + *KV.first);
     StringRef NameCopyRef = StringRef(NameCopy.data(), NameCopy.size());
-    Ptr.setName(NameCopyRef);
+    auto SymbolName = G->getSymbolStringPool()->intern(NameCopyRef);
+    Ptr.setName(std::move(SymbolName));
----------------
We shouldn't need to allocate a string on the `LinkGraph` for this any more. Instead, you should just be able to say:
```
Ptr.setName(G->intern(getImplPrefix() + *KV.first));
```


================
Comment at: llvm/lib/ExecutionEngine/Orc/MachOPlatform.cpp:676
   auto I = llvm::find_if(G.defined_symbols(), [this](jitlink::Symbol *Sym) {
-    return Sym->getName() == *MP.MachOHeaderStartSymbol;
+    return *Sym->getName() == *MP.MachOHeaderStartSymbol;
   });
----------------
Can compare SSP values once the pool is shared.


================
Comment at: llvm/lib/ExecutionEngine/Orc/MachOPlatform.cpp:814-817
+    if (*Sym->getName() == "__tlv_bootstrap") {
+      auto TLSGetADDR =
+          MP.getExecutionSession().intern("___orc_rt_macho_tlv_get_addr");
+      Sym->setName(std::move(TLSGetADDR));
----------------
We should intern these `tlv` strings and store them as members in the plugin to avoid redundant interning.


================
Comment at: llvm/lib/ExecutionEngine/Orc/MachOPlatform.cpp:994-997
+    auto SymName = *Sym->getName();
+    if (SymName == "___orc_rt_macho_register_ehframe_section")
       orc_rt_macho_register_ehframe_section = ExecutorAddr(Sym->getAddress());
+    else if (SymName == "___orc_rt_macho_deregister_ehframe_section")
----------------
More strings to be interned and stored in the plugin.


================
Comment at: llvm/lib/ExecutionEngine/Orc/ObjectLinkingLayer.cpp:212-225
+        auto InternedName = Sym->getName();
         JITSymbolFlags Flags;
 
         if (Sym->isCallable())
           Flags |= JITSymbolFlags::Callable;
         if (Sym->getScope() == Scope::Default)
           Flags |= JITSymbolFlags::Exported;
----------------
Once `Symbol::getName()` returns by const-ref I think you can replace `InternedName` with `Sym->getName()` everywhere here.


================
Comment at: llvm/lib/ExecutionEngine/Orc/ObjectLinkingLayer.cpp:231
       if (Sym->hasName() && Sym->getScope() != Scope::Local) {
-        auto InternedName = ES.intern(Sym->getName());
+        auto InternedName = Sym->getName();
         JITSymbolFlags Flags;
----------------
Ditto here.


================
Comment at: llvm/lib/ExecutionEngine/Orc/ObjectLinkingLayer.cpp:414-420
+        auto Name = Sym->getName();
+        if (!MR->getSymbols().count(Name)) {
           JITSymbolFlags SF = JITSymbolFlags::Weak;
           if (Sym->getScope() == Scope::Default)
             SF |= JITSymbolFlags::Exported;
           NewSymbolsToClaim[Name] = SF;
           NameToSym.push_back(std::make_pair(std::move(Name), Sym));
----------------
I'd drop `Name` and just use `Sym->getName()` here.


================
Comment at: llvm/lib/ExecutionEngine/Orc/ObjectLinkingLayer.cpp:465-469
+      auto SymName = Sym->getName();
       if (!SymDeps.External.empty())
         ExternalNamedSymbolDeps[SymName] = SymDeps.External;
       if (!SymDeps.Internal.empty())
         InternalNamedSymbolDeps[SymName] = SymDeps.Internal;
----------------
Drop `SymName` and just use `Sym->getName()`.


================
Comment at: llvm/unittests/ExecutionEngine/JITLink/LinkGraphTests.cpp:75-89
+TEST(LinkGraphTest, ConstructionWithPool) {
+  auto SSP = std::make_shared<orc::SymbolStringPool>();
+  // Check that LinkGraph construction works as expected.
+  LinkGraph G("foo", SSP, Triple("x86_64-apple-darwin"), 8, support::little,
+              getGenericEdgeKindName);
+  EXPECT_EQ(G.getName(), "foo");
+  EXPECT_EQ(G.getTargetTriple().str(), "x86_64-apple-darwin");
----------------
Once you drop the convenience method this shouldn't be needed.


================
Comment at: llvm/unittests/ExecutionEngine/JITLink/LinkGraphTests.cpp:717
+}
\ No newline at end of file

----------------
Missing newline. :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132988/new/

https://reviews.llvm.org/D132988



More information about the llvm-commits mailing list