[PATCH] D41390: Wasm LLD: Don't write out discarded function symbols

Nicholas Wilson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 19 14:07:57 PST 2017


ncw added inline comments.


================
Comment at: wasm/InputFiles.cpp:232
+      if (S->getFile() == this)
+        ReachedFunctions[WasmSym.ElementIndex - NumFunctionImports] = true;
       break;
----------------
sbc100 wrote:
> How about just:
> 
> ```if (S->getFile() != this)  {
>     // This symbol was already defined in another file.
>     Functions[WasmSym.ElementIndex - NumFunctionImports] = &InputFunction::Discarded;
> }```
> 
> And can we do the same for globals?
Sadly that won't work, the extra layer of redirection with ReachedFunctions is definitely necessary. The way you wrote it was my first attempt!

Here's the problematic C code:

```
void defaultImpl(void) { ... }
void aliasedCall(void) __attribute__((weak,alias("defaultImpl")));

static void consumingFunction() {
  aliasedCall();
  defaultImpl();
}
```

Here, you can see the symbol "aliasedCall" is a function export - so the function body for "defaultImpl" is exported with two different names in the exports section, "defaultImpl" and "aliasedCall".

Now imagine that another translation unit defines the symbol "aliasedCall" (either strongly or just gets in first). In this case, we'll get `S->getFile() != this` for the Function when we process the "aliasedCall" export. Yet, the Function is reachable through two different Symbols, and "defaultCall()" should still reach the function in this translation unit.

Short answer: if //one// Symbol points to another file, that's not enough reason to prune that Symbol's InputFunction. We need //every// Symbol that points to the Function to be redirected in order to safely prune it.

Luckily we have a test for it already. One of the aliased symbol tests caught this bug.

---

Secondly, globals. Hmm, it's trickier. In general, no, since an InputSegment can contain multiple globals? We could put in some code to count how many globals refer to each InputSegment, and if it's a segment with a solitary global we could prune it... Would be interesting to see how much the filesize goes down.

My expectation is that almost all the "weak" symbols that are used in the wild are C++ inlines. Even low-level systems code typically only uses weak symbols in a small number of files.

Thus, optimising the case of weak symbols is not a priority. Comdat handling will prune the duplicate InputSegments that actually occur in the wild, and we don't really need to handle the non-Comdat case.


================
Comment at: wasm/InputFunction.cpp:18
+
+InputFunction InputFunction::Discarded(nullptr, nullptr, nullptr);
----------------
sbc100 wrote:
> Seems a shame to add a new source file just for this.
> 
> I've not looked at how the other linker do this, but could we not just have a "Discarded" flag on these objects instead of the these magic statics?
The file is doing no harm. We might need the file anyway eventually. Let's not let the length of the file affect any other decisions.

Good question about Discarded, I was initially surprised too.

The "Discarded" implementation I'm using comes directly from the ELF linker, to keep Rui happy. It's actually quite sensible: if you have a flag, there are two problems. a) you have to check the flag everywhere, and b) you have to keep on checking it forever whenever someone adds new code, ie it's a pain for ever.

Redirecting to a dummy Function like "Discarded" is much safer. Since all the fields in Discarded are null, there's no chance at all that you'll use it by mistake. You only need to check against Discarded in a couple of places, everywhere else can reference the InputFuction/InputSegment without worrying - you'll pass on its nullness if you reference the members.

Keeping it similar to ELF-LLD is probably a good enough reason to use the idiom for now, in any case.


Repository:
  rLLD LLVM Linker

https://reviews.llvm.org/D41390





More information about the llvm-commits mailing list