[PATCH] D36850: [ThinLTO] Add norecurse function attribute propagation

Thu Aug 26 21:31:38 PDT 2021

modimo marked 3 inline comments as done.
modimo added a comment.

In D36850#2964293 <https://reviews.llvm.org/D36850#2964293>, @tejohnson wrote:

> Good point on indirect calls. Rather than add a bit to the summary, can the flags just be set conservatively in any function containing an indirect call when we build the summaries initially? I think that would get the same effect.

That could have an issue where A calls {indirect, B} and A gets propagated upon from B without knowledge that the indirect call exists. Right now I've got a FunFlags `hasUnknownCall` which marks these as non-propagatable.

> For speculative devirtualization aka ICP, we will still be left with a fallback indirect call, so it would still need to be treated conservatively. The extra edges added for ICP promotion candidates shouldn't be a problem or affect this.

Ah good point. I was thinking it may pessimize the propagation because of having to process all of these edges this is a no-go because of the fallback.

> Note that with class hierarchy analysis we can do better for virtual calls, e.g. when -fwhole-program-vtables is enabled for whole program devirtualization and we have the necessary whole program visibility on vtables. We could potentially integrate WPD decision here. Even if we can't find a single devirtualization target, we can compute the set of all possible targets of virtual calls during the thin link and could use that information to basically merge the attributes from all possible targets. But probably best to leave that as a future refinement as it will take some additional work to get that hooked up. We'd still need to be conservative for virtual calls when we don't have the necessary type info (when -fwhole-program-vtables not enabled or we don't have whole program visibility on the vtable defs), or for non-virtual indirect calls.

Agreed, it's an engineering problem more than anything. I ran an optimistic build (same revisions as before, release + noinline) where indirect and virtual calls were assumed to always propagate (thinlto_prop_optimistic) and the effect in Clang self-build at least is not too large:

thinlto_base/

  "dwarfehprepare.NumCleanupLandingPadsRemaining": 217515,
  "dwarfehprepare.NumNoUnwind": 299126,
  "dwarfehprepare.NumUnwind": 332785,

thinlto_prop/

  "dwarfehprepare.NumCleanupLandingPadsRemaining": 158372,
  "dwarfehprepare.NumNoUnwind": 420918,
  "dwarfehprepare.NumUnwind": 210870,

thinlto_prop_optimistic/

  "dwarfehprepare.NumCleanupLandingPadsRemaining": 154958,
  "dwarfehprepare.NumNoUnwind": 425893,
  "dwarfehprepare.NumUnwind": 205889,

(425893-420918)/(420918-299126) = 4% boost over being conservative and correct. This may change in real workloads though so I added a `thinlto-funcattrs-optimistic-indirect` flag for easy measurement.

================
Comment at: llvm/lib/Transforms/IPO/FunctionAttrs.cpp:383
+  //   occur in a couple of circumstances:
+  //      a. An internal function gets imported due to its caller getting
+  //      imported, it becomes AvailableExternally
----------------
tejohnson wrote:
> I'm not sure how this case could be happening as we haven't actually done the importing that would create these new available externally copies yet - that happens in the LTO backends, during the thin link we just add them to import lists.
I added the test funcattrs-prop-exported-internal.ll that demonstrates this. The summary has its internal linkage converted to external in [thinLTOResolvePrevailingInIndex](https://github.com/llvm/llvm-project/blob/92ce6db/llvm/lib/LTO/LTO.cpp#L436) then converted to AvailableExternally in [thinLTOResolvePrevailingGUID](https://github.com/llvm/llvm-project/blob/92ce6db/llvm/lib/LTO/LTO.cpp#L370). Currently being handled conservatively since a prevailing copy doesn't exist.

================
Comment at: llvm/lib/Transforms/IPO/FunctionAttrs.cpp:388
+  //         propagating on its caller.
+  //      b. C++11 [temp.explicit]p10 can generate AvailableExternally
+  //      definitions of explicitly instanced template declarations
----------------
tejohnson wrote:
> There is no prevailing copy presumably because the prevailing copy is in a native library being linked? I think these cases can be handled conservatively.
Yeah the prevailing copy is in the native binary.

This is a [C++ specific feature](https://github.com/llvm/llvm-project/blob/92ce6db/clang/lib/AST/ASTContext.cpp#L10755) which has ODR and these are already being propagated/inlined from in pre-link. The current thinlink propagation implementation is conservative because a prevailing copy doesn't exist. Currently being handled conservatively since a prevailing copy doesn't exist.

================
Comment at: llvm/lib/Transforms/IPO/FunctionAttrs.cpp:419
+    // Virtual calls are unknown so go conservative
+    if (!FS || FS->getTypeIdInfo())
+      return CachedAttributes[VI];
----------------
tejohnson wrote:
> I think this checking for virtual calls will only work if -fwhole-program-vtables is enabled for whole program devirtualization or CFI. Otherwise we don't have the type tests that cause this to get populated. This also won't detect non-virtual indirect calls.
I see. I added `hasUnknownCall` as an explicit flag for all indirect calls that should capture both cases.

================
Comment at: llvm/lib/Transforms/IPO/FunctionAttrs.cpp:443
+          // we'll keep the last one seen.
+          ODR = FS;
+        } else if (!Prevailing && !ODR) {
----------------
tejohnson wrote:
> modimo wrote:
> > tejohnson wrote:
> > > Won't we still have a copy marked prevailing? Wondering if the weak linkage cases can all be merged.
> > Yeah, there will still be a copy that's prevailing. Reading through the linkage descriptions again and also those in `FunctionImportGlobalProcessing::getLinkage`:
> > 
> > 1. I think with External/WeakODR/LinkOnceODR once the prevailing is found use that copy
> > 
> > 2. For Weak/LinkOnce even with a prevailing copy I don't know if the copy ultimately used will be prevailing. I'm wondering if a native definition could be the victor in which case we just can't propagate off these functions. 
> > 
> > WDYT about (2)? For C++ at least these don't seem to really exist and testing with Clang self-build I'm not seeing this kick in anywhere. I added a flag to specifically disable this case so it's easy to test out the differences.
> Since the linker which invokes this should have been passed all objects to link, bitcode and native, it can do symbol resolution across all of them. So if there is an overriding native strong symbol, it should see that and the bitcode resolution would be non-prevailing and all bitcode copies marked dead (in computeDeadSymbols). So I think the weak any and linkonce any case can take the prevailing copy.
That makes it much easier and everything folds into the prevailing case! Changed and added a test for it.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D36850/new/

https://reviews.llvm.org/D36850