[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

Wed Jul 31 05:49:34 PDT 2019

ostannard added a comment.

> - Take the example from my earlier message, give the "main executable" and "DSO" hidden visibility, build the "main executable" with LTO and the "DSO" without LTO, and statically link them both into the same executable. We run into the same problem where the Plugin1 vtable is potentially not referenced and thus misoptimised. Yes, this is a violation of the LTO visibility rules, but the example shows that we only detect it sometimes. I think that if we did want to detect cases where the LTO visibility rules are clearly being violated, the outcome should be to issue a diagnostic and not to silently proceed with optimizations disabled, since the violation might be masking other undetected issues. That really seems orthogonal to this patch, though.

As you say, this is a violation of the LTO visibility rules, so `[[clang::lto_visibility_public]]` must be used. I've gated this optimisation behind `-fwhole-program-vtables`, which if I've understood https://clang.llvm.org/docs/LTOVisibility.html correctly, is effectively the user asserting to the compiler that they've followed the LTO visility rules correctly.

> - Imagine linking part of a program with ld -r with LTO -- all symbols including vtable symbols will appear to be externally visible, which is not necessarily a violation of the LTO visibility rules because they may not be used externally in the final link. So we end up pessimising unnecessarily.

I'm not really familiar with partial linking, is there any good documentation on how it is supposed to interact with LTO?

I've tried to make the changed to vcall_visibility as similar to the normal LTO internalisation process as possible, it happens at the same time and under the same conditions. If partial linking prevents normal internalisation, which will in turn prevent most of the existing optimisations done by GlobalDCE, then surely it should also prevent internalisation of vcall_visibility?

> I gave this some more thought and it seems that the frontend has all of the information required to make a determination about whether to allow VFE, without needing to teach LTO to relax visibility. We can make the frontend do this:
> 
> - If -flto was passed (see CodeGenOptions::PrepareForLTO), attach "allow VFE" metadata if class has hidden LTO visibility.
> - Otherwise, attach "allow VFE" metadata if class is not isExternallyVisible.
> 
>   Now the behaviour of GlobalDCE can be made conditional on the "allow VFE" metadata alone. This also has the advantage of simplifying the IR by making "allow VFE" a boolean rather than a tri-state as it is now.

The problem with that is that GlobalDCE is still run in the pre-linking pipeline when emiting an LTO object, because it can remove code/data which already has internal linkage. To make this work, we'd have to do one of:

- Tell GlobalDCE whether it is running in LTO post-linking or not, so it can avoid doing VFE before internalisation.   This breaks the idea that the IR contains all the information needed to determine if an optimisation is legal or not.
- Remove the pre-linking runs of GlobalDCE when doing LTO. This will result in more known-dead code reaching the LTO link stage, and results in a pass which is only valid at certain points in the pipeline.
- Split VFE out of GlobalDCE. This would result in worse optimisation, because chains of dependencies involving both virtual calls and regular references won't be followed fully.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932