[PATCH] D63932: [GlobalDCE] Dead Virtual Function Elimination

Wed Jul 31 11:01:10 PDT 2019

pcc added a comment.

In D63932#1608235 <https://reviews.llvm.org/D63932#1608235>, @ostannard wrote:

> > - Take the example from my earlier message, give the "main executable" and "DSO" hidden visibility, build the "main executable" with LTO and the "DSO" without LTO, and statically link them both into the same executable. We run into the same problem where the Plugin1 vtable is potentially not referenced and thus misoptimised. Yes, this is a violation of the LTO visibility rules, but the example shows that we only detect it sometimes. I think that if we did want to detect cases where the LTO visibility rules are clearly being violated, the outcome should be to issue a diagnostic and not to silently proceed with optimizations disabled, since the violation might be masking other undetected issues. That really seems orthogonal to this patch, though.
>
> As you say, this is a violation of the LTO visibility rules, so `[[clang::lto_visibility_public]]` must be used. I've gated this optimisation behind `-fwhole-program-vtables`, which if I've understood https://clang.llvm.org/docs/LTOVisibility.html correctly, is effectively the user asserting to the compiler that they've followed the LTO visility rules correctly.

Yes, that's correct.

>> - Imagine linking part of a program with ld -r with LTO -- all symbols including vtable symbols will appear to be externally visible, which is not necessarily a violation of the LTO visibility rules because they may not be used externally in the final link. So we end up pessimising unnecessarily.
> 
> I'm not really familiar with partial linking, is there any good documentation on how it is supposed to interact with LTO?

I'm not aware of any. The way that I think about it is that a partial link operation with LTO inputs can still produce an LTO unit, but that LTO unit may not be combined with other LTO units, either by passing bitcode files directly to the final link or by adding another `ld -r` output with an LTO unit to the final link.

> I've tried to make the changed to vcall_visibility as similar to the normal LTO internalisation process as possible, it happens at the same time and under the same conditions. If partial linking prevents normal internalisation, which will in turn prevent most of the existing optimisations done by GlobalDCE, then surely it should also prevent internalisation of vcall_visibility?

Partial linking will indeed prevent dropping the virtual functions, but it should not prevent clearing the pointer to the virtual function in the vtable. The linker should then be able to drop the virtual function body as part of `--gc-sections` during the final link.

>> I gave this some more thought and it seems that the frontend has all of the information required to make a determination about whether to allow VFE, without needing to teach LTO to relax visibility. We can make the frontend do this:
>> 
>> - If -flto was passed (see CodeGenOptions::PrepareForLTO), attach "allow VFE" metadata if class has hidden LTO visibility.
>> - Otherwise, attach "allow VFE" metadata if class is not isExternallyVisible.
>> 
>>   Now the behaviour of GlobalDCE can be made conditional on the "allow VFE" metadata alone. This also has the advantage of simplifying the IR by making "allow VFE" a boolean rather than a tri-state as it is now.
> 
> The problem with that is that GlobalDCE is still run in the pre-linking pipeline when emiting an LTO object, because it can remove code/data which already has internal linkage.

Right, of course. Sorry, I forgot about that. It seems that if we want to allow VFE on internal linkage types without LTO, we would still need the tri-state.

> To make this work, we'd have to do one of:
> 
> - Tell GlobalDCE whether it is running in LTO post-linking or not, so it can avoid doing VFE before internalisation.   This breaks the idea that the IR contains all the information needed to determine if an optimisation is legal or not.
> - Remove the pre-linking runs of GlobalDCE when doing LTO. This will result in more known-dead code reaching the LTO link stage, and results in a pass which is only valid at certain points in the pipeline.
> - Split VFE out of GlobalDCE. This would result in worse optimisation, because chains of dependencies involving both virtual calls and regular references won't be followed fully.

I think I would favour something closer to your first suggestion, but instead of telling GlobalDCE specifically about this, we represent the "post-link" flag in the IR (e.g. as a module flag) in order to keep the IR self-contained. LTO would then be taught to add this flag at the right time, and the logic inside GlobalDCE would be:

- If post-link flag not set, allow VFE if linkage <= TranslationUnit.
- If post-link flag set, allow VFE if linkage <= LinkageUnit.

This would also help address a correctness issue with the CFI and WPD passes, which is that it is currently unsound to run them at compile time. If we let them run at compile time, we would in principle be able to do CFI and WPD on internal linkage types without LTO.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63932/new/

https://reviews.llvm.org/D63932