[cfe-dev] RFC: Devirtualization v2

Fri Mar 30 09:39:01 PDT 2018

> On Mar 30, 2018, at 10:36 AM, Piotr Padlewski <piotr.padlewski at gmail.com> wrote:
> 2018-03-29 18:01 GMT+02:00 John McCall <rjmccall at apple.com>:
>> On Mar 29, 2018, at 9:12 AM, Piotr Padlewski <piotr.padlewski at gmail.com> wrote:
>> 2018-03-28 23:23 GMT+02:00 John McCall <rjmccall at apple.com>:
>>> On Mar 19, 2018, at 7:27 PM, Piotr Padlewski via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>> Note that adding calls to strip and launder related to pointer comparisons and integer<->pointer conversions will not cause any semantic information to be lost: any piece of information that the optimiser could infer about some collection of variables (e.g. that two pointers are equal) can now be inferred about their stripped versions, no matter how many strip and launder calls have been made to obtain them in the IR. As an example, the C++ expression ptr == std::launder(ptr) will be optimised to true, because it will compare strip(ptr) with strip(launder(ptr)), which are indeed equal according to our rules.
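The source-level expression mentioned above can be written out as follows. This is only a sketch, assuming a C++17 compiler for std::launder; the type A and the helper equalsLaundered are made up for illustration and do not come from the RFC. It shows the comparison at the language level only; whether a given compiler folds it to true depends on the lowering described in the RFC.

```cpp
#include <cassert>
#include <new>

// Hypothetical polymorphic type, used only for illustration.
struct A {
    virtual ~A() = default;
    virtual int id() const { return 1; }
};

// Returns true when a pointer compares equal to its laundered version.
// std::launder yields a pointer to the same address, so the comparison
// holds for any pointer to a live object of type A.
bool equalsLaundered(A* ptr) {
    return ptr == std::launder(ptr);
}
```

Calling equalsLaundered on the address of any live A object is expected to return true.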
>> 
>> This proposal sounds great, even if it still doesn't solve some of the problems I personally need to solve with invariant loads. :)
>> 
>> I take it that the actual devirtualization here is ultimately still done by forwarding a visible store of the v-table pointer to an invariant load, just by noticing that they occur on the same laundered pointer and therefore must involve the same value.  There's no way of saying "I know what the value of the v-table pointer is even if you can't see a store" when creating a laundered pointer.  For example, in Swift we have constructor functions that are known to return a complete object of a specific type, even if we can't necessarily see the implementation of that function; there's no way for us to say anything about that function pointer.
>> 
>> I think we have already solved that problem with calls to the llvm.assume intrinsic. After calling the constructor, we load the virtual pointer (with invariant group), compare it with the vtable it should point to, and then pass the result of the comparison to the assume.
>> 
>>   call void @_ZN1AC1Ev(%struct.A* %a) ; call ctor
>>   %3 = load {...} %a                  ; Load vptr
>>   %4 = icmp eq %3, @_ZTV1A      ; compare vptr with vtable
>>   call void @llvm.assume(i1 %4)
>>  
>> (from http://blog.llvm.org/2017/03/devirtualization-in-llvm-and-clang.html)
>> 
>> If I understand it correctly, you should be able to use the same technique for the constructor-like functions in Swift :)
> 
> Yes, I think so.  Although IIRC people have had significant trouble with llvm.assume — the work that's just done for assume purposes has a nasty habit of sticking around.
> I had a problem with assume a couple of years ago, but I think it looks much better now. We will see how it works.
>  
> 
>>> LLVM
>>> 
>>> 
>>> Because LTO between a module with and without devirtualization will be invalid, we will need to break the LLVM-level ABI. This is, however, already implemented, because LTO between modules with invariant.group.barriers and without is also invalid. This also means that if we don’t want to break ABI between modules with and without optimizations, we will need to have invariant.barriers and fatpointer.create/strip turned on all the time.  For users, this means that when switching to the new compiler, they will have to recompile all of the generated object files for LTO builds.
>> 
>> Is there really no way to have this degrade more gracefully?  I continue to be very concerned about frontend interworking here, either between different versions of a single frontend (e.g. clang 6 vs. clang 8), or between different invocations of a single frontend with different language options set (e.g. clang vs. clang++), or even between different frontends that produce IR that gets linked together (e.g. clang vs. swift).
>> 
>> How about this approach:
>>   - Instead of taking a meaningless !{} argument, invariant.group takes a string argument which identifies a metadata-dependent optimization.  In your case, it would be something like !"clang.cxx_devirtualization".
>>   - Functions have a "supported optimizations" list which declares all the metadata-reliant optimizations they promise to have correct metadata for.  So e.g. clang++ would list "clang.cxx_devirtualization" on every single function it compiled, regardless of whether that function actually needed any metadata.  I'm pretty sure metadata are optimized so that identical lists of options like this don't take up more space just because they're added to every single function in the module.
>>   - Interprocedural optimizations (which mostly means inlining) are required to be aware of the supported-optimizations list.  The inliner would intersect the supported-optimizations lists and then strip metadata/intrinsics that don't belong anymore.
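The intersect-and-strip step from the list above could be sketched roughly like this. It is a toy model that assumes a supported-optimizations list can be represented as a set of strings. The names Function, intersectSupported and inlineInto are hypothetical and are not LLVM APIs; a real implementation would walk instructions and intrinsics rather than a flat list of tags.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical model, not an LLVM API: a function carries the list of
// metadata-dependent optimizations it promises correct metadata for, plus
// the optimization tags attached to its instructions.
struct Function {
    std::set<std::string> supportedOpts;
    std::vector<std::string> instructionTags;
};

// Intersect two supported-optimizations lists.
std::set<std::string> intersectSupported(const std::set<std::string>& a,
                                         const std::set<std::string>& b) {
    std::set<std::string> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.begin()));
    return out;
}

// Model inlining `callee` into `caller`: narrow the caller's list to the
// intersection, merge the callee's instruction tags, then strip every tag
// that the merged function no longer supports.
void inlineInto(Function& caller, const Function& callee) {
    caller.supportedOpts =
        intersectSupported(caller.supportedOpts, callee.supportedOpts);
    caller.instructionTags.insert(caller.instructionTags.end(),
                                  callee.instructionTags.begin(),
                                  callee.instructionTags.end());
    auto unsupported = [&](const std::string& tag) {
        return caller.supportedOpts.count(tag) == 0;
    };
    caller.instructionTags.erase(
        std::remove_if(caller.instructionTags.begin(),
                       caller.instructionTags.end(), unsupported),
        caller.instructionTags.end());
}
```

In this model, inlining a function with an empty list (say, one compiled from C) into a function that lists "clang.cxx_devirtualization" strips the tags from the merged body only, while inlining between two functions that both list it keeps them.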
>> 
>> But the idea that every single metadata-dependent optimization is going to create a new "IR ABI break" just seems unacceptable to me.  Compiler optimization IRs are not stable things; compiler engineers constantly find new things that they want to express.
>> 
>> John.
>> 
>> I haven't thought about LTO between different languages, thanks for bringing that up!
>> Can you actually use C++ objects without going through a C interface? If it is possible, then that is heavy.
> 
> Not yet, but it's a goal.  But even without that, Swift might call a C interface and the code on the other side of the C interface might be C++.
> 
> Even putting Swift aside, it's not atypical to have a few C files in a majority-C++ project, or vice-versa.  Or, for that matter, a few files that are compiled with different optimization settings.
> 
>> To clarify how it works right now: if you do LTO between IR compiled with -fstrict-vtable-pointers and IR compiled without it, the linker reports an error. I can see now that this pretty much stops you from doing any LTO between different languages.
> 
> Yeah.  It also creates problems for people who are trying to make LTO-able static libraries; Apple encourages people to use bitcode for some things, and we'd like to do more of that.
> 
>> The other idea that we had was to actually strip all the invariant.groups when linking with a module that does not have them. This, as opposed to the first idea, would let us link whatever we want, but we could silently lose some optimizations.
>> 
>> I like the idea that you proposed - it is somewhere between these two ideas, as you limit the potential losses to only some functions and at the same time you can link whatever IR you like.
> 
> Yeah, just losing the optimization in functions where you've actually merged different information is a really nice property.
> 
>> However, if you agree that option 2 - stripping invariant.groups from whole modules - addresses all of your concerns, then I would propose to go with this idea first and then optimize it if we find a problem with it.
>> I feel that it might be overkill to implement it on the first go, especially since we are not even at the point of thinking about turning -fstrict-vtable-pointers on by default.
>> 
>> What do you think about that?
> 
> I certainly think it's fine for your summer project to just get the optimization working first.
> 
> When it comes time to actually harden the IR linker against this, I think we should go ahead and pursue the more aggressive function-by-function solution.  That's not because the whole-module solution wouldn't solve the problem — you're absolutely right, it would.  But it seems to me that (1) the function-by-function solution is where we ought to end up, and (2) it's not that much more work than the whole-module solution, because the big piece of work in either case is finding and stripping the right metadata and intrinsics, and (3) crucially, it's not an extension of the whole-module solution — it relies on information being provided in a completely different way.  If we implement the whole-module approach, it becomes a legacy part of the system that we're stuck with *in addition to* whatever function-by-function approach we eventually settle on, and it probably permanently complicates the function-by-function approach.
> 
> John.
> 
> That's a good point, let's bring that problem back when the project progresses. 
> Do you know of any other specific situations or metadata that would require, or would benefit from, the same solution?

Ah, sure.  It's probably the right solution for TBAA metadata compatibility as well.  Different compilers are likely to use different TBAA tag hierarchies, just because they have different rules for aliasing.  In the absence of some way of officially declaring them compatible, we should assume they're incompatible and strip them during inlining.  In fact, it's probably true that *most* annotation approaches are frontend-specific and should be stripped when merging information from different frontends.

John.